Otedama: Fast Rule-Based Pre-Ordering for Machine Translation

Julian Hitschler, Laura Jehl, Sariya Karimova, Mayumi Ohta, Benjamin Körner, S. Riezler

Published 2016 in Prague Bulletin of Mathematical Linguistics

ABSTRACT

We present Otedama, a fast, open-source tool for rule-based syntactic pre-ordering, a well-established technique in statistical machine translation. Otedama implements both a learner for pre-ordering rules and a component for applying these rules to parsed sentences. Our system is compatible with several external parsers and can accommodate many source languages and all target languages in any machine translation paradigm that uses parallel training data. We demonstrate improvements on a patent translation task over a state-of-the-art English-Japanese hierarchical phrase-based machine translation system. We compare Otedama with an existing syntax-based pre-ordering system, showing comparable translation performance at a runtime speedup of a factor of 4.5-10.

PUBLICATION RECORD

Publication year
2016
Venue
Prague Bulletin of Mathematical Linguistics
Publication date
2016-10-01
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.1515/pralin-2016-0015
Source metadata
Semantic Scholar

CITED BY

Select and Reorder: A Novel Approach for Neural Sign Language Production
2024, cites this paper
Improving Cross-Lingual Transfer through Subtree-Aware Word Reordering
2023, cites this paper
Source side pre-ordering using recurrent neural networks for English-Myanmar machine translation
2021, cites this paper
Exploiting Dependency-based Pre-ordering for English-Myanmar Statistical Machine Translation
2019, influential citation
Exploiting Pre-Ordering for Neural Machine Translation
2018, influential citation