Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases

Chris Callison-Burch, C. Bannard, Josh Schroeder

Published 2005 in Annual Meeting of the Association for Computational Linguistics

ABSTRACT

In this paper we describe a novel data structure for phrase-based statistical machine translation which allows for the retrieval of arbitrarily long phrases while simultaneously using less memory than is required by current decoder implementations. We detail the computational complexity and average retrieval times for looking up phrase translations in our suffix array-based data structure. We show how sampling can be used to reduce the retrieval time by orders of magnitude with no loss in translation quality.
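The lookup idea in the abstract can be illustrated with a minimal sketch. It assumes a tokenized, monolingual toy corpus and shows only how a suffix array finds every occurrence of a phrase of any length by binary search, and how the occurrence list can be capped by sampling. The function names, the naive construction, and the uniform random sampling are assumptions made for illustration, not the paper's implementation. Word alignments and the extraction of translations are left out.

```python
import random


def build_suffix_array(tokens):
    """Return suffix start positions sorted by the token sequence starting there.

    Naive construction (sorts full suffix slices); adequate for a toy corpus only.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens, suffix_array, phrase):
    """Return the sorted corpus positions where `phrase` occurs.

    All suffixes that begin with `phrase` form one contiguous block in the
    suffix array, so two binary searches locate the block boundaries.
    """
    m = len(phrase)

    def prefix(k):
        start = suffix_array[k]
        return tokens[start:start + m]

    # Lower bound: first suffix whose m-token prefix is >= phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-token prefix is > phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


def sample_occurrences(positions, max_samples, seed=0):
    """Cap the number of occurrences inspected (uniform sampling is an assumption)."""
    if len(positions) <= max_samples:
        return list(positions)
    rng = random.Random(seed)
    return sorted(rng.sample(positions, max_samples))


corpus = "the cat sat on the mat and the cat slept".split()
sa = build_suffix_array(corpus)
print(find_occurrences(corpus, sa, ["the", "cat"]))
print(sample_occurrences(find_occurrences(corpus, sa, ["the"]), 2))
```

Because the index stores only one integer per corpus position, phrases of any length can be looked up without enumerating them in advance. Capping the number of occurrences bounds the work done for very frequent phrases.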

PUBLICATION RECORD

Publication year: 2005
Venue: Annual Meeting of the Association for Computational Linguistics
Publication date: 2005-06-25
Fields of study: Computer Science
DOI: 10.3115/1219840.1219872
Source metadata: Semantic Scholar
