Efficient Handling of N-gram Language Models for Statistical Machine Translation

Published 2007 in WMT@ACL

ABSTRACT

Statistical machine translation, as well as other areas of human language processing, has recently pushed toward the use of large-scale n-gram language models. This paper presents efficient algorithmic and architectural solutions which have been tested within the Moses decoder, an open-source toolkit for statistical machine translation. Experiments are reported with a high-performing baseline, trained on the Chinese-English NIST 2006 Evaluation task and running on a standard Linux 64-bit PC architecture. Comparative tests show that our representation halves the memory required by the SRI LM Toolkit, at the cost of 44% slower translation speed. However, as it can take advantage of memory mapping on disk, the proposed implementation seems to scale up much better to very large language models: decoding with a 289-million 5-gram language model runs in 2.1 GB of RAM.

PUBLICATION RECORD

Publication year
2007
Venue
WMT@ACL
Publication date
2007-06-23
Fields of study
Computer Science
Identifiers
DOI 10.3115/1626355.1626367
Source metadata
Semantic Scholar

CITED BY

Generalizing Long Short-Term Memory Network for Deep Learning from Generic Data
2020cites this paper
High Order N-gram Model Construction and Application Based on Natural Annotation
2019cites this paper
A Semi-automatic Structure Learning Method for Language Modeling
2019cites this paper
English-Bhojpuri SMT System: Insights from the Karaka Model
2019influential citation
Système de traduction automatique statistique Anglais-Arabe
2018cites this paper
Workload prediction for runtime resource management
2017cites this paper
Transliteration of Secured SMS to Indian Regional Language
2016cites this paper
Statistical sequence and parsing models for descriptive linguistics and psycholinguistics
2016cites this paper
An improved Levenshtein algorithm for spelling correction word candidate list generation
2016cites this paper
CloudLM: a Cloud-based Language Model for Machine Translation
2016cites this paper
The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions
2016cites this paper
Automatic Translation from German to Synthesized Swiss German Sign Language
2016cites this paper
N-gram Based Text Categorization Method for Improved Data Mining
2015cites this paper
REDEFINICIÓN DE LA COMUNICACIÓN POLÍTICA: REDES SOCIALES Y PARTICIPACIÓN CIUDADANA
2015cites this paper
Geez To Amharic Automatic Machine Translation: A Statistical Approach
2015cites this paper
Large-scale reordering models for statistical machine translation
2015cites this paper
Pivot translation using source-side dictionary and target-side parallel corpus towards MT from resource-limited languages
2014cites this paper
GENERADOR DE CÓDIGO AUTOMÁTICO (GCA) EN LENGUAJE JAVA A PARTIR DE CON CONJUNTO DE INSTRUCCIONES EN LENGUAJE NATURAL
2014cites this paper
Sentential Paraphrase Generation for Agglutinative Languages Using SVM with a String Kernel
2014cites this paper
Statistical sentiment analysis performance in Opinum
2013cites this paper
Corrección no Supervisada de Dependencias Sintácticas de Aposición mediante Clases Semánticas
2013cites this paper
Improving Word Translation Disambiguation by Capturing Multiword Expressions with Dictionaries
2013cites this paper
Combining Statistical Machine Translation and Translation Memories with Domain Adaptation
2013influential citation
Bagging and Boosting statistical machine translation systems
2013cites this paper
Amélioration des systèmes de traduction par analyse linguistique et thématique : application à la traduction depuis l'arabe. (Improvements for Machine Translation Systems Using Linguistic and Thematic Analysis : an Application to the Translation from Arabic)
2013cites this paper
Bologna Translation Service: improving access to educational courses via machine translation (system demonstration)
2013cites this paper
Lyrics Generation Using N-Gram Technology
2013cites this paper
MACHINE TRANSLATION USING MUTIPLEXED PDT FOR CHATTING SLANG
2013cites this paper
Bi-Gram based Probabilistic Language Model for Template Messaging
2013cites this paper
UPM system for WMT 2012
2012cites this paper
A fast and flexible architecture for very large word n-gram datasets
2012influential citation
Semantics-based Question Generation and Implementation
2012cites this paper
Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation
2012cites this paper
CCG Syntactic Reordering Models for Phrase-based Machine Translation
2012cites this paper
SmartMATE: An Online End-To-End MT Post-Editing Framework
2012cites this paper
NAIST at the HOO 2012 Shared Task
2012cites this paper
Opinum: statistical sentiment analysis for opinion classification
2012cites this paper
Probabilistic N-gram language model for SMS Lingo
2012cites this paper
Bean soup translation: flexible, linguistically-motivated syntax for machine translation
2012cites this paper
Statistical Machine Translation without Source-side Parallel Corpus Using Word Lattice and Phrase Extension
2012cites this paper
Simple and Knowledge-intensive Generative Model for Named Entity Recognition
2012cites this paper
Compact WFSA Based Language Model and Its Application in Statistical Machine Translation
2012cites this paper
Spurious Ambiguity and PMT: Turning “Phrases” into Phrases
2012cites this paper
Translating User-Generated Content in the Social Networking Space
2012cites this paper
Detecting Acronyms from Capital Letter Sequences in Spanish
2012cites this paper
Probabilistic language model for template messaging based on Bi-gram
2012cites this paper
The methods of word sense disambiguation
2011cites this paper
Faster and Smaller N-Gram Language Models
2011cites this paper
Advances in fully-automatic and interactive phrase-based statistical machine translation
2011cites this paper
Incorporating Translation Quality-Oriented Features into Log-Linear Models of Machine Translation
2011cites this paper
A Dependency Based Statistical Translation Model
2011cites this paper
Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
2011cites this paper
Technical report: OpenMaTrEx, a free, open-source hybrid data-driven machine translation system
2011cites this paper
OpenMaTrEx: A Free/Open-Source Marker-Driven Example-Based Machine Translation System
2010cites this paper
Minimal Perfect Hash Rank: Compact Storage of Large N-gram Language Models
2010cites this paper
Text Editor based on Google Trigram and its Usability
2010cites this paper
A Productivity Test of Statistical Machine Translation Post-Editing in a Typical Localisation Context
2010cites this paper
Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval
2010influential citation
Continuous-Space Language Models for Statistical Machine Translation
2010cites this paper
Customized Tries for Weighted Key Completion
2010cites this paper
APPROACHES TO HANDLE SCARCE RESOURCES FOR BENGALI STATISTICAL MACHINE TRANSLATION
2010cites this paper
Farsi - German statistical machine translation through bridge language
2010cites this paper
A study to find influential parameters on a Farsi-English statistical machine translation system
2010influential citation
Question Generation with Minimal Recursion Semantics
2010cites this paper
Efficient Minimal Perfect Hash Language Models
2010cites this paper
Domain Adaptation in Statistical Machine Translation
2009cites this paper
Software Engineering, Testing, and Quality Assurance for Natural Language Processing (setqa-nlp 2009) Using Paraphrases of Deep Semantic Representions to Support Regression Testing in Spoken Dialogue Systems Integrated Nlp Evaluation System for Pluggable Evaluation Metrics with Extensive Interoperab
2009cites this paper
Language Modeling for limited-data domains
2009cites this paper
Syntactically Enriched Statistical Machine Translation from English to German
2009cites this paper
Tightly Packed Tries: How to Fit Large Models into Memory, and Make them Load Fast, Too
2009influential citation
Experiments in Morphosyntactic Processing for Translating to and from German
2009influential citation
A Succinct N-gram Language Model
2009cites this paper
System Combination for Machine Translation of Spoken and Written Language
2008cites this paper
Domain specific MT in use
2008cites this paper
Efficient Speech Translation Through Confusion Network Decoding
2008cites this paper
A tutorial on the IRSTLM library
2008cites this paper
Exploiting Linguistic Data in Machine Translation
2008cites this paper
Iterative language model estimation: efficient data structure & algorithms
2008influential citation
Data selection and smoothing in an open-source system for the 2008 NIST machine translation evaluation
2008cites this paper
IRSTLM: an open source toolkit for handling large scale language models
2008cites this paper
Investigating automatic assessment of reading comprehension in young children
2008cites this paper
The IRST English-Spanish translation system for european parliament speeches
2007influential citation
FBK@IWSLT 2007
2007influential citation
3.2: Mathematical Model of Tree Transformations
2007cites this paper
D6.1: Improved confidence estimation and hybrid architectures for machine translation
2007cites this paper
Continuous Space Language Models for Statistical Machine Translation
2006cites this paper