Stabilizing Minimum Error Rate Training

Published 2009 in WMT@EACL

ABSTRACT

The most commonly used method for training feature weights in statistical machine translation (SMT) systems is Och's minimum error rate training (MERT) procedure. A well-known problem with Och's procedure is that it tends to be sensitive to small changes in the system, particularly when the number of features is large. In this paper, we quantify the stability of Och's procedure by supplying different random seeds to a core component of the procedure (Powell's algorithm). We show that for systems with many features, there is extensive variation in outcomes, both on the development data and on the test data. We analyze the causes of this variation and propose modifications to the MERT procedure that improve stability while helping performance on test data.
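The seed-variation experiment described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the paper's setup: the synthetic n-best lists, the `toy_error` metric (a stand-in for 1 - BLEU), and the seeded random coordinate search, which stands in for Powell's algorithm. All function names are hypothetical. The sketch only shows how the spread of outcomes across seeds could be measured on development and test data.

```python
import random
import statistics


def toy_error(weights, nbest):
    """Fraction of sentences whose top-scoring hypothesis is wrong."""
    errors = 0
    for hyps in nbest:
        # Each hypothesis is a (feature_vector, is_correct) pair.
        best = max(hyps, key=lambda h: sum(w * f for w, f in zip(weights, h[0])))
        errors += 0 if best[1] else 1
    return errors / len(nbest)


def make_nbest(num_sents, num_hyps, num_feats, rng):
    """Build synthetic n-best lists with one hypothesis marked correct per sentence."""
    data = []
    for _ in range(num_sents):
        hyps = [([rng.uniform(-1, 1) for _ in range(num_feats)], False)
                for _ in range(num_hyps)]
        i = rng.randrange(num_hyps)
        hyps[i] = (hyps[i][0], True)
        data.append(hyps)
    return data


def random_search(nbest, num_feats, seed, iters=200):
    """Seeded random coordinate search (a stand-in, NOT Powell's algorithm)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(num_feats)]
    best = toy_error(weights, nbest)
    for _ in range(iters):
        cand = list(weights)
        cand[rng.randrange(num_feats)] += rng.gauss(0, 0.5)
        err = toy_error(cand, nbest)
        if err <= best:  # accept ties so the search can cross plateaus
            weights, best = cand, err
    return weights, best


def seed_spread(dev, test, num_feats, seeds):
    """Tune on dev once per seed; return the dev and test errors of each run."""
    dev_errs, test_errs = [], []
    for s in seeds:
        w, dev_err = random_search(dev, num_feats, s)
        dev_errs.append(dev_err)
        test_errs.append(toy_error(w, test))
    return dev_errs, test_errs


if __name__ == "__main__":
    data_rng = random.Random(0)
    dev = make_nbest(40, 5, 8, data_rng)
    test = make_nbest(40, 5, 8, data_rng)
    dev_errs, test_errs = seed_spread(dev, test, 8, seeds=range(5))
    print("dev  mean/stdev:", statistics.mean(dev_errs), statistics.stdev(dev_errs))
    print("test mean/stdev:", statistics.mean(test_errs), statistics.stdev(test_errs))
```

Reporting the standard deviation over seeds alongside the mean is one simple way to make optimizer instability visible; the numbers this toy produces say nothing about the magnitudes reported in the paper.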

PUBLICATION RECORD

Publication year: 2009
Venue: WMT@EACL
Publication date: 2009-03-30
Fields of study: Computer Science
Identifiers: DOI 10.3115/1626431.1626478
Source metadata: Semantic Scholar

REFERENCES

Lattice-based Minimum Error Rate Training for Statistical Machine Translation
2008, cited by this paper
Regularization and Search for Minimum Error Rate Training
2008, cited by this paper
Beyond Log-Linear Models: Boosted Minimum Error Rate Training for N-best Re-ranking
2008, influential reference
Random Restarts in Minimum Error Rate Training for Statistical Machine Translation
2008, cited by this paper
A Systematic Comparison of Training Criteria for Statistical Machine Translation
2007, cited by this paper
Forest Rescoring: Faster Decoding with Integrated Language Models
2007, cited by this paper
A Hierarchical Phrase-Based Model for Statistical Machine Translation
2005, cited by this paper
Minimum error training of log-linear translation models
2004, cited by this paper
Improvements in Phrase-Based Statistical Machine Translation
2004, cited by this paper
Statistical Phrase-Based Translation
2003, cited by this paper
Minimum Error Rate Training in Statistical Machine Translation
2003, influential reference
Numerical recipes in C
2002, cited by this paper

CITED BY

Automatic Fact Checking Using Context and Discourse Information
2022, cites this paper
Automatic Fact-Checking Using Context and Discourse Information
2019, cites this paper
It Takes Nine to Smell a Rat: Neural Multi-Task Learning for Check-Worthiness Prediction
2019, cites this paper
Preference learning for machine translation
2018, cites this paper
Data Selection using Topic Adaptation for Statistical Machine Translation
2018, cites this paper
Domain adaptation using neural network joint model
2017, cites this paper
Optimization for Statistical Machine Translation: A Survey
2016, cites this paper
A Deep Fusion Model for Domain Adaptation in Phrase-based MT
2016, cites this paper
Bi-text Alignment of Movie Subtitles for Spoken English-Arabic Statistical Machine Translation
2016, cites this paper
Refinements in hierarchical phrase-based translation systems
2015, cites this paper
Data Selection using Topic Adaptation for Statistical Machine Translation
2015, cites this paper
Drem: The AFRL Submission to the WMT15 Tuning Task
2015, influential citation
How to Avoid Unwanted Pregnancies: Domain Adaptation using Neural Network Models
2015, cites this paper
Using joint models or domain adaptation in statistical machine translation
2015, cites this paper
Pseudo-lemmatization in Croatian-English SMT
2014, cites this paper
Domain adaptation for translation models in statistical machine translation
2013, cites this paper
Amélioration des systèmes de traduction par analyse linguistique et thématique : application à la traduction depuis l'arabe. (Improvements for Machine Translation Systems Using Linguistic and Thematic Analysis : an Application to the Translation from Arabic)
2013, cites this paper
Parameter Optimization for Statistical Machine Translation: It Pays to Learn from Hard Examples
2013, cites this paper
Fast Large-Margin Learning for Statistical Machine Translation
2013, cites this paper
A Tale about PRO and Monsters
2013, cites this paper
Online Relative Margin Maximization for Statistical Machine Translation
2013, cites this paper
Two Approaches to Correcting Homophone Confusions in a Hybrid Machine Translation System
2013, cites this paper
Bilingual Word Embeddings for Phrase-Based Machine Translation
2013, cites this paper
Improved Online Learning and Modeling for Feature-Rich Discriminative Machine Translation
2013, cites this paper
Automatic Tune Set Generation for Machine Translation with Limited Indomain Data
2012, cites this paper
Prediction of Learning Curves in Machine Translation
2012, cites this paper
Structured Ramp Loss Minimization for Machine Translation
2012, cites this paper
Optimization Strategies for Online Large-Margin Learning in Machine Translation
2012, cites this paper
Optimizing for Sentence-Level BLEU+1 Yields Short Translations
2012, cites this paper
Automatic Translation of Scientific Documents in the HAL Archive
2012, cites this paper
Discriminative Feature-Rich Modeling for Syntax-Based Machine Translation
2012, cites this paper
Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability
2011, cites this paper
Probabilistic inference for phrase-based machine translation : a sampling approach
2011, cites this paper
Maximum Rank Correlation Training for Statistical Machine Translation
2011, cites this paper
Minimum Error Rate Training Semiring
2011, cites this paper
The CMU-ARK German-English Translation System
2011, influential citation
Hybrid Machine Translation Approaches for Low-Resource Languages (Amir Kamran)
2011, cites this paper
Distributed Minimum Error Rate Training of SMT using Particle Swarm Optimization
2011, cites this paper
SampleRank Training for Phrase-Based Machine Translation
2011, cites this paper
Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation
2010, cites this paper
Robust Estimation of Feature Weights in Statistical Machine Translation
2010, cites this paper
An Empirical Study on Development Set Selection Strategy for Machine Translation Learning
2010, cites this paper
Tuning Methods in Statistical Machine Translation
2010, cites this paper
The INESC-ID machine translation system for the IWSLT 2010
2010, cites this paper
Lessons from NRC’s Portage System at WMT 2010
2010, cites this paper
Syntactically Enriched Statistical Machine Translation from English to German
2009, cites this paper
PORTAGE in the NIST 2009 MT Evaluation
2009, cites this paper