Effective Selection of Translation Model Training Data

Le Liu, Yu Hong, Hao Liu, Xing Wang, Jianmin Yao

Published 2014 in Annual Meeting of the Association for Computational Linguistics

ABSTRACT

Data selection has been demonstrated to be an effective approach to addressing the lack of high-quality bitext for statistical machine translation in the domain of interest. Most current data selection methods rely solely on language models trained on small-scale in-domain data to select domain-relevant sentence pairs from a general-domain parallel corpus. By contrast, we argue that the relevance between a sentence pair and the target domain can be better evaluated by combining a language model with a translation model. In this paper, we study and experiment with novel methods that apply translation models to domain-relevant data selection. The results show that our methods outperform previous methods. When the selected sentence pairs are evaluated on an end-to-end MT task, our methods can increase translation performance by 3 BLEU points.
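
The abstract does not spell out the scoring functions, so the following is only a minimal sketch of the general idea of ranking sentence pairs by a combined language-model and translation-model score. It assumes a cross-entropy-difference criterion for both components, a toy add-one-smoothed unigram language model, and hand-written lexical translation tables. The function names, the toy data, and the unweighted sum of the two scores are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Build an add-one-smoothed unigram log-probability function (toy LM)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen tokens

    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab))

    return logprob


def lm_cross_entropy(logprob, sentence):
    """Per-token cross-entropy of a sentence under a unigram model."""
    toks = sentence.split()
    if not toks:
        return 0.0
    return -sum(logprob(t) for t in toks) / len(toks)


def tm_cross_entropy(lex, src, tgt, floor=1e-6):
    """Per-token cross-entropy of tgt given src under a lexical table.

    lex maps (source_word, target_word) to a probability; pairs that are
    missing from the table get `floor`. Each target word is scored by a
    uniform mixture over the source words (a simplification).
    """
    s_toks, t_toks = src.split(), tgt.split()
    if not s_toks or not t_toks:
        return 0.0
    total = 0.0
    for t in t_toks:
        p = sum(lex.get((s, t), floor) for s in s_toks) / len(s_toks)
        total += math.log(p)
    return -total / len(t_toks)


def combined_score(pair, in_lm, gen_lm, in_lex, gen_lex):
    """Lower is more in-domain. Unweighted sum of LM and TM differences (assumption)."""
    src, tgt = pair
    lm_diff = lm_cross_entropy(in_lm, src) - lm_cross_entropy(gen_lm, src)
    tm_diff = tm_cross_entropy(in_lex, src, tgt) - tm_cross_entropy(gen_lex, src, tgt)
    return lm_diff + tm_diff


def select(pairs, k, **models):
    """Return the k sentence pairs with the lowest combined score."""
    return sorted(pairs, key=lambda p: combined_score(p, **models))[:k]


# Hypothetical toy data: a medical "in-domain" corpus and a mixed general corpus.
in_lm = train_unigram(["the patient takes medicine", "the doctor treats the patient"])
gen_lm = train_unigram(["the stock market fell", "the patient reads the news",
                        "the market opens today"])
in_lex = {("the", "der"): 0.5, ("patient", "patient"): 0.9,
          ("takes", "nimmt"): 0.7, ("medicine", "medizin"): 0.8}
gen_lex = {("the", "der"): 0.5, ("market", "markt"): 0.8,
           ("stock", "markt"): 0.3, ("fell", "fiel"): 0.7}

pair_a = ("the patient takes medicine", "der patient nimmt medizin")
pair_b = ("the stock market fell", "der markt fiel")
models = dict(in_lm=in_lm, gen_lm=gen_lm, in_lex=in_lex, gen_lex=gen_lex)
top = select([pair_b, pair_a], 1, **models)
print(top)
```

In a realistic setting the unigram model would be replaced by an n-gram model trained with a language modeling toolkit, the lexical tables would come from word alignment, and the two score components would likely be weighted rather than simply added.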

PUBLICATION RECORD

Publication year
2014
Venue
Annual Meeting of the Association for Computational Linguistics
Publication date
2014-06-01
Fields of study
Computer Science
Identifiers
DOI 10.3115/v1/P14-2093
Source metadata
Semantic Scholar

REFERENCES

Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation (2013)
Towards Effective Use of Training Data in Statistical Machine Translation (2012)
Proceedings of the Seventh Workshop on Statistical Machine Translation (2012)
Analysing the Effect of Out-of-Domain Data on SMT Systems (2012)
NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation (2012)
Domain Adaptation via Pseudo In-Domain Data Selection (2011)
Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation (2010)
Analysis of translation model adaptation in statistical machine translation (2010)
Intelligent Selection of Language Model Training Data (2010)
Method of Selecting Training Data to Build a Compact and Efficient Translation Model (2008)
A Hierarchical Phrase-Based Model for Statistical Machine Translation (2005)
Minimum Error Rate Training in Statistical Machine Translation (2003)
A Systematic Comparison of Various Statistical Alignment Models (2003)
Bleu: a Method for Automatic Evaluation of Machine Translation (2002)
SRILM - an extensible language modeling toolkit (2002)
An Empirical Study of Smoothing Techniques for Language Modeling (1996)
The Mathematics of Statistical Machine Translation: Parameter Estimation (1993)

CITED BY

Machine Translation Customization via Automatic Training Data Selection from the Web (2021)
Parallel feature weight decay algorithms for fast development of machine translation models (2021)
Tencent AI Lab Machine Translation Systems for the WMT20 Biomedical Translation Task (2020)
Domain Divergences: A Survey and Empirical Analysis (2020)
Data Selection using Topic Adaptation for Statistical Machine Translation (2018)
Domain adaptation using neural network joint model (2017)
Bilingual recursive neural network based data selection for statistical machine translation (2016)
Research on Chinese negation and speculation: corpus annotation and identification (2016)
Topic Model Based Adaptation Data Selection for Domain-Specific Machine Translation (2016)
A Deep Fusion Model for Domain Adaptation in Phrase-based MT (2016)
Data Selection using Topic Adaptation for Statistical Machine Translation (2015)
Using joint models for domain adaptation in statistical machine translation (2015)
How to Avoid Unwanted Pregnancies: Domain Adaptation using Neural Network Models (2015)
Negation and Speculation Identification in Chinese Language (2015)