Abstract: Corpora are precious resources, as they allow for a proper estimation of statistical machine translation models. Data selection is a branch of domain adaptation that aims to extract, from an out-of-domain corpus, the sentences that are most useful for translating a different target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, that can deal with both monolingual and bilingual corpora. Empirical results show that our data selection method provides slightly better translation quality than a state-of-the-art method (cross-entropy) while requiring substantially less data. Moreover, the results are consistent across different language pairs, demonstrating the robustness of our proposal.
Neural Networks Classifier for Data Selection in Statistical Machine Translation
Álvaro Peris,Mara Chinea-Rios,F. Casacuberta
Published 2016 in Prague Bulletin of Mathematical Linguistics
PUBLICATION RECORD
- Publication date
2016-12-16
- Fields of study
Linguistics, Computer Science
- Source metadata
Semantic Scholar