Neural Networks Classifier for Data Selection in Statistical Machine Translation

Álvaro Peris, Mara Chinea-Rios, F. Casacuberta

Published 2016 in Prague Bulletin of Mathematical Linguistics

ABSTRACT

Corpora are precious resources, as they allow for a proper estimation of statistical machine translation models. Data selection is a form of domain adaptation that aims to extract, from an out-of-domain corpus, the sentences most useful for translating a different target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, that can handle both monolingual and bilingual corpora. Empirical results show that our data selection method yields slightly better translation quality than a state-of-the-art method (cross-entropy) while requiring substantially less data. Moreover, the results are consistent across different language pairs, demonstrating the robustness of our proposal.
