Discriminative Corpus Weight Estimation for Machine Translation

S. Matsoukas, Antti-Veikko I. Rosti, Bing Zhang

Published 2009 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

Current statistical machine translation (SMT) systems are trained on sentence-aligned and word-aligned parallel text collected from various sources. Translation model parameters are estimated from the word alignments, and the quality of the translations on a given test set depends on the parameter estimates. At least two factors affect the parameter estimation: domain match and training data quality. This paper describes a novel approach for automatically detecting and down-weighting certain parts of the training corpus by assigning a weight to each sentence in the training bitext so as to optimize a discriminative objective function on a designated tuning set. In this way, the proposed method can limit the negative effects of low-quality training data and can adapt the translation model to the domain of interest. It is shown that such discriminative corpus weights can provide significant improvements in Arabic-English translation under various conditions, using a state-of-the-art SMT system.
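To make the idea of per-sentence corpus weights concrete, the following is a minimal sketch, not the paper's implementation. It assumes the weights are already given and shows only how they could enter a relative-frequency estimate of phrase translation probabilities. The function name `weighted_phrase_probs`, the input layout, and the toy phrase pairs are all hypothetical, and the discriminative optimization of the weights on a tuning set is not shown.

```python
from collections import defaultdict


def weighted_phrase_probs(phrase_pairs_per_sentence, sentence_weights):
    """Estimate p(target | source) from sentence-weighted phrase-pair counts.

    phrase_pairs_per_sentence: one list of (source, target) phrase pairs per
        training sentence (hypothetical layout, assumed to be extracted
        from the word alignments beforehand).
    sentence_weights: one non-negative weight per training sentence.
    """
    if len(phrase_pairs_per_sentence) != len(sentence_weights):
        raise ValueError("need exactly one weight per training sentence")

    joint = defaultdict(float)     # weighted count of (source, target)
    marginal = defaultdict(float)  # weighted count of source

    for pairs, weight in zip(phrase_pairs_per_sentence, sentence_weights):
        for src, tgt in pairs:
            # Each occurrence contributes its sentence weight instead of 1.
            joint[(src, tgt)] += weight
            marginal[src] += weight

    # Skip sources whose total weight is zero to avoid division by zero.
    return {
        (src, tgt): count / marginal[src]
        for (src, tgt), count in joint.items()
        if marginal[src] > 0
    }


# Toy data (made up for illustration): the second sentence is treated
# as lower quality or out of domain and receives a smaller weight.
toy_pairs = [
    [("kitab", "book")],
    [("kitab", "writing")],
    [("kitab", "book")],
]
uniform = weighted_phrase_probs(toy_pairs, [1.0, 1.0, 1.0])
down_weighted = weighted_phrase_probs(toy_pairs, [1.0, 0.5, 1.0])
```

With uniform weights the estimate for ("kitab", "book") is 2/3; down-weighting the second sentence to 0.5 raises it to about 0.8 and lowers ("kitab", "writing") to about 0.2. In this sketch a weight of zero removes a sentence's contribution entirely.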

PUBLICATION RECORD

  • Publication year

    2009

  • Venue

    Conference on Empirical Methods in Natural Language Processing

  • Publication date

    2009-08-06

  • Fields of study

    Computer Science
