Statistical phrase-based translation learns translation rules from bilingual corpora, and has traditionally only used monolingual evidence to construct features that rescore existing translation candidates. In this work, we present a semi-supervised graph-based approach for generating new translation rules that leverages bilingual and monolingual data. The proposed technique first constructs phrase graphs using both source and target language monolingual corpora. Next, graph propagation identifies translations of phrases that were not observed in the bilingual corpus, assuming that similar phrases have similar translations. We report results on a large Arabic-English system and a medium-sized Urdu-English system. Our proposed approach significantly improves the performance of competitive phrasebased systems, leading to consistent improvements between 1 and 4 BLEU points on standard evaluation sets.
Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data
Avneesh Singh Saluja,Hany Hassan,Kristina Toutanova,Chris Quirk
Published 2014 in Annual Meeting of the Association for Computational Linguistics
ABSTRACT
PUBLICATION RECORD
- Publication year
2014
- Venue
Annual Meeting of the Association for Computational Linguistics
- Publication date
2014-06-23
- Fields of study
Linguistics, Computer Science
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-27 of 27 references · Page 1 of 1
CITED BY
Showing 1-31 of 31 citing papers · Page 1 of 1