(Meta-) Evaluation of Machine Translation
Chris Callison-Burch, C. Fordyce, Philipp Koehn, Christof Monz, Josh Schroeder
Published 2007 in WMT@ACL

ABSTRACT
This paper evaluates the translation quality of machine translation systems for 8 language pairs: French, German, Spanish, and Czech translated to and from English. We carried out an extensive human evaluation that allowed us not only to rank the different MT systems but also to perform a higher-level analysis of the evaluation process itself. We measured timing and intra- and inter-annotator agreement for three types of subjective evaluation, and we measured the correlation of automatic evaluation metrics with human judgments. This meta-evaluation reveals surprising facts about the most commonly used methodologies.
PUBLICATION RECORD
- Publication date: 2007-06-23
- Venue: WMT@ACL
- Fields of study: Linguistics, Computer Science