Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering
Philipp Koehn, Huda Khayrallah, Kenneth Heafield, M. Forcada
Published 2018 in Conference on Machine Translation
ABSTRACT
We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high-quality data to be used to train machine translation systems. Seventeen participants from companies, national research labs, and universities participated in this task.
PUBLICATION RECORD
- Publication year
2018
- Venue
Conference on Machine Translation
- Publication date
2018-10-31
- Fields of study
Linguistics, Computer Science
- Source metadata
Semantic Scholar