Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions
Philipp Koehn, Francisco (Paco) Guzmán, Vishrav Chaudhary, J. Pino
Published 2019 in Conference on Machine Translation
ABSTRACT
Following the WMT 2018 Shared Task on Parallel Corpus Filtering, we posed the challenge of assigning sentence-level quality scores to very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting 2% and 10% of the highest-quality data for training machine translation systems. This year, the task tackled the low-resource conditions of Nepali-English and Sinhala-English. Eleven participants from companies, national research labs, and universities took part.
PUBLICATION RECORD
- Publication year
2019
- Venue
Conference on Machine Translation
- Fields of study
Linguistics, Computer Science
- Source metadata
Semantic Scholar