Findings of the WMT 2020 Shared Task on Parallel Corpus Filtering and Alignment

Philipp Koehn, Vishrav Chaudhary, Ahmed El-Kishky, Naman Goyal, Peng-Jen Chen, Francisco (Paco) Guzmán

Published 2020 in Conference on Machine Translation

ABSTRACT

Following the two preceding WMT Shared Tasks on Parallel Corpus Filtering (Koehn et al., 2018, 2019), we again posed the challenge of assigning sentence-level quality scores to very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting the highest-quality data for training machine translation systems. This year, the task tackled the low-resource conditions of Pashto–English and Khmer–English and also included the challenge of sentence alignment from document pairs.
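
The sub-selection step described in the abstract can be illustrated with a minimal sketch. It assumes each sentence pair already has a quality score and that selection stops at a word budget counted on the English side. The function name `subselect`, the `token_budget` parameter, and whitespace tokenization are all hypothetical choices made for illustration, not the task's official tooling.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, English sentence)


def subselect(pairs: List[Pair], scores: List[float], token_budget: int) -> List[Pair]:
    """Keep the highest-scoring pairs until the English-side word budget is used up.

    Assumption: words are counted by whitespace splitting on the English side.
    """
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")

    # Rank pairs from highest to lowest score (ties keep their input order).
    ranked = sorted(zip(scores, pairs), key=lambda item: item[0], reverse=True)

    selected: List[Pair] = []
    used = 0
    for _score, (src, tgt) in ranked:
        n_words = len(tgt.split())
        if used + n_words > token_budget:
            break  # stop at the first pair that would exceed the budget
        selected.append((src, tgt))
        used += n_words
    return selected
```

A participant's scoring method would supply `scores`; the selected subset would then serve as training data for a machine translation system.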

PUBLICATION RECORD

  • Publication year

    2020

  • Venue

    Conference on Machine Translation

  • Fields of study

    Linguistics, Computer Science

  • Source metadata

    Semantic Scholar
