The University of Helsinki Submission to the WMT19 Parallel Corpus Filtering Task
Raúl Vázquez, U. Sulubacak, J. Tiedemann
Published 2019 in Conference on Machine Translation
ABSTRACT
This paper describes the University of Helsinki Language Technology group’s participation in the WMT 2019 parallel corpus filtering task. Our scores were produced using a two-step strategy. First, we individually applied a series of filters to remove the ‘bad’ quality sentences. Then, we produced scores for each sentence by weighting these features with a classification model. This methodology allowed us to build a simple and reliable system that is easily adaptable to other language pairs.
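The two-step strategy described in the abstract (hard filters first, then a classifier that weights features into a score) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the specific filters (empty side, identical sides, maximum length, length ratio), the two features, the thresholds, the toy training pairs, and the small hand-written logistic regression that stands in for the classification model.

```python
import math

def length_ratio(src, tgt):
    """Token-count ratio in [0, 1]; 1.0 means both sides have equal length."""
    a, b = len(src.split()), len(tgt.split())
    return min(a, b) / max(a, b) if max(a, b) else 0.0

def passes_filters(src, tgt, min_ratio=0.3, max_tokens=100):
    """Step 1 (hypothetical filters): reject obviously bad pairs outright."""
    if not src.strip() or not tgt.strip():
        return False  # one side is empty
    if src.strip() == tgt.strip():
        return False  # untranslated copy
    if max(len(src.split()), len(tgt.split())) > max_tokens:
        return False  # overly long segment
    return length_ratio(src, tgt) >= min_ratio

def features(src, tgt):
    """Hypothetical features: length ratio and whether the digits match."""
    digits_src = sorted(c for c in src if c.isdigit())
    digits_tgt = sorted(c for c in tgt if c.isdigit())
    return [length_ratio(src, tgt), 1.0 if digits_src == digits_tgt else 0.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Plain SGD logistic regression; w[0] is the bias term."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, label in zip(X, y):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            grad = sigmoid(z) - label
            w[0] -= lr * grad
            for i, xi in enumerate(x):
                w[i + 1] -= lr * grad * xi
    return w

def score(w, src, tgt):
    """Step 2: filtered-out pairs get 0.0, the rest get a classifier score."""
    if not passes_filters(src, tgt):
        return 0.0
    x = features(src, tgt)
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

# Toy labelled pairs (1 = acceptable, 0 = noisy), invented for this sketch.
train_pairs = [
    ("I have 2 cats", "Minulla on 2 kissaa", 1),
    ("The train leaves at 9 today", "Juna lähtee tänään kello 9", 1),
    ("See page 12 for details", "Sivu 7", 0),
    ("Click here to read more now", "Etusivu uutiset", 0),
]
X = [features(s, t) for s, t, _ in train_pairs]
y = [label for _, _, label in train_pairs]
weights = train_logreg(X, y)

good = score(weights, "I have 2 cats", "Minulla on 2 kissaa")
bad = score(weights, "See page 12 for details", "Sivu 7")
empty = score(weights, "Hello", "")
```

In this sketch the filters act as a gate that assigns a zero score, while the classifier only ranks the pairs that survive. Whether the original system combines the two steps in the same way is not stated in the abstract.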
PUBLICATION RECORD
- Publication year
2019
- Venue
Conference on Machine Translation
- Publication date
2019-07-29
- Fields of study
Linguistics, Computer Science
- Source metadata
Semantic Scholar