Copied Monolingual Data Improves Low-Resource Neural Machine Translation

Antonio Valerio Miceli Barone, Kenneth Heafield

Published 2017 in Conference on Machine Translation

ABSTRACT

We train a neural machine translation (NMT) system to both translate source-language text and copy target-language text, thereby exploiting monolingual corpora in the target language. Specifically, we create a bitext from the monolingual text in the target language so that each source sentence is identical to the target sentence. This copied data is then mixed with the parallel corpus and the NMT system is trained like normal, with no metadata to distinguish the two input languages. Our proposed method proves to be an effective way of incorporating monolingual data into low-resource NMT. We see gains of up to 1.2 BLEU over a strong baseline with back-translation. Further analysis shows that the linguistic phenomena behind these gains are different from and largely orthogonal to back-translation, with our copied corpus method improving accuracy on named entities and other words that should remain identical between the source and target languages.
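
The data construction described in the abstract can be sketched in a few lines. This is a minimal illustration only, not the authors' pipeline: the function names, the toy sentences, and the choice to shuffle the combined corpus with a fixed seed are all assumptions made here, and the abstract does not specify a mixing ratio or any preprocessing.

```python
import random


def make_copied_bitext(mono_target):
    """Pair every target-language sentence with itself as its 'source' side."""
    return [(sentence, sentence) for sentence in mono_target]


def mix_corpora(parallel, mono_target, seed=0):
    """Combine genuine parallel pairs with copied pairs.

    No language tag or other metadata is attached, so the two kinds of
    pairs are indistinguishable in form. Shuffling is an assumption of
    this sketch, not something stated in the abstract.
    """
    mixed = list(parallel) + make_copied_bitext(mono_target)
    random.Random(seed).shuffle(mixed)
    return mixed


# Toy data, hypothetical and for illustration only.
parallel = [
    ("Merhaba dünya", "Hello world"),
    ("Teşekkür ederim", "Thank you"),
]
mono_target = [
    "Edinburgh is in Scotland",
    "Thank you very much",
]

training_pairs = mix_corpora(parallel, mono_target)
```

The resulting `training_pairs` list would then be passed to an ordinary NMT training setup in place of the parallel corpus alone.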

PUBLICATION RECORD

Publication year: 2017
Venue: Conference on Machine Translation
Publication date: unknown
Fields of study: Linguistics, Computer Science
DOI: 10.18653/v1/W17-4715
Source metadata: Semantic Scholar

REFERENCES

Nematus: a Toolkit for Neural Machine Translation (2017, cited by this paper)
Semi-Supervised Learning for Neural Machine Translation (2016, cited by this paper)
Dual Learning for Machine Translation (2016, cited by this paper)
Findings of the 2016 Conference on Machine Translation (2016, cited by this paper)
Edinburgh Neural Machine Translation Systems for WMT 16 (2016, influential reference)
Exploiting Source-side Monolingual Data in Neural Machine Translation (2016, cited by this paper)
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation (2016, cited by this paper)
Multi-task Sequence to Sequence Learning (2015, cited by this paper)
Improving Neural Machine Translation Models with Monolingual Data (2015, influential reference)
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (2015, cited by this paper)
Montreal Neural Machine Translation Systems for WMT’15 (2015, cited by this paper)
Neural Machine Translation of Rare Words with Subword Units (2015, influential reference)
On Using Monolingual Corpora in Neural Machine Translation (2015, influential reference)
Neural Machine Translation by Jointly Learning to Align and Translate (2014, cited by this paper)
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation (2014, cited by this paper)
Adam: A Method for Stochastic Optimization (2014, cited by this paper)
Statistical Machine Translation (2014, cited by this paper)
Sequence to Sequence Learning with Neural Networks (2014, cited by this paper)
Recurrent Continuous Translation Models (2013, cited by this paper)
KenLM: Faster and Smaller Language Model Queries (2011, cited by this paper)
Bleu: a Method for Automatic Evaluation of Machine Translation (2002, cited by this paper)

CITED BY

Development and Evaluation of an English-to Igala Neural Machine Translation System using Deep Learning (2025, cites this paper)
An STE-Guided Machine Translation Method based on Evidence Theory (2025, cites this paper)
Improving Retrieval-Augmented Neural Machine Translation with Monolingual Data (2025, cites this paper)
Investigating the Effect of Backtranslation for Indic Languages (2025, cites this paper)
Character-Level Encoding based Neural Machine Translation for Hindi language (2025, cites this paper)
Generative-Adversarial Networks for Low-Resource Language Data Augmentation in Machine Translation (2024, cites this paper)
KpopMT: Translation Dataset with Terminology for Kpop Fandom (2024, cites this paper)
Enhancing Pretrained Multilingual Machine Translation Model with Code-Switching: A Study on Chinese, English and Malay Language (2024, cites this paper)
A Reinforcement Learning Approach to Improve Low-Resource Machine Translation Leveraging Domain Monolingual Data (2024, cites this paper)
Revitalizing Bahnaric Language through Neural Machine Translation: Challenges, Strategies, and Promising Outcomes (2024, cites this paper)
Research on Methods to Enhance Machine Translation Quality Between Low-Resource Languages and Chinese Based on ChatGPT (2024, cites this paper)
Towards Guided Back-translation for Low-resource languages- A Case Study on Kabyle-French (2024, cites this paper)
Parallel Corpus for Indigenous Language Translation: Spanish-Mazatec and Spanish-Mixtec (2023, cites this paper)
Neural Machine Translation for Code Generation (2023, cites this paper)
Neural Machine Translation: A Survey of Methods used for Low Resource Languages (2023, cites this paper)
A Data Augmentation Method for English-Vietnamese Neural Machine Translation (2023, cites this paper)
Exploring Data Augmentation for Code Generation Tasks (2023, cites this paper)
Impacts of Approaches for Agglutinative-LRL Neural Machine Translation (NMT): A Case Study on Manipuri-English Pair (2023, cites this paper)
On Synthetic Data for Back Translation (2023, cites this paper)
BaNaVA: A cross-platform AI mobile application for preserving the Bahnaric languages (2023, cites this paper)
SelectNoise: Unsupervised Noise Injection to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages (2023, cites this paper)
Construction of an Online Cloud Platform for Zhuang Speech Recognition and Translation with Edge-Computing-Based Deep Learning Algorithm (2023, cites this paper)
Data Augmentation with Diversified Rephrasing for Low-Resource Neural Machine Translation (2023, cites this paper)
Machine Translation of Electrical Terminology Constraints (2023, cites this paper)
Enhancing Spanish-Quechua Machine Translation with Pre-Trained Models and Diverse Data Sources: LCT-EHU at AmericasNLP Shared Task (2023, cites this paper)
Findings of the AmericasNLP 2023 Shared Task on Machine Translation into Indigenous Languages (2023, cites this paper)
Neural Machine Translation Methods for Translating Text to Sign Language Glosses (2023, cites this paper)
Code Generation from Natural Language Using Two-Way Pre-Training (2023, cites this paper)
Morphologically Motivated Input Variations and Data Augmentation in Turkish-English Neural Machine Translation (2022, cites this paper)
End-To-End Training of Back-Translation Framework with Categorical Reparameterization Trick (2022, cites this paper)
CHIA: CHoosing Instances to Annotate for Machine Translation (2022, cites this paper)
Iterative Constrained Back-Translation for Unsupervised Domain Adaptation of Machine Translation (2022, cites this paper)
Penalizing Divergence: Multi-Parallel Translation for Low-Resource Languages of North America (2022, cites this paper)
Evaluating Pre-training Objectives for Low-Resource Translation into Morphologically Rich Languages (2022, cites this paper)
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French (2022, cites this paper)
Unsupervised Domain Adaptation for Question Generation with DomainData Selection and Self-training (2022, cites this paper)
NECAT-CLWE: A Simple But Efficient Parallel Data Generation Approach for Low Resource Neural Machine Translation (2022, influential citation)
Lack of Fluency is Hurting Your Translation Model (2022, cites this paper)
Low-resource Neural Machine Translation: Methods and Trends (2022, cites this paper)
The impact of lexical and grammatical processing on generating code from natural language (2022, influential citation)
Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey (2021, cites this paper)
Token-wise Curriculum Learning for Neural Machine Translation (2021, cites this paper)
Augmenting training data with syntactic phrasal-segments in low-resource neural machine translation (2021, influential citation)
Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts (2021, cites this paper)
The University of Edinburgh’s Bengali-Hindi Submissions to the WMT21 News Translation Task (2021, cites this paper)
DEEP: DEnoising Entity Pre-training for Neural Machine Translation (2021, cites this paper)
A comparative study of neural machine translation models for Turkish language (2021, cites this paper)
Domain Adaptation for Hindi-Telugu Machine Translation Using Domain Specific Back Translation (2021, cites this paper)
Backtranslation in Neural Morphological Inflection (2021, cites this paper)
Recent advances of low-resource neural machine translation (2021, cites this paper)
Empirical Analysis of Korean Public AI Hub Parallel Corpora and in-depth Analysis using LIWC (2021, cites this paper)
Data augmentation for low-resource languages NMT guided by constrained sampling (2021, influential citation)
Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach (2021, cites this paper)
Don’t Go Far Off: An Empirical Study on Neural Poetry Translation (2021, cites this paper)
Survey of Low-Resource Machine Translation (2021, cites this paper)
Code Generation from Natural Language with Less Prior Knowledge and More Monolingual Data (2021, influential citation)
Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages (2021, cites this paper)
The Effect of Domain and Diacritics in Yoruba–English Neural Machine Translation (2021, cites this paper)
Neural Machine Translation for Low-resource Languages: A Survey (2021, cites this paper)
Machine Translation into Low-resource Language Varieties (2021, cites this paper)
Enhancing Cherokee-English Translation System (2021, cites this paper)
Should we find another model?: Improving Neural Machine Translation Performance with ONE-Piece Tokenization Method without Model Modification (2021, cites this paper)
Counterfactual Data Augmentation for Neural Machine Translation (2021, cites this paper)
Data Augmentation for Sign Language Gloss Translation (2021, cites this paper)
Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation (2021, cites this paper)
Meta Back-translation (2021, cites this paper)
The Usefulness of Bibles in Low-Resource Machine Translation (2021, cites this paper)
A Survey on Low-Resource Neural Machine Translation (2021, cites this paper)
Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation (2021, cites this paper)
GX@DravidianLangTech-EACL2021: Multilingual Neural Machine Translation and Back-translation (2021, cites this paper)
Neural Data-to-Text Generation with LM-based Text Augmentation (2021, cites this paper)
Semantic Parsing with Less Prior and More Monolingual Data (2021, influential citation)
Strengthening Low-resource Neural Machine Translation through Joint Learning: The Case of Farsi-Spanish (2021, cites this paper)
MENYO-20k: A Multi-domain English-Yorùbá Corpus for Machine Translation and Domain Adaptation (2021, cites this paper)
Comparing Statistical and Neural Machine Translation Performance on Hindi-To-Tamil and English-To-Tamil (2020, cites this paper)
Neural Machine Translation (2020, cites this paper)
Unsupervised Domain Adaptation for Neural Machine Translation with Iterative Back Translation (2020, cites this paper)
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction (2020, cites this paper)
Dictionary-based Data Augmentation for Cross-Domain Neural Machine Translation (2020, cites this paper)
Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation (2020, cites this paper)
Facilitating Access to Multilingual COVID-19 Information via Neural Machine Translation (2020, influential citation)
In Neural Machine Translation, What Does Transfer Learning Transfer? (2020, cites this paper)
A Diverse Data Augmentation Strategy for Low-Resource Neural Machine Translation (2020, cites this paper)
Neural Machine Translation Using Multiple Back-translation Generated by Sampling (2020, cites this paper)
Using Self-Training to Improve Back-Translation in Low Resource Neural Machine Translation (2020, cites this paper)
A Simple Baseline to Semi-Supervised Domain Adaptation for Machine Translation (2020, influential citation)
A Joint Back-Translation and Transfer Learning Method for Low-Resource Neural Machine Translation (2020, cites this paper)
Character Mapping and Ad-hoc Adaptation: Edinburgh’s IWSLT 2020 Open Domain Translation System (2020, cites this paper)
Multi-Source Neural Model for Machine Translation of Agglutinative Language (2020, cites this paper)
Addressing Posterior Collapse with Mutual Information for Improved Variational Neural Machine Translation (2020, cites this paper)
Multi-Task Neural Model for Agglutinative Language Translation (2020, cites this paper)
A Survey of Domain Adaptation for Machine Translation (2020, cites this paper)
Rapid Development of Competitive Translation Engines for Access to Multilingual COVID-19 Information (2020, influential citation)
A Study on the Performance Improvement of Machine Translation Using Public Korean-English Parallel Corpus (2020, cites this paper)
Iterative Domain-Repaired Back-Translation (2020, influential citation)
Decoding Strategies for Improving Low-Resource Machine Translation (2020, cites this paper)
slimIPL: Language-Model-Free Iterative Pseudo-Labeling (2020, cites this paper)
A More Comprehensive Method for Using The Target-side Monolingual Data to Improve Low Resource Neural Machine Translation (2020, cites this paper)
Investigating Low-resource Machine Translation for English-to-Tamil (2020, cites this paper)
An Error-based Investigation of Statistical and Neural Machine Translation Performance on Hindi-to-Tamil and English-to-Tamil (2020, cites this paper)