Parameter Sharing Methods for Multilingual Self-Attentional Translation Models

Published 2018 in Conference on Machine Translation

ABSTRACT

In multilingual neural machine translation, it has been shown that sharing a single translation model between multiple languages can achieve competitive performance, sometimes even leading to performance gains over bilingually trained models. However, these improvements are not uniform; multilingual parameter sharing often decreases accuracy because translation models cannot accommodate different languages in their limited parameter space. In this work, we examine parameter sharing techniques that strike a happy medium between full sharing and individual training, focusing specifically on the self-attentional Transformer model. We find that full parameter sharing increases BLEU scores mainly when the target languages are from a similar language family. However, even when the target languages are from different families, where full parameter sharing leads to a noticeable drop in BLEU scores, our proposed methods for partial sharing of parameters can lead to substantial improvements in translation accuracy.
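
The abstract contrasts three regimes: full sharing, individual (bilingual) training, and partial sharing. It does not say which parameters the proposed methods share, so the following is only a minimal sketch of the idea under stated assumptions. The component names, the language codes, and the choice of which components to share are all hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical component names for a one-to-many translation model;
# the abstract does not list the actual components.
COMPONENTS = ("encoder", "decoder_attention", "decoder_feed_forward", "output_layer")


def build_parameter_map(
    target_languages: Iterable[str],
    shared_components: FrozenSet[str],
) -> Dict[Tuple[str, str], str]:
    """Map (target language, component) to the parameter group it uses.

    Components listed in `shared_components` resolve to a single group for
    all target languages; every other component gets one group per language.
    """
    unknown = shared_components - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    mapping: Dict[Tuple[str, str], str] = {}
    for lang in target_languages:
        for comp in COMPONENTS:
            if comp in shared_components:
                mapping[(lang, comp)] = f"{comp}/shared"
            else:
                mapping[(lang, comp)] = f"{comp}/{lang}"
    return mapping


def count_parameter_groups(mapping: Dict[Tuple[str, str], str]) -> int:
    """Number of distinct parameter groups that would have to be trained."""
    return len(set(mapping.values()))


# Hypothetical target languages (assumption, chosen only for illustration).
langs = ["de", "nl", "ja"]

# Full sharing: one group per component, used by every target language.
full = build_parameter_map(langs, frozenset(COMPONENTS))
# Individual training: nothing is shared, one full model per target language.
none = build_parameter_map(langs, frozenset())
# Partial sharing: an arbitrary illustrative subset of components is shared.
partial = build_parameter_map(langs, frozenset({"encoder", "decoder_attention"}))

for name, mapping in (("full", full), ("none", none), ("partial", partial)):
    print(name, count_parameter_groups(mapping))
```

In this sketch, partial sharing sits between the two extremes in the number of parameter groups, which is the trade-off the abstract describes: some capacity stays language-specific while the rest is pooled across target languages.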

PUBLICATION RECORD

Publication year
2018
Venue
Conference on Machine Translation
Publication date
2018-09-01
Fields of study
Linguistics, Computer Science
Identifiers
DOI: 10.18653/v1/W18-6327
arXiv: 1809.00252
Source metadata
Semantic Scholar

REFERENCES

When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?
2018, cited by this paper
Universal Neural Machine Translation for Extremely Low Resource Languages
2018, cited by this paper
Neural Machine Translation for Bilingually Scarce Scenarios: a Deep Multi-Task Learning Approach
2018, cited by this paper
Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation
2018, cited by this paper
What do Neural Machine Translation Models Learn about Morphology?
2017, cited by this paper
Regularizing Neural Networks by Penalizing Confident Output Distributions
2017, cited by this paper
Automatic differentiation in PyTorch
2017, cited by this paper
Exploiting Linguistic Resources for Neural Machine Translation Using Multi-task Learning
2017, cited by this paper
Attention is All you Need
2017, influential reference
Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder
2016, cited by this paper
Transfer Learning for Low-Resource Neural Machine Translation
2016, cited by this paper
Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism
2016, cited by this paper
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
2016, cited by this paper
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
2016, cited by this paper
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
2016, cited by this paper
Layer Normalization
2016, cited by this paper
Neural Machine Translation of Rare Words with Subword Units
2015, cited by this paper
Multi-Task Learning for Multiple Language Translation
2015, influential reference
Deep Residual Learning for Image Recognition
2015, cited by this paper
Effective Approaches to Attention-based Neural Machine Translation
2015, cited by this paper
Multi-task Sequence to Sequence Learning
2015, cited by this paper
Sequence to Sequence Learning with Neural Networks
2014, cited by this paper
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
2014, cited by this paper
Adam: A Method for Stochastic Optimization
2014, cited by this paper
Dropout: a simple way to prevent neural networks from overfitting
2014, cited by this paper
Neural Machine Translation by Jointly Learning to Align and Translate
2014, cited by this paper
Efficient BackProp
2012, cited by this paper
Deep Sparse Rectifier Neural Networks
2011, cited by this paper
Pointwise Prediction for Robust, Adaptable Japanese Morphological Analysis
2011, cited by this paper
Natural Language Processing (Almost) from Scratch
2011, cited by this paper
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
2005, cited by this paper
Bleu: a Method for Automatic Evaluation of Machine Translation
2002, cited by this paper
Long Short-Term Memory
1997, cited by this paper
Multitask Learning
1997, cited by this paper
A new algorithm for data compression
1994, cited by this paper

CITED BY

TM-Vec 2: Accelerated Protein Homology Detection for Structural Similarity
2026, cites this paper
Krey-All WMT 2025 CreoleMT System Description: Language Agnostic Strategies for Low-Resource Translation
2025, cites this paper
Exploring Intrinsic Language-specific Subspaces in Fine-tuning Multilingual Neural Machine Translation
2024, cites this paper
ShareBERT: Embeddings Are Capable of Learning Hidden Layers
2024, cites this paper
Exploring Domain-shared and Domain-specific Knowledge in Multi-Domain Neural Machine Translation
2023, cites this paper
Towards a Deep Understanding of Multilingual End-to-End Speech Translation
2023, cites this paper
Breaking through Deterministic Barriers: Randomized Pruning Mask Generation and Selection
2023, cites this paper
Towards English-centric Zero-shot Neural Machine Translation: The Analysis and Solution
2023, cites this paper
Toward More Human-Like AI Communication: A Review of Emergent Communication Research
2023, cites this paper
Neural Machine Translation for the Indigenous Languages of the Americas: An Introduction
2023, cites this paper
Transformer: A General Framework from Machine Translation to Others
2023, cites this paper
TPDM: Selectively Removing Positional Information for Zero-shot Translation via Token-Level Position Disentangle Module
2023, cites this paper
Exploring Geometric Representational Disparities between Multilingual and Bilingual Translation Models
2023, cites this paper
Dozens of Translation Directions or Millions of Shared Parameters? Comparing Two Types of Multilinguality in Modular Machine Translation
2023, cites this paper
Learning Language-Specific Layers for Multilingual Machine Translation
2023, cites this paper
Exploring Representational Disparities Between Multilingual and Bilingual Translation Models
2023, cites this paper
An Ensemble Strategy with Gradient Conflict for Multi-Domain Neural Machine Translation
2023, influential citation
Improving Chinese-Centric Low-Resource Translation Using English-Centric Pivoted Parallel Data
2023, cites this paper
Gradient-based Gradual Pruning for Language-Specific Multilingual Neural Machine Translation
2023, cites this paper
Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation
2022, cites this paper
Better Pre-Training by Reducing Representation Confusion
2022, cites this paper
Multilingual Machine Translation with Hyper-Adapters
2022, cites this paper
Amazon Alexa AI’s System for IWSLT 2022 Offline Speech Translation Shared Task
2022, cites this paper
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
2022, cites this paper
Adaptive Token-level Cross-lingual Feature Mixing for Multilingual Neural Machine Translation
2022, cites this paper
Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation
2022, cites this paper
An End-to-End Contrastive Self-Supervised Learning Framework for Language Understanding
2022, cites this paper
Addressing Asymmetry in Multilingual Neural Machine Translation with Fuzzy Task Clustering
2022, cites this paper
Language Branch Gated Multilingual Neural Machine Translation
2022, cites this paper
Improve Transformer Pre-Training with Decoupled Directional Relative Position Encoding and Representation Differentiations
2022, cites this paper
Searching for Effective Multilingual Fine-Tuning Methods: A Case Study in Summarization
2022, cites this paper
Language-Family Adapters for Multilingual Neural Machine Translation
2022, cites this paper
An Empirical Study of Automatic Post-Editing
2022, cites this paper
Investigating Parameter Sharing in Multilingual Speech Translation
2022, influential citation
Adapting to Non-Centered Languages for Zero-shot Multilingual Translation
2022, cites this paper
Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model
2022, influential citation
Informative Language Representation Learning for Massively Multilingual Neural Machine Translation
2022, cites this paper
Target-Oriented Knowledge Distillation with Language-Family-Based Grouping for Multilingual NMT
2022, cites this paper
Synchronous Inference for Multilingual Neural Machine Translation
2022, cites this paper
Lego-MT: Towards Detachable Models in Massively Multilingual Machine Translation
2022, cites this paper
Multiple Captions Embellished Multilingual Multi-Modal Neural Machine Translation
2021, cites this paper
A Survey on Low-Resource Neural Machine Translation
2021, cites this paper
Cross-lingual learning for text processing: A survey
2021, cites this paper
Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders
2021, cites this paper
Correcting Momentum with Second-order Information
2021, cites this paper
Hierarchical Transformer for Multilingual Machine Translation
2021, cites this paper
Adaptive Sparse Transformer for Multilingual Translation
2021, cites this paper
Demystify Optimization Challenges in Multilingual Transformers
2021, cites this paper
Learning Language Specific Sub-network for Multilingual Machine Translation
2021, cites this paper
Breaking Down Multilingual Machine Translation
2021, cites this paper
Robust Optimization for Multilingual Translation with Imbalanced Data
2021, cites this paper
Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling
2021, cites this paper
Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation
2021, cites this paper
Benchmarking Differential Privacy and Federated Learning for BERT Models
2021, cites this paper
Neural Machine Translation for Low-resource Languages: A Survey
2021, influential citation
Deep Neural Transformer Model for Mono and Multi Lingual Machine Translation
2021, cites this paper
Importance-based Neuron Allocation for Multilingual Neural Machine Translation
2021, cites this paper
Modeling Task-Aware MIMO Cardinality for Efficient Multilingual Neural Machine Translation
2021, cites this paper
mTVR: Multilingual Moment Retrieval in Videos
2021, cites this paper
Multilingual Translation from Denoising Pre-Training
2021, cites this paper
Multilingual Simultaneous Neural Machine Translation
2021, influential citation
基于模型不确定性约束的半监督汉缅神经机器翻译(Semi-Supervised Chinese-Myanmar Neural Machine Translation based Model-Uncertainty)
2021, cites this paper
Improving Multilingual Translation by Representation and Gradient Regularization
2021, cites this paper
Neural machine translation: past, present, and future
2021, cites this paper
Improving Multilingual Neural Machine Translation with Auxiliary Source Languages
2021, cites this paper
Parameter Differentiation based Multilingual Neural Machine Translation
2021, cites this paper
Multi-Lingual Machine Translation Ph.D. Thesis Proposal
2021, cites this paper
Modeling SQL Statement Correctness with Attention-Based Convolutional Neural Networks
2021, cites this paper
Preface
2020, cites this paper
Revisiting Words
2020, cites this paper
Index
2020, cites this paper
Revisiting Modularized Multilingual NMT to Meet Industrial Demands
2020, cites this paper
Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models
2020, cites this paper
A New Approach to Parameter-Sharing in Multilingual Neural Machine Translation
2020, cites this paper
On the Importance of Local Information in Transformer Based Models
2020, cites this paper
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech
2020, cites this paper
An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages
2020, cites this paper
Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation
2020, cites this paper
Bridging Linguistic Typology and Multilingual Machine Translation with Multi-view Language Representations
2020, cites this paper
Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
2020, influential citation
Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation
2020, cites this paper
Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation
2020, cites this paper
Linguistic Structure
2020, cites this paper
A Comprehensive Survey of Multilingual Neural Machine Translation
2020, influential citation
Analysis and Visualization
2020, cites this paper
Computation Graphs
2020, cites this paper
Beyond Parallel Corpora
2020, cites this paper
Bibliography
2020, cites this paper
Alternate Architectures
2020, cites this paper
Current Challenges
2020, cites this paper
Machine Learning Tricks
2020, cites this paper
Neural Translation Models
2020, cites this paper
Uses of Machine Translation
2020, cites this paper
Neural Language Models
2020, cites this paper
The Translation Problem
2020, cites this paper
Neural Machine Translation
2020, cites this paper
Multilingual Neural Machine Translation
2020, cites this paper
Stronger Transformers for Neural Multi-Hop Question Generation
2020, cites this paper
A Study on Multilingual Transfer Learning in Neural Machine Translation: Finding the Balance Between Languages
2019, cites this paper
Learning to Reuse Translations: Guiding Neural Machine Translation with Examples
2019, cites this paper