Mutual Information and Diverse Decoding Improve Neural Machine Translation

Jiwei Li, Dan Jurafsky

Published 2016 in arXiv.org

ABSTRACT

Sequence-to-sequence neural translation models learn semantic and syntactic relations between sentence pairs by optimizing the likelihood of the target given the source, i.e., $p(y|x)$, an objective that ignores other potentially useful sources of information. We introduce an alternative objective function for neural MT that maximizes the mutual information between the source and target sentences, modeling the bi-directional dependency between sources and targets. We implement the model with a simple re-ranking method and also introduce a decoding algorithm that increases diversity in the N-best list produced by the first pass. Applied to the WMT German/English and French/English tasks, the proposed models offer a consistent performance boost on both standard LSTM and attention-based neural MT architectures.
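
In concrete terms (a sketch, not the paper's exact notation; $\lambda$ is an interpolation weight), the mutual-information objective reduces to a re-ranking score. The pointwise mutual information between a source $x$ and a target $y$ is

    \log \frac{p(x, y)}{p(x)\,p(y)} = \log p(y|x) - \log p(y)

and the weighted variant rewrites, via Bayes' rule, as

    \log p(y|x) - \lambda \log p(y) = (1 - \lambda)\,\log p(y|x) + \lambda\,\log p(x|y) - \lambda\,\log p(x)

Because $\lambda \log p(x)$ is constant for a fixed source sentence, candidates can be ranked by $(1-\lambda)\log p(y|x) + \lambda\log p(x|y)$: a weighted combination of a forward (source-to-target) model and a backward (target-to-source) model, which is what makes a two-pass decode-then-re-rank implementation possible. Below is a minimal Python sketch of that procedure under stated assumptions: the function names, the (hypothesis, forward_lp, backward_lp) entry format, and the exact gamma-times-rank penalty are illustrative, not the authors' code.

    # Minimal sketch of diverse decoding plus MMI re-ranking.
    # Assumption (not from the paper's released code): forward_lp = log p(y|x)
    # from the standard model; backward_lp = log p(x|y) from a model trained
    # in the reverse translation direction.

    def diversity_penalize(siblings, gamma=0.1):
        """Demote candidates expanded from the SAME parent beam hypothesis
        by gamma * rank (1-based), so the beam spreads across different
        parents and the resulting N-best list is more diverse."""
        ranked = sorted(siblings, key=lambda c: c[1], reverse=True)
        return [(tok, lp - gamma * k) for k, (tok, lp) in enumerate(ranked, start=1)]

    def mmi_rerank(nbest, lam=0.5):
        """Second pass: pick the hypothesis maximizing the MMI-derived score
        (1 - lam) * log p(y|x) + lam * log p(x|y)."""
        return max(nbest, key=lambda t: (1.0 - lam) * t[1] + lam * t[2])[0]

In this sketch, `gamma` and `lam` play the roles of the diversity rate and the interpolation weight; in practice both would be tuned on a development set.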

PUBLICATION RECORD

  • Publication year: 2016
  • Venue: arXiv.org
  • Publication date: 2016-01-04
  • Fields of study: Linguistics, Computer Science

  • Source metadata: Semantic Scholar
