Noisy Parallel Approximate Decoding for Conditional Recurrent Language Model

Published 2016 in arXiv.org

ABSTRACT

Recent advances in conditional recurrent language modelling have mainly focused on network architectures (e.g., attention mechanism), learning algorithms (e.g., scheduled sampling and sequence-level training) and novel applications (e.g., image/video description generation, speech recognition, etc.). On the other hand, we notice that decoding algorithms/strategies have not been investigated as much, and it has become standard to use greedy or beam search. In this paper, we propose a novel decoding strategy motivated by an earlier observation that nonlinear hidden layers of a deep neural network stretch the data manifold. The proposed strategy is embarrassingly parallelizable without any communication overhead, while improving an existing decoding algorithm. We extensively evaluate it with attention-based neural machine translation on the task of En→Cz translation.
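
The abstract does not spell out the mechanics, so the following is only a minimal sketch of the general idea suggested by the title: run several independent decoding chains, perturb each chain's hidden state with noise, and keep the candidate that scores best. Everything concrete here is an assumption made for illustration: the `ToyRNN` model with random weights, the fixed output length with no end-of-sequence token, the Gaussian noise scale `sigma0 / t`, and the choice to re-score candidates with the noise-free model. It is not the paper's NMT system or its exact procedure.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

class ToyRNN:
    """Hypothetical toy recurrent language model with random weights."""
    def __init__(self, vocab=6, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.E = rng.normal(size=(vocab, hidden))          # token embeddings
        self.W = rng.normal(size=(hidden, hidden)) * 0.5   # recurrent weights
        self.O = rng.normal(size=(hidden, vocab))          # output projection
        self.hidden = hidden

    def step(self, h, token, noise=None):
        h_new = np.tanh(h @ self.W + self.E[token])
        if noise is not None:
            h_new = h_new + noise  # assumed form of noise injection
        return h_new, log_softmax(h_new @ self.O)

def score(model, tokens, bos=0):
    """Log-probability of a token sequence under the noise-free model."""
    h, prev, total = np.zeros(model.hidden), bos, 0.0
    for tok in tokens:
        h, logp = model.step(h, prev)
        total += logp[tok]
        prev = tok
    return total

def noisy_greedy(model, length, sigma0, rng, bos=0):
    """Greedy decoding with Gaussian noise of scale sigma0 / t on the hidden state."""
    h, prev, out = np.zeros(model.hidden), bos, []
    for t in range(1, length + 1):
        noise = rng.normal(scale=sigma0 / t, size=model.hidden) if sigma0 > 0 else None
        h, logp = model.step(h, prev, noise)
        prev = int(np.argmax(logp))
        out.append(prev)
    return out

def npad_greedy(model, length=5, chains=8, sigma0=0.5, seed=1):
    # Chain 0 is noise-free, so the selected candidate never scores
    # below plain greedy decoding.
    candidates = [noisy_greedy(model, length, 0.0, None)]
    for m in range(1, chains):
        # Each chain has its own generator; chains share no state and
        # could run on separate workers.
        rng = np.random.default_rng([seed, m])
        candidates.append(noisy_greedy(model, length, sigma0, rng))
    scores = [score(model, c) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best], scores[0]

if __name__ == "__main__":
    tokens, best_score, greedy_score = npad_greedy(ToyRNN())
    print(tokens, best_score, greedy_score)
```

Because the chains only meet at the final `argmax` over scores, the loop over `m` is the part that parallelizes with no communication between workers. Keeping one noise-free chain among the candidates is a design choice of this sketch that guarantees the result is at least as good as the baseline decoder under the model's own score.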

PUBLICATION RECORD

Publication year: 2016
Venue: arXiv.org
Publication date: 2016-05-12
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1605.03835
Source metadata: Semantic Scholar

REFERENCES

Mutual Information and Diverse Decoding Improve Neural Machine Translation
2016, influential reference
Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism
2016, cited by this paper
Simple, Fast Noise-Contrastive Estimation for Large RNN Vocabularies
2016, cited by this paper
Task Loss Estimation for Sequence Prediction
2015, cited by this paper
A Recurrent Latent Variable Model for Sequential Data
2015, cited by this paper
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
2015, cited by this paper
Recurrent Neural Networks
2015, cited by this paper
From Feedforward to Recurrent LSTM Neural Networks for Language Modeling
2015, cited by this paper
Neural Machine Translation of Rare Words with Subword Units
2015, cited by this paper
Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks
2015, cited by this paper
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
2014, cited by this paper
Adam: A Method for Stochastic Optimization
2014, cited by this paper
Perturb-and-MAP Random Fields: Reducing Random Sampling to Optimization, with Applications in Computer Vision
2014, cited by this paper
Neural Machine Translation by Jointly Learning to Align and Translate
2014, cited by this paper
Auto-Encoding Variational Bayes
2013, cited by this paper
Statistical Language Models Based on Neural Networks
2012, cited by this paper
ADADELTA: An Adaptive Learning Rate Method
2012, cited by this paper
Diverse M-Best Solutions in Markov Random Fields
2012, cited by this paper
Better Mixing via Deep Representations
2012, cited by this paper
Perturb-and-MAP random fields: Using discrete optimization to learn and sample from energy models
2011, cited by this paper
Generating Text with Recurrent Neural Networks
2011, cited by this paper
Recurrent neural network based language model
2010, cited by this paper
Influence of cultivation temperature on the ligninolytic activity of selected fungal strains
2006, cited by this paper
A simple recursive numerical method for Bermudan option pricing under Lévy processes
2006, cited by this paper
A Neural Probabilistic Language Model
2003, cited by this paper
Foundations of Statistical Natural Language Processing
2001, influential reference
Book Reviews: Foundations of Statistical Natural Language Processing
1999, cited by this paper
Long Short-Term Memory
1997, cited by this paper
Learning long-term dependencies with gradient descent is difficult
1994, cited by this paper
Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition
1989, cited by this paper
Learning representations by back-propagating errors
1986, cited by this paper
Learning representations by back-propagation errors, nature
1986, cited by this paper

CITED BY

Variational Prefix Tuning for Diverse and Accurate Code Summarization Using Pre-trained Language Models
2025, cites this paper
Jointly Reinforcing Diversity and Quality in Language Model Generations
2025, cites this paper
Geometry of Knowledge Allows Extending Diversity Boundaries of Large Language Models
2025, cites this paper
Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding
2024, cites this paper
Neural Methods for Data-to-text Generation
2024, cites this paper
Priority Sampling of Large Language Models for Compilers
2024, cites this paper
Generating Diverse Translation with Perturbed kNN-MT
2024, cites this paper
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
2024, cites this paper
Enhancing In-Context Learning via Implicit Demonstration Augmentation
2024, cites this paper
QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation
2024, cites this paper
Exploring Geometric Representational Disparities between Multilingual and Bilingual Translation Models
2023, cites this paper
Natural language watermarking via paraphraser-based lexical substitution
2023, cites this paper
Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects - A Survey
2023, cites this paper
AMLNet: Adversarial Mutual Learning Neural Network for Non-AutoRegressive Multi-Horizon Time Series Forecasting
2023, cites this paper
Local Temperature Beam Search: Avoid Neural Text DeGeneration via Enhanced Calibration
2023, cites this paper
Optimizing Non-Autoregressive Transformers with Contrastive Learning
2023, cites this paper
ParaLS: Lexical Substitution via Pretrained Paraphraser
2023, cites this paper
Exploiting Semantic and Syntactic Diversity for Diverse Task-oriented Dialogue
2022, cites this paper
Reinforcement Learning for Practical Express Systems with Mixed Deliveries and Pickups
2022, cites this paper
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
2022, cites this paper
Innovations in Neural Data-to-text Generation
2022, cites this paper
Adaptive Beam Search Decoding for Discrete Keyphrase Generation
2021, cites this paper
Generating unambiguous and diverse referring expressions
2021, cites this paper
Function-guided protein design by deep manifold sampling
2021, cites this paper
Heavy-Tails and Randomized Restarting Beam Search in Goal-Oriented Neural Sequence Decoding
2021, cites this paper
PROTAUGMENT: Unsupervised diverse short-texts paraphrasing for intent detection meta-learning
2021, cites this paper
Uses of Machine Translation
2020, cites this paper
Neural Machine Translation
2020, cites this paper
Non-Autoregressive Neural Dialogue Generation
2020, cites this paper
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
2020, cites this paper
Decoding As Dynamic Programming For Recurrent Autoregressive Models
2020, cites this paper
Generating Diverse Translations via Weighted Fine-tuning and Hypotheses Filtering for the Duolingo STAPLE Task
2020, cites this paper
Neural Language Generation: Formulation, Methods, and Evaluation
2020, cites this paper
Dissecting the components and factors of Neural Text Generation
2020, cites this paper
SMRTer Chatbots: Improving Non-Task-Oriented Dialog with Simulated Multi-Reference Training
2020, cites this paper
Decoding and Diversity in Machine Translation
2020, cites this paper
The Translation Problem
2020, cites this paper
Neural Translation Models
2020, cites this paper
Beyond Parallel Corpora
2020, cites this paper
A Good Sample is Hard to Find: Noise Injection Sampling and Self-Training for Neural Language Generation Models
2019, cites this paper
Can Unconditional Language Models Recover Arbitrary Sentences?
2019, cites this paper
Comparison of Diverse Decoding Methods from Conditional Language Models
2019, influential citation
Multi-Turn Beam Search for Neural Dialogue Modeling
2019, cites this paper
Levenshtein Transformer
2019, cites this paper
Insertion-based Decoding with Automatically Inferred Generation Order
2019, cites this paper
Mixture Content Selection for Diverse Sequence Generation
2019, cites this paper
On the use of prior and external knowledge in neural sequence models
2019, influential citation
Neural Machine Translation: A Review
2019, cites this paper
SIC-GAN: A Self-Improving Collaborative GAN for Decoding Sketch RNNs
2018, influential citation
Biological applications, visualizations, and extensions of the long short-term memory network
2018, cites this paper
A Stable and Effective Learning Strategy for Trainable Greedy Decoding
2018, influential citation
Analyzing Uncertainty in Neural Machine Translation
2018, cites this paper
Neural Language Models
2018, cites this paper
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement
2018, cites this paper
SIC-GAN: A Self-Improving Collaborative GAN for Decoding Variational RNNs
2017, influential citation
Learning a Generative Model for Validity in Complex Discrete Structures
2017, cites this paper
Non-Autoregressive Neural Machine Translation
2017, cites this paper
Towards Decoding as Continuous Optimisation in Neural Machine Translation
2017, cites this paper
Decoding as Continuous Optimization in Neural Machine Translation
2017, cites this paper
Learning to Decode for Future Success
2017, cites this paper
Later-stage Minimum Bayes-Risk Decoding for Neural Machine Translation
2017, cites this paper
Trainable Greedy Decoding for Neural Machine Translation
2017, influential citation
A Simple, Fast Diverse Decoding Algorithm for Neural Generation
2016, cites this paper
Neural Combinatorial Optimization with Reinforcement Learning
2016, cites this paper