Neural Text Generation with Unlikelihood Training

Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, Jason Weston

Published 2019 in International Conference on Learning Representations

ABSTRACT

Neural text generation is a key tool in natural language applications, but it is well known there are major problems at its core. In particular, standard likelihood training and decoding leads to dull and repetitive outputs. While some post-hoc fixes have been proposed, in particular top-$k$ and nucleus sampling, they do not address the fact that the token-level probabilities predicted by the model are poor. In this paper we show that the likelihood objective itself is at fault, resulting in a model that assigns too much probability to sequences containing repeats and frequent words, unlike those from the human training distribution. We propose a new objective, unlikelihood training, which forces unlikely generations to be assigned lower probability by the model. We show that both token and sequence level unlikelihood training give less repetitive, less dull text while maintaining perplexity, giving superior generations using standard greedy or beam search. According to human evaluations, our approach with standard beam search also outperforms the currently popular decoding methods of nucleus sampling or beam blocking, thus providing a strong alternative to existing techniques.
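The abstract describes the unlikelihood objective only at a high level. Below is a minimal sketch of the token-level variant, assuming (as one candidate choice in the paper) that tokens already seen in the context serve as the negative candidates; the function name `unlikelihood_loss`, the `alpha` weight, and the candidate construction are illustrative, not the authors' reference implementation.

```python
# Minimal sketch of token-level unlikelihood training (illustrative, not the
# authors' reference code). The loss combines the usual likelihood term with
# a penalty, -log(1 - p(c | context)), on each negative candidate token c;
# here the candidates are assumed to be tokens already seen in the context.
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits: torch.Tensor, targets: torch.Tensor,
                      alpha: float = 1.0) -> torch.Tensor:
    """logits: (seq_len, vocab_size) next-token scores; targets: (seq_len,) token ids."""
    log_probs = F.log_softmax(logits, dim=-1)

    # Standard maximum-likelihood term: average negative log-probability of targets.
    mle_loss = F.nll_loss(log_probs, targets, reduction="mean")

    # Unlikelihood term over assumed candidates: previous context tokens,
    # excluding the current target so it is not penalized for being correct.
    probs = log_probs.exp()
    penalties = []
    for t in range(1, targets.size(0)):
        candidates = targets[:t].unique()
        candidates = candidates[candidates != targets[t]]
        if candidates.numel() == 0:
            continue
        one_minus_p = (1.0 - probs[t, candidates]).clamp(min=1e-6)
        penalties.append(-one_minus_p.log().sum())
    ul_loss = torch.stack(penalties).mean() if penalties else logits.new_zeros(())

    return mle_loss + alpha * ul_loss
```

Here `alpha` trades off the likelihood and unlikelihood terms; the paper also applies the same idea at the sequence level, penalizing repeated n-grams in the model's own continuations.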
PUBLICATION RECORD
- Publication year
2019
- Venue
International Conference on Learning Representations
- Publication date
2019-08-12
- Fields of study
Mathematics, Computer Science
- Source metadata
Semantic Scholar
CONCEPTS
- beam blocking
A decoding baseline that is outperformed by unlikelihood training with standard beam search.
- beam search
A standard decoding method that benefits from unlikelihood training in this paper.
Aliases: greedy search
- likelihood training
The baseline objective blamed for overproducing repetitive and overly likely text sequences.
Aliases: standard likelihood training
- neural text generation
The task of producing open-ended text that this paper aims to make less dull and repetitive.
Aliases: text generation
- nucleus sampling
A sampling-based decoding strategy used as a baseline for open-ended neural text generation (see the sketch after this list).
Aliases: top-p sampling, dynamic nucleus sampling
- perplexity
A language modeling metric that measures how well a model predicts the next token in a text sequence.
- unlikelihood training
A training objective that lowers probability on undesirable generations such as repetitions.
Aliases: unlikelihood objective
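Since nucleus sampling appears above only as a named baseline, a brief sketch may help. This is a minimal top-p sampler under the usual definition: sample from the smallest set of highest-probability tokens whose cumulative mass reaches p, after renormalizing. The function name and default threshold are illustrative assumptions.

```python
# Minimal sketch of nucleus (top-p) sampling, the decoding baseline named
# above: restrict sampling to the smallest set of most-probable tokens whose
# cumulative probability reaches p, then renormalize within that set.
import torch

def nucleus_sample(logits: torch.Tensor, p: float = 0.9) -> int:
    """logits: (vocab_size,) next-token scores. Returns a sampled token id."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Count tokens strictly inside mass p, then keep one more so the kept
    # set is the smallest whose cumulative probability reaches p.
    cutoff = int((cumulative < p).sum().item())
    nucleus = sorted_probs[: cutoff + 1]
    nucleus = nucleus / nucleus.sum()  # renormalize over the nucleus
    choice = torch.multinomial(nucleus, num_samples=1)
    return int(sorted_idx[choice].item())
```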