Deep Reinforcement Learning for Dialogue Generation

Jiwei Li, Will Monroe, Alan Ritter, Dan Jurafsky, Michel Galley, Jianfeng Gao

Published 2016 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be shortsighted, predicting utterances one at a time while ignoring their influence on future outcomes. Modeling the future direction of a dialogue is crucial to generating coherent, interesting dialogues, a need which led traditional NLP models of dialogue to draw on reinforcement learning. In this paper, we show how to integrate these goals, applying deep reinforcement learning to model future reward in chatbot dialogue. The model simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering (related to forward-looking function). We evaluate our model on diversity and length, as well as with human judges, showing that the proposed algorithm generates more interactive responses and manages to foster a more sustained conversation in dialogue simulation. This work marks a first step towards learning a neural conversational model based on the long-term success of dialogues.
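The abstract names three reward signals and a policy-gradient update but gives no formulas here. The sketch below shows one plausible way to wire them together, assuming that ease of answering is scored as the negated, length-normalized log-likelihood of a fixed list of dull replies, that informativity is the negative log cosine similarity between consecutive turns by the same agent, that coherence is the sum of normalized forward and backward log-likelihoods, and that the three are combined as a weighted sum. All function names, the weights in `WEIGHTS`, and the softmax-over-candidate-replies policy are illustrative assumptions, not the authors' implementation; the update itself is the standard REINFORCE (likelihood-ratio) estimator.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical weights for (ease of answering, informativity, coherence).
WEIGHTS = (0.25, 0.25, 0.5)


def ease_of_answering(dull_scores: Sequence[Tuple[float, int]]) -> float:
    """Reward a turn that makes dull replies unlikely (assumed form).

    dull_scores holds (log p(dull reply | turn), dull reply length in tokens)
    for a hand-picked list of dull replies; the reward is the negated mean
    of the length-normalized log-likelihoods.
    """
    return -sum(logp / n for logp, n in dull_scores) / len(dull_scores)


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def informativity(prev_vec: Sequence[float], cur_vec: Sequence[float]) -> float:
    """Penalize an agent for repeating itself: -log cos(h_prev, h_cur)."""
    # Clip so the log stays defined when the similarity is zero or negative.
    return -math.log(max(_cosine(prev_vec, cur_vec), 1e-8))


def coherence(logp_fwd: float, len_reply: int, logp_bwd: float, len_prev: int) -> float:
    """Normalized log p(reply | context) plus normalized log p(previous turn | reply)."""
    return logp_fwd / len_reply + logp_bwd / len_prev


def combined_reward(r_ease: float, r_info: float, r_coh: float, weights=WEIGHTS) -> float:
    """Weighted sum of the three per-turn rewards."""
    w1, w2, w3 = weights
    return w1 * r_ease + w2 * r_info + w3 * r_coh


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_step(logits: Sequence[float], action: int, reward: float,
                   baseline: float = 0.0, lr: float = 0.1) -> List[float]:
    """One REINFORCE update for a softmax policy over candidate replies.

    For a softmax policy, d log pi(action) / d logit_k = 1[k == action] - pi_k;
    each logit moves along that gradient, scaled by the advantage.
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        x + lr * advantage * ((1.0 if k == action else 0.0) - p)
        for k, (x, p) in enumerate(zip(logits, probs))
    ]
```

For example, `combined_reward(ease_of_answering([(-12.0, 4), (-9.0, 3)]), informativity([1.0, 0.0], [0.6, 0.8]), coherence(-6.0, 3, -8.0, 4))` scores one simulated turn, and `reinforce_step(logits, action, reward)` nudges the sampled reply's probability up when the reward beats the baseline and down otherwise. Subtracting a baseline does not bias the gradient but usually lowers its variance.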

PUBLICATION RECORD

Publication year: 2016
Venue: Conference on Empirical Methods in Natural Language Processing
Publication date: 2016-06-05
Fields of study: Computer Science
Identifiers: DOI 10.18653/v1/D16-1127; arXiv 1606.01541
Source metadata: Semantic Scholar

REFERENCES

Incorporating Loose-Structured Knowledge into LSTM with Recall Gate for Conversation Modeling
2016, cited by this paper
A Network-based End-to-End Trainable Task-oriented Dialogue System
2016, cited by this paper
A Persona-Based Neural Conversation Model
2016, influential reference
Continuously Learning Neural Dialogue Management
2016, cited by this paper
How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
2016, cited by this paper
LSTM based Conversation Models
2016, cited by this paper
Incorporating loose-structured knowledge into conversation modeling via recall-gate LSTM
2016, cited by this paper
Mastering the game of Go with deep neural networks and tree search
2016, cited by this paper
Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems
2015, cited by this paper
A Neural Conversational Model
2015, influential reference
A Survey of Available Corpora for Building Data-Driven Dialogue Systems
2015, cited by this paper
deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets
2015, cited by this paper
Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
2015, cited by this paper
Attention with Intention for a Neural Network Conversation Model
2015, cited by this paper
Language Understanding for Text-based Games using Deep Reinforcement Learning
2015, cited by this paper
Deep Reinforcement Learning with an Action Space Defined by Natural Language
2015, cited by this paper
Hierarchical Neural Network Generative Models for Movie Dialogues
2015, cited by this paper
Reinforcement Learning Neural Turing Machines - Revised
2015, cited by this paper
Reinforcement Learning Neural Turing Machines
2015, cited by this paper
Deep Reinforcement Learning with a Natural Language Action Space
2015, cited by this paper
A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
2015, influential reference
RECURRENT NEURAL NETWORKS
2015, cited by this paper
A Diversity-Promoting Objective Function for Neural Conversation Models
2015, cited by this paper
Neural Responding Machine for Short-Text Conversation
2015, cited by this paper
Neural Machine Translation by Jointly Learning to Align and Translate
2014, cited by this paper
Incremental on-line adaptation of POMDP-based dialogue managers to extended domains
2014, influential reference
Sequence to Sequence Learning with Neural Networks
2014, cited by this paper
Playing Atari with Deep Reinforcement Learning
2013, influential reference
POMDP-Based Statistical Spoken Dialog Systems: A Review
2013, influential reference
On-line policy optimisation of Bayesian spoken dialogue systems via human interaction
2013, cited by this paper
POMDP-based dialogue manager adaptation to extended domains
2013, cited by this paper
IRIS: a Chat-oriented Dialogue System based on the Vector Space Model
2012, cited by this paper
Developing Non-goal Dialog System Based on Examples of Drama Television
2012, cited by this paper
Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System
2011, cited by this paper
Learning to Win by Reading Manuals in a Monte-Carlo Framework
2011, cited by this paper
Data-Driven Response Generation in Social Media
2011, cited by this paper
The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management
2010, cited by this paper
Learning to Follow Navigational Directions
2010, cited by this paper
Are We There Yet? Research in Commercial Spoken Dialog Systems
2009, cited by this paper
Curriculum learning
2009, cited by this paper
Reinforcement Learning for Spoken Dialogue Systems
2007, cited by this paper
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
2004, cited by this paper
A trainable generator for recommendations in multimodal dialog
2003, cited by this paper
Stochastic Optimization
2003, cited by this paper
Trainable approaches to surface natural language generation and their application to conversational dialog systems
2002, cited by this paper
Bleu: a Method for Automatic Evaluation of Machine Translation
2002, cited by this paper
Stochastic Language Generation for Spoken Dialogue Systems
2000, cited by this paper
An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email
2000, cited by this paper
A stochastic model of human-machine interaction for learning dialog strategies
2000, cited by this paper
Empirical Evaluation of a Reinforcement Learning Spoken Dialogue System
2000, cited by this paper
Policy Gradient Methods for Reinforcement Learning with Function Approximation
1999, cited by this paper
Learning dialogue strategies within the Markov decision process framework
1997, cited by this paper
On the Semantics and Pragmatics of Linguistic Feedback
1992, cited by this paper
Likelihood ratio gradient estimation for stochastic systems
1990, cited by this paper
Opening up Closings
1973, cited by this paper
A Survey of Statistical User Simulation Techniques for Reinforcement-Learning of Dialogue Management Strategies
year unknown, cited by this paper

CITED BY

Reinforcing Real-world Service Agents: Balancing Utility and Cost in Task-oriented Dialogue
2026, cites this paper
GAN-AIIPot: GAN-Based Cyber Deception for Probing Attacks on IoT Devices
2026, cites this paper
A Correlation Aware Multimodal Fusion Network With Emotional Transference Capture for Emotion Recognition in Conversations
2026, cites this paper
Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards
2026, cites this paper
Reinforcement Learning-Driven Adaptive Emotion-Cause Analysis in Conversation
2026, cites this paper
SEAD: Self-Evolving Agent for Multi-Turn Service Dialogue
2026, cites this paper
StagePilot: A Deep Reinforcement Learning Agent for Stage-Controlled Cybergrooming Simulation
2026, cites this paper
ECO Decoding: Entropy-Based Control for Controllability and Fluency in Controllable Dialogue Generation
2025, cites this paper
Multi-Label Classification and Large Language Model Fine-Tuning for Dialogue Analysis and Guidance Generation
2025, cites this paper
A Hybrid Pipeline and Large Language Model System for Task-Oriented Dialogue
2025, cites this paper
Agentic RAG-Based Legal Advisory Chatbot: A Knowledge-Driven Approach for Vietnamese Legal System
2025, cites this paper
Exploring and Validating Key Components Enabling Context-Based Dynamism in Chatbot Architectures
2025, cites this paper
Layer-Informed Memorability Prediction and Reinforcement-Guided Multimedia Content Adjustment
2025, cites this paper
Explainable artificial intelligence in the talent recruitment process-a literature review
2025, cites this paper
Towards better dense rewards in Reinforcement Learning Applications
2025, cites this paper
Warmup Generations: A Task-Agnostic Approach for Guiding Sequence-to-Sequence Learning with Unsupervised Initial State Generation
2025, cites this paper
Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative PPO
2025, cites this paper
Bridging Classic and Modern English: An NLP Approach to Translation and Educational Chatbots in English Literature
2025, cites this paper
Hybrid Learning Module-Based Transformer for Multitrack Music Generation With Music Theory
2025, cites this paper
RLAP: A Reinforcement Learning Enhanced Adaptive Planning Framework for Multi-step NLP Task Solving
2025, cites this paper
Conversations From Make-Believe: An Attentive Encoder–Decoder Chatbot Trained on Scripted Dialogue
2025, cites this paper
Online Robust Reinforcement Learning with General Function Approximation
2025, cites this paper
SCAR: Shapley Credit Assignment for More Efficient RLHF
2025, cites this paper
Def-DTS: Deductive Reasoning for Open-domain Dialogue Topic Segmentation
2025, cites this paper
Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms
2025, cites this paper
Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
2025, cites this paper
A Cross-Modal Fusion Network with Reinforcement Learning for Emotion Recognition in Conversations
2025, cites this paper
Aligning Generative Speech Enhancement with Human Preferences via Direct Preference Optimization
2025, cites this paper
Reviewing chatbot algorithms: methods for intelligent dialogue systems
2025, cites this paper
TensorBLEU: Vectorized GPU-based BLEU Score Implementation for Per-Sentence In-Training Evaluation
2025, cites this paper
Reinforcement-learning-based proactive medical dialogue system for health status and medical image collection
2025, cites this paper
SPACE: Noise Contrastive Estimation Stabilizes Self-Play Fine-Tuning for Large Language Models
2025, cites this paper
train_fmu_gym: A Functional Mock-up Unit-Based Framework to Train Reinforcement Learning Agents for Multi-Physical Systems
2025, cites this paper
SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning
2025, cites this paper
MUSIC: MUlti-Step Instruction Contrast for Multi-Turn Reward Models
2025, cites this paper
Green Reinforcement Learning for Adaptive Mental Health Support Systems using multimodal recognition
2025, cites this paper
Generation, Analysis and Experimental Validation of an Emotion-Enlightened Synthetic Dialogue-Dataset via Advanced LLM-Based Methodologies
2025, cites this paper
Personality Dialogue Agent Based on Personality Description and Conversation History
2025, cites this paper
Knowing Ourselves Through Others: Reflecting with AI in Digital Human Debates
2025, cites this paper
On The Statistical Complexity of Offline Decision-Making
2025, cites this paper
TrBot: A Turkish Deep Learning Chatbot Utilizing Seq2Seq Model
2025, cites this paper
NGENT: Next-Generation AI Agents Must Integrate Multi-Domain Abilities to Achieve Artificial General Intelligence
2025, cites this paper
Enhancing Chatbot Performance in a SaaS Platform Through Retrieval-Augmented Generation and Prompt Engineering: A Case Study in Behavioral Safety Analysis
2025, cites this paper
Reinforcement learning-based LLM dialogue active learning strategy
2025, cites this paper
Research on Hybrid Data Classification and Table Structure Optimization Methods for Training Large Models
2025, cites this paper
Explainability in Practice: A Survey of Explainable NLP Across Various Domains
2025, cites this paper
PLATO-JDS: Enhancing Japanese Dialogue Systems Through Topic-Switch Adaptation
2025, cites this paper
Towards Teams being Led by a Conversational Agent
2025, cites this paper
Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification
2025, cites this paper
A Unified Supervised and Unsupervised Dialogue Topic Segmentation Framework Based on Utterance Pair Modeling
2025, cites this paper
Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy
2025, cites this paper
Decoding the cry for help: AI's emerging role in suicide risk assessment
2025, cites this paper
SoK: Machine Unlearning for Large Language Models
2025, cites this paper
History-Aware Cross-Attention Reinforcement: Self-Supervised Multi Turn and Chain-of-Thought Fine-Tuning with vLLM
2025, cites this paper
Tailored Conversations beyond LLMs: A RL-Based Dialogue Manager
2025, cites this paper
NewsInterview: a Dataset and a Playground to Evaluate LLMs' Grounding Gap via Informational Interviews
2025, cites this paper
Optimizing Conversational Product Recommendation via Reinforcement Learning
2025, cites this paper
Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog
2025, cites this paper
Stackelberg Coupling of Online Representation Learning and Reinforcement Learning
2025, cites this paper
HumAIne-Chatbot: Real-Time Personalized Conversational AI via Reinforcement Learning
2025, cites this paper
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
2025, cites this paper
MADS: Multi-Agent Dialogue Simulation for Diverse Persuasion Data Generation
2025, cites this paper
CET2: Modelling Topic Transitions for Coherent and Engaging Knowledge-Grounded Conversations
2024, cites this paper
Evidence Reasoning and Curriculum Learning for Document-Level Relation Extraction
2024, cites this paper
LDQN: A Lightweight Deep Reinforcement Learning Model
2024, cites this paper
A Survey on Neural Data-to-Text Generation
2024, cites this paper
DRLC: Reinforcement Learning with Dense Rewards from LLM Critic
2024, cites this paper
A Study on Information Search Behavior Using AI-Powered Engines: Evidence From Chatbots on Online Shopping Platforms
2024, cites this paper
LEXIQL: Quantum Natural Language Processing on NISQ-era Machines
2024, cites this paper
NewsInterview: a Dataset and a Playground to Evaluate LLMs' Ground Gap via Informational Interviews
2024, cites this paper
Enhancing Reinforcement Learning with Dense Rewards from Language Model Critic
2024, cites this paper
Interpretable and efficient data-driven discovery and control of distributed systems
2024, cites this paper
TRIP NEGOTIATOR: A Travel Persona-aware Reinforced Dialogue Generation Model for Personalized Integrative Negotiation in Tourism
2024, cites this paper
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
2024, cites this paper
A multi-agent collaborative algorithm for task-oriented dialogue systems
2024, cites this paper
Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling
2024, cites this paper
Response Generation with Personal Attributes and Act Information
2024, cites this paper
Self-Emotion Blended Dialogue Generation in Social Simulation Agents
2024, cites this paper
Stochastic Optimization Methods for Policy Evaluation in Reinforcement Learning
2024, cites this paper
Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation
2024, cites this paper
Learning Autonomous Navigation in Unmapped and Unknown Environments
2024, cites this paper
Applying Reinforcement Learning and Multi-Generators for Stage Transition in an Emotional Support Dialogue System
2024, cites this paper
Second Order Bounds for Contextual Bandits with Function Approximation
2024, cites this paper
Mitigating the negative impact of over-association for conversational query production
2024, cites this paper
Deep Learning for Intelligent Customer Service Automation: Development of GRU, LSTM, and Recurrent Neural Network Architectures for Chatbot Applications
2024, cites this paper
An Emotional Dialogue System Using Conditional Generative Adversarial Networks with a Sequence-to-Sequence Transformer Encoder
2024, cites this paper
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
2024, cites this paper
Error Correction and Adaptation in Conversational AI: A Review of Techniques and Applications in Chatbots
2024, cites this paper
Sequence labeling via reinforcement learning with aggregate labels
2024, cites this paper
Application of Natural Language Processing in Virtual Experience AI Interaction Design
2024, cites this paper
LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback
2024, cites this paper
Reinforcement Learning for Language Grounding: Mapping Words to Actions in Human-Robot Interaction
2024, cites this paper
Creating, Using and Assessing a Generative-AI-Based Human-Chatbot-Dialogue Dataset with User-Interaction Learning Capabilities
2024, cites this paper
ICL: An Incentivized Collaborative Learning Framework
2024, cites this paper
Prompt-Based Length Controlled Generation with Multiple Control Types
2024, cites this paper
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
2024, cites this paper
SimPO: Simple Preference Optimization with a Reference-Free Reward
2024, cites this paper
EnronSR: A Benchmark for Evaluating AI-Generated Email Replies
2024, cites this paper
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
2024, cites this paper
Communicating Unnamable Risks: Aligning Open World Situation Models Using Strategies from Creative Writing
2024, cites this paper