Recipes for Building an Open-Domain Chatbot

Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric Michael Smith, Y-Lan Boureau, J. Weston

Published 2020 in Conference of the European Chapter of the Association for Computational Linguistics

ABSTRACT

Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we highlight other ingredients. Good conversation requires blended skills: providing engaging talking points, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent persona. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models and code publicly available. Human evaluations show our best models outperform existing approaches in multi-turn dialogue on engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.

PUBLICATION RECORD

Publication year: 2020
Venue: Conference of the European Chapter of the Association for Computational Linguistics
Publication date: 2020-04-28
Fields of study: Computer Science
Identifiers: DOI 10.18653/v1/2021.eacl-main.24; arXiv 2004.13637
Source metadata: Semantic Scholar

CONCEPTS

empathy (skill)

A conversational skill the chatbot models are trained to display appropriately in dialogue.

engagingness (metric)

A human-evaluated measure of how compelling the chatbot feels in conversation.

generation strategy (method)

The decoding choice that helps determine conversation quality in the chatbot recipes studied here.

humanness (metric)

A human evaluation metric used to judge how human-like chatbot responses appear.

multi-turn dialogue (task)

An open-domain conversation task used to evaluate chatbot engagingness and humanness.

neural language model (method)

A large neural model used to learn open-domain conversational behavior from training data.

Aliases: neural models

open-domain chatbot (system)

A dialogue system designed to hold broad multi-turn conversations across unrestricted topics.

Aliases: open-domain chatbots

persona consistency (property)

The chatbot's ability to maintain a stable persona across turns in conversation.

Aliases: consistent persona
