During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets and how they can be used to learn diverse dialogue strategies. We also describe other potential uses of these datasets, such as methods for transfer learning between datasets and the use of external knowledge, and discuss the appropriate choice of evaluation metrics for the learning objective.
A Survey of Available Corpora for Building Data-Driven Dialogue Systems
Iulian Serban,Ryan Lowe,Peter Henderson,Laurent Charlin,Joelle Pineau
Published 2015 in Dialogue and Discourse
PUBLICATION RECORD
- Publication year: 2015
- Venue: Dialogue and Discourse
- Publication date: 2015-12-17
- Fields of study: Mathematics, Linguistics, Computer Science
- Source metadata: Semantic Scholar