Prediction of Dialogue Success with Spectral and Rhythm Acoustic Features Using DNNs and SVMs
Athanasios Lykartsis, M. Kotti, A. Papangelis, Y. Stylianou
Published 2018 in Spoken Language Technology Workshop

ABSTRACT
In this paper we investigate the novel use of audio alone to predict whether a spoken dialogue will be successful, both in a subjective and in an objective sense. To this end, multiple spectral and rhythmic features are fed into support vector machines and deep neural networks. We report results on data from 3267 spoken dialogues, using both the full user response and parts of it. Experiments show that an average accuracy of 74% can be achieved with just 5 acoustic features when analysing only 1 user turn, which allows a prediction of dialogue success that is both real-time and fairly accurate after a single short interaction unit. Of the features tested, those related to speech rate, signal energy and cepstrum are among the most informative. The results presented here outperform the state of the art in spoken dialogue success prediction through solely acoustic features.
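As a rough illustration of the kind of pipeline the abstract describes (acoustic features from a single user turn fed into a classifier), the sketch below assumes librosa and scikit-learn and uses placeholder features: mean MFCCs as a cepstral summary, mean RMS energy, and onset rate as a crude speech-rate proxy. The feature set, classifier settings, and the inputs turn_paths and labels are illustrative assumptions, not the authors' actual 5-feature configuration or their DNN setup.

# Minimal sketch (not the paper's exact pipeline): predict dialogue success
# from a few acoustic features of one user turn. Assumes librosa and
# scikit-learn; `turn_paths` (audio files of user turns) and `labels`
# (1 = successful dialogue, 0 = unsuccessful) are hypothetical inputs.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def turn_features(path, sr=16000):
    """Summarise one user turn with a handful of spectral/rhythm features."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # cepstral summary
    rms = librosa.feature.rms(y=y)                             # signal energy
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units='time')
    duration = len(y) / sr
    onset_rate = len(onsets) / duration if duration > 0 else 0.0  # speech-rate proxy
    return np.concatenate([mfcc.mean(axis=1), [rms.mean(), onset_rate]])

def evaluate(turn_paths, labels):
    """5-fold cross-validated accuracy of an SVM on per-turn features."""
    X = np.vstack([turn_features(p) for p in turn_paths])
    y = np.asarray(labels)
    clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
    return cross_val_score(clf, X, y, cv=5, scoring='accuracy').mean()

The paper also evaluates deep neural networks on the same features; an SVM stands in here only for brevity.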
PUBLICATION RECORD
- Publication year: 2018
- Venue: Spoken Language Technology Workshop
- Publication date: 2018-12-01
- Fields of study: Computer Science
- Source metadata: Semantic Scholar