Prediction of Dialogue Success with Spectral and Rhythm Acoustic Features Using DNNs and SVMs
Athanasios Lykartsis, M. Kotti, A. Papangelis, Y. Stylianou
Published 2018 in Spoken Language Technology Workshop

ABSTRACT
In this paper we investigate the novel use of audio alone to predict whether a spoken dialogue will be successful, both in a subjective and in an objective sense. To this end, multiple spectral and rhythmic features are fed into support vector machines and deep neural networks. We report results on data from 3267 spoken dialogues, using both the full user response and parts of it. Experiments show that an average accuracy of 74% can be achieved with just 5 acoustic features when analysing only 1 user turn, which allows a prediction of dialogue success that is both real-time and fairly accurate after a single short interaction unit. Of the features tested, those related to speech rate, signal energy and cepstrum are among the most informative. The results presented here outperform the state of the art in spoken dialogue success prediction through solely acoustic features.
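As a rough illustration of the kind of pipeline the abstract describes (acoustic features from a single user turn fed into a classifier), the sketch below assumes librosa and scikit-learn and uses placeholder features: mean MFCCs as a cepstral summary, mean RMS energy, and onset rate as a crude speech-rate proxy. The feature set, classifier settings, and the inputs turn_paths and labels are illustrative assumptions, not the authors' actual 5-feature configuration or their DNN setup.

# Minimal sketch (not the paper's exact pipeline): predict dialogue success
# from a few acoustic features of one user turn. Assumes librosa and
# scikit-learn; `turn_paths` (audio files of user turns) and `labels`
# (1 = successful dialogue, 0 = unsuccessful) are hypothetical inputs.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def turn_features(path, sr=16000):
    """Summarise one user turn with a handful of spectral/rhythm features."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # cepstral summary
    rms = librosa.feature.rms(y=y)                             # signal energy
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units='time')
    duration = len(y) / sr
    onset_rate = len(onsets) / duration if duration > 0 else 0.0  # speech-rate proxy
    return np.concatenate([mfcc.mean(axis=1), [rms.mean(), onset_rate]])

def evaluate(turn_paths, labels):
    """5-fold cross-validated accuracy of an SVM on per-turn features."""
    X = np.vstack([turn_features(p) for p in turn_paths])
    y = np.asarray(labels)
    clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
    return cross_val_score(clf, X, y, cv=5, scoring='accuracy').mean()

The paper also evaluates deep neural networks on the same features; an SVM stands in here only for brevity.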
PUBLICATION RECORD
- Publication year: 2018
- Venue: Spoken Language Technology Workshop
- Publication date: 2018-12-01
- Fields of study: Computer Science
- Source metadata: Semantic Scholar