Prediction and realisation of conversational characteristics by utilising spontaneous speech for unit selection

S. Andersson, Kallirroi Georgila, D. Traum, M. Aylett, R. Clark

Published 2010 in Proceedings of the International Conference on Speech Prosody

ABSTRACT

Unit selection speech synthesis has reached high levels of naturalness and intelligibility for neutral read-aloud speech. However, synthetic speech generated using neutral read-aloud data lacks all the attitude, intention and spontaneity associated with everyday conversations. Unit selection is heavily data-dependent, and thus, in order to simulate human conversational speech or create synthetic voices for believable virtual characters, we need to utilise speech data with examples of how people talk rather than how people read. In this paper we included carefully selected utterances from spontaneous conversational speech in a unit selection voice. Using this voice, and by automatically predicting the type and placement of lexical fillers and filled pauses, we can synthesise utterances with conversational characteristics. A perceptual listening test showed that it is possible to make synthetic speech sound more conversational without degrading naturalness.

PUBLICATION RECORD

Publication year: 2010
Venue: Proceedings of the International Conference on Speech Prosody
Publication date: 2010-05-01
Fields of study: Linguistics, Computer Science
Identifiers: DOI 10.21437/speechprosody.2010-89
Source metadata: Semantic Scholar

REFERENCES

Using Integer Linear Programming for Detecting Speech Disfluencies
2009, cited by this paper
The Blizzard Challenge 2008
2008, cited by this paper
The CSTR/Cereproc Blizzard Entry 2008: The Inconvenient Data
2008, cited by this paper
A Virtual Human Dialogue Model for Non-Team Interaction
2008, cited by this paper
On the role of acting skills for the collection of simulated emotional speech
2008, cited by this paper
Paralinguistic elements in speech synthesis
2008, influential reference
Spontaneous speech synthesis by pronunciation variant selection - a comparison to natural speech
2007, cited by this paper
Statistical analysis of filled pauses' rhythm for disfluent speech synthesis
2007, cited by this paper
Towards conversational speech synthesis; lessons learned from the expressive speech processing project
2007, cited by this paper
Filled Pauses in Speech Synthesis: Towards Conversational Speech
2007, influential reference
Flexible Speech Translation Systems
2006, cited by this paper
Enriching speech recognition with automatic detection of sentence boundaries and disfluencies
2006, cited by this paper
Disfluent speech analysis and synthesis: a preliminary approach
2006, cited by this paper
Conversational speech synthesis and the need for some laughter
2005, cited by this paper
An Empirical Text Transformation Method for Spontaneous Speech Synthesizers
2003, influential reference
Spoken language synthesis: experiments in synthesis of spontaneous monologues
2002, influential reference
The HTK book
1995, cited by this paper

CITED BY

Natural Expression of a Machine Learning Model's Uncertainty Through Verbal and Non-Verbal Behavior of Intelligent Virtual Agents
2024, cites this paper
Processing of prosodic cues of uncertainty in autistic and non-autistic adults: a study based on articulatory speech synthesis
2024, cites this paper
Considerations for Child Speech Synthesis for Dialogue Systems
2023, cites this paper
Evaluating Sampling-based Filler Insertion with Spontaneous TTS
2022, cites this paper
How Does a Spontaneously Speaking Conversational Agent Affect User Behavior?
2022, cites this paper
Where's the uh, hesitation? The interplay between filled pause location, speech rate and fundamental frequency in perception of confidence
2022, cites this paper
Building and Designing Expressive Speech Synthesis
2021, cites this paper
Augmented Prompt Selection for Evaluation of Spontaneous Speech Synthesis
2020, cites this paper
Smooth Turn-taking by a Robot Using an Online Continuous Model to Generate Turn-taking Cues
2019, cites this paper
On the Role of Disfluent Speech for Uncertainty in Articulatory Speech Synthesis
2019, cites this paper
How to train your fillers: uh and um in spontaneous speech synthesis
2019, cites this paper
Disfluency Insertion for Spontaneous TTS: Formalization and Proof of Concept
2018, cites this paper
Prediction of Turn-taking Using Multitask Learning with Prediction of Backchannels and Fillers
2018, cites this paper
Generating Fillers Based on Dialog Act Pairs for Smooth Turn-Taking by Humanoid Robot
2018, cites this paper
Spoken Dialogue System for a Human-like Conversational Robot ERICA
2018, cites this paper
Speech synthesis systems: disadvantages and limitations
2018, cites this paper
Dimensional paralinguistic information control based on multiple-regression HSMM for spontaneous dialogue speech synthesis with robust parameter estimation
2017, cites this paper
Statistical parametric speech synthesis using conversational data and phenomena
2017, cites this paper
Ajout automatique de disfluences pour la synthèse de la parole spontanée : formalisation et preuve de concept (Automatic disfluency insertion towards spontaneous TTS : formalization and proof of concept)
2017, cites this paper
Design and Evaluation of Statistical Parametric Techniques in Expressive Text-To-Speech: Emotion and Speaking Styles Transplantation
2016, cites this paper
Micro-structure of disfluencies: basics for conversational speech synthesis
2015, cites this paper
Expressive speech synthesis: research and system design with hidden Markov models
2015, cites this paper
Emotion transplantation through adaptation in HMM-based speech synthesis
2015, cites this paper
Investigating automatic & human filled pause insertion for speech synthesis
2014, cites this paper
Deliverable D3.5 Final Evaluation Report
2014, cites this paper
Audiovisual prosody of uncertainty: An overview
2014, cites this paper
Synthesis and evaluation of conversational characteristics in speech synthesis
2013, cites this paper
On the Modelling of Prosodic Cues in Synthetic Speech: What Are the Effects on Perceived Uncertainty and Naturalness?
2013, cites this paper
Disfluencies and uncertainty perception - evidence from a human-machine scenario
2013, cites this paper
Production of filled pauses in concatenative speech synthesis based on the underlying fluent sentence
2012, cites this paper
Practical Evaluation of Human and Synthesized Speech for Virtual Human Dialogue Systems
2012, cites this paper
Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis
2012, influential citation
Analysis on Effects of Text-to-Speech and Avatar Agent in Evoking Users' Spontaneous Listener's Reactions
2011, cites this paper
Automatic detection of unnatural word-level segments in unit-selection speech synthesis
2011, cites this paper
Toward Construction of Spoken Dialogue System that Evokes Users’ Spontaneous Backchannels
2011, cites this paper
Utilising spontaneous conversational speech in HMM-based speech synthesis
2010, cites this paper
Towards Improving the Naturalness of Social Conversations with Dialogue Systems
2010, cites this paper
Natural Language Processing and Cognitive Science
year unknown, cites this paper
Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM
year unknown, cites this paper