This paper describes an experiment in synthesizing four emotional states - anger, happiness, sadness and neutral – using a concatenative speech synthesizer. To achieve this, five emotionally (i.e., semantically) unbiased target sentences were prepared. Then, separate speech inventories, comprising the target diphones for each of the above emotions, were recorded. Using the 16 different combinations of prosody and inventory during synthesis resulted in 80 synthetic test sentences. The results were evaluated by conducting listening tests with 33 naive listeners. Synthesized anger was recognized with 86.1% accuracy, sadness with 89.1%, happiness with 44.2%, and neutral emotion with 81.8% accuracy. According to our results, anger was classified as inventory dominant and sadness and neutral as prosody dominant. Results were not sufficient to make similar conclusions regarding happiness. The highest recognition accuracies were achieved for sentences synthesized by using prosody and diphone inventory belonging to the same emotion.
Expressive speech synthesis using a concatenative synthesizer
M. Bulut,Shrikanth S. Narayanan,A. Syrdal
Published 2002 in Interspeech
ABSTRACT
PUBLICATION RECORD
- Publication year
2002
- Venue
Interspeech
- Publication date
2002-09-16
- Fields of study
Linguistics, Computer Science
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-11 of 11 references · Page 1 of 1