A Study on How Human Annotations Benefit the TTS Voice

Min Chu, Yining Chen, Yong Zhao, Yusheng Li, F. Soong

Published 2006 in Blizzard Challenge

ABSTRACT

When we built the unit inventory from the Blizzard corpus, we performed three types of manual work, which together took our labelers about 12 working days. To see how much benefit this manual work brings, we ran several perceptual experiments comparing speech generated with and without it. The results show that although manual proofreading identified more than 500 word errors, it yielded no observable improvement in our experiment. Both manual checking of segmental boundaries and manual prosody annotation improve the synthesized speech, with the latter bringing more benefit. The final version of the synthetic speech, built with limited manual work, is preferred over the fully automatically processed version by 68% to 32%.

PUBLICATION RECORD

Publication year: 2006
Venue: Blizzard Challenge
Publication date: 2006-09-16
Fields of study: Linguistics, Computer Science
Identifiers: DOI 10.21437/blizzard.2006-10
Source metadata: Semantic Scholar

REFERENCES

Context-Dependent Boundary Model for Refining Boundaries Segmentation of TTS Units (2006)
Modeling stylized invariance and local variability of prosody in text-to-speech synthesis (2006)
Automatic accent annotation with limited manually labeled data (2006)
Towards phone segmentation for concatenative speech synthesis (2004)
Evaluating and correcting phoneme segmentation for unit selection synthesis (2003)
Microsoft Mulan - a bilingual TTS system (2003)
Automatic phonetic segmentation (2003)
Perceptually based automatic prosody labeling and prosodically enriched unit selection improve concatenative text-to-speech synthesis (2000)
Inter-transcriber reliability of ToBI prosodic labeling (2000)
Automatic ToBI prediction and alignment to speed manual labeling of prosody (1999)

CITED BY

Emilia: a speech corpus for Argentine Spanish text to speech synthesis (2019)
On the impact of phoneme alignment in DNN-based speech synthesis (2016)
Multitier Annotation of Urdu Speech Corpus (2014)
Measuring a decade of progress in Text-to-Speech (2014)