This paper presents a strategy for efficiently selecting informative data from large corpora of transcribed speech. We propose to choose data uniformly according to the distribution of some target speech unit (phoneme, word, character, etc.). In our experiment, contrary to the common belief that "there is no data like more data", we found it possible to select a highly informative subset of the data that yields recognition performance comparable to that of a system that uses a much larger amount of data. At the same time, our selection process is efficient and fast.
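The abstract does not spell out the selection procedure, so the following is only an illustrative sketch of one way "choosing data uniformly according to the distribution of a target unit" could be read: greedily add the utterance that keeps the pooled unit distribution closest to uniform, measured here by Shannon entropy. The function names (`entropy`, `select_uniform`), the entropy criterion, and the greedy loop are assumptions made for illustration, not the authors' published method. This naive version rescans all candidates at every step and makes no claim to the efficiency the abstract reports.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in nats) of the distribution implied by a Counter."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def select_uniform(utterances, budget, unit_fn=list):
    """Greedily pick up to `budget` utterance indices (hypothetical helper).

    utterances: list of transcripts (strings).
    unit_fn: maps a transcript to its units; `list` gives characters,
             `str.split` gives whitespace-separated words.
    At each step the utterance whose addition maximizes the entropy of the
    pooled unit counts is chosen, which pushes the pool toward uniform.
    """
    selected = []
    counts = Counter()
    remaining = list(range(len(utterances)))
    while remaining and len(selected) < budget:
        best_i, best_h = None, -1.0
        for i in remaining:
            trial = counts + Counter(unit_fn(utterances[i]))
            h = entropy(trial)
            if h > best_h:  # strict comparison: ties keep the earliest index
                best_i, best_h = i, h
        counts.update(unit_fn(utterances[best_i]))
        selected.append(best_i)
        remaining.remove(best_i)
    return selected
```

For example, with the toy corpus `["aaaa", "aaab", "bbbb", "aaaa"]` and a budget of 2, `select_uniform` returns `[1, 2]`: it first takes the only utterance containing both characters, then the one that best balances the pooled counts. Passing `unit_fn=str.split` switches the target unit from characters to words.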
Data selection for speech recognition
Yi Wu, Rong Zhang, Alexander I. Rudnicky
Published 2007 in Automatic Speech Recognition & Understanding
PUBLICATION RECORD
- Publication year
2007
- Venue
Automatic Speech Recognition & Understanding
- Publication date
2007-12-01
- Fields of study
Computer Science
- Source metadata
Semantic Scholar