Reliable Accent-Specific Unit Generation With Discriminative Dynamic Gaussian Mixture Selection for Multi-Accent Chinese Speech Recognition

Yi Liu, Yunqing Xia, Xuan Wang, Chin-Hui Lee

Published 2013 in IEEE Transactions on Audio, Speech, and Language Processing

ABSTRACT

In this paper, we propose a discriminative dynamic Gaussian mixture selection (DGMS) strategy to generate reliable accent-specific units (ASUs) for multi-accent speech recognition. Time-aligned phone recognition is used to generate ASUs that model accent variations explicitly and accurately. DGMS reconstructs and adjusts a pre-trained set of hidden Markov model (HMM) state densities to build dynamic observation densities for each input speech frame. A discriminative minimum classification error criterion is adopted to optimize the sizes of the HMM state observation densities with a genetic algorithm (GA). To the authors' knowledge, this discriminative optimization for DGMS is the first proposed method for discriminative training of discrete variables. We found that the proposed framework covers more multi-accent variation and thus reduces some of the performance loss in pruned beam search, without increasing the model size of the original acoustic model set. Evaluation on three typical Chinese accents, Chuan, Yue and Wu, shows that our approach outperforms traditional acoustic model reconstruction techniques with syllable error rate reductions of 8.0%, 5.5% and 5.0%, respectively, while maintaining good performance on standard Putonghua speech.
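
The idea of building an observation density per frame from a pre-trained pool can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes a simple rule: for each frame, keep the k best-scoring components of a state's pooled diagonal-covariance Gaussians and renormalize their weights. The names `log_gaussian`, `dynamic_state_loglik`, `pool`, and `k` are hypothetical, and the selection rule is an assumption. In the abstract, the density sizes (k here) are chosen by minimum classification error optimization with a GA, which this sketch does not attempt.

```python
import math
from typing import List, Sequence, Tuple

# One mixture component: (weight, means, variances). Hypothetical layout.
Gaussian = Tuple[float, List[float], List[float]]


def log_gaussian(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def dynamic_state_loglik(x: Sequence[float], pool: Sequence[Gaussian], k: int) -> float:
    """Frame log-likelihood from the k best-scoring components of the pool.

    Assumed rule (not taken from the paper): rank components by
    log(weight) + log density, keep the top k, renormalize their weights.
    """
    if not 1 <= k <= len(pool):
        raise ValueError("k must be between 1 and the pool size")
    scored = [(math.log(w) + log_gaussian(x, m, v), w) for w, m, v in pool]
    scored.sort(key=lambda item: item[0], reverse=True)
    top = scored[:k]
    weight_sum = sum(w for _, w in top)
    # Log-sum-exp over the selected components for numerical stability.
    peak = top[0][0]
    log_sum = peak + math.log(sum(math.exp(s - peak) for s, _ in top))
    # Dividing by the kept weight mass renormalizes the reduced mixture.
    return log_sum - math.log(weight_sum)


# Toy one-dimensional pool and frame, made up for illustration only.
pool: List[Gaussian] = [
    (0.5, [0.0], [1.0]),
    (0.3, [1.0], [1.0]),
    (0.2, [5.0], [1.0]),
]
frame = [0.9]
full = dynamic_state_loglik(frame, pool, len(pool))
reduced = dynamic_state_loglik(frame, pool, 1)
```

With k equal to the pool size, the function returns the ordinary mixture log-likelihood. With k = 1, it returns the log density of the single best component. Treating k as a per-state integer is what makes the size selection a discrete optimization problem.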

PUBLICATION RECORD

Publication year: 2013
Venue: IEEE Transactions on Audio, Speech, and Language Processing
Publication date: 2013-10-01
Fields of study: Computer Science
Identifiers: DOI 10.1109/TASL.2013.2265087
Source metadata: Semantic Scholar

REFERENCES

Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection (2012, cited by this paper)
Using stacked transformations for recognizing foreign accented speech (2011, cited by this paper)
Dialect Classification via Text-Independent Training and Testing for Arabic, Spanish, and Chinese (2011, cited by this paper)
Minimum Error Classification with geometric margin control (2010, cited by this paper)
Speech recognition with speech synthesis models by marginalising over decision tree leaves (2009, cited by this paper)
Discriminative n-gram selection for dialect recognition (2009, cited by this paper)
MLLR/MAP adaptation using pronunciation variation for non-native speech recognition (2009, cited by this paper)
State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis (2009, cited by this paper)
Dialect Classification for Online Podcasts Fusing Acoustic and Language Based Structural and Semantic Information (2008, cited by this paper)
Phonetic confusion analysis and robust phone set generation for Shanghai-accented Mandarin speech recognition (2008, cited by this paper)
Unsupervised Discriminative Training With Application to Dialect Classification (2007, cited by this paper)
Advances in phone-based modeling for automatic accent classification (2006, cited by this paper)
HKUST/MTS: A Very Large Scale Mandarin Telephone Speech Corpus (2006, cited by this paper)
Multi-accent Chinese speech recognition (2006, influential reference)
Accent detection and speech recognition for Shanghai-accented Mandarin (2005, cited by this paper)
A framework for predicting speech recognition errors (2005, cited by this paper)
Effects and modeling of phonetic and acoustic confusions in accented speech (2005, influential reference)
Unsupervised online adaptation for speaker verification over the telephone (2004, cited by this paper)
Minimum classification error training of landmark models for real-time continuous speech recognition (2004, cited by this paper)
Pronunciation change in conversational speech and its implications for automatic speech recognition (2004, cited by this paper)
Pronunciation Modeling for Spontaneous Mandarin Speech Recognition (2004, cited by this paper)
State-dependent phonetic tied mixtures with pronunciation modeling for spontaneous speech recognition (2004, cited by this paper)
Analysis of acoustic correlates of British, Australian and American accents (2003, cited by this paper)
Pronunciation variation analysis based on acoustic and phonemic distance measures with application examples on Mandarin Chinese (2003, cited by this paper)
A syllable, articulatory-feature, and stress-accent model of speech recognition (2002, cited by this paper)
Improved context-dependent acoustic modeling for continuous Chinese speech recognition (2001, cited by this paper)
Pronunciation Modeling of Mandarin Casual Speech (2000, cited by this paper)
Dynamic HMM selection for continuous speech recognition (1999, cited by this paper)
Dynamic pronunciation models for automatic speech recognition (1999, influential reference)
Pronunciation modeling by sharing gaussian densities across phonetic models (1999, cited by this paper)
Speaker-independent upfront dialect adaptation in a large vocabulary continuous speech recognizer (1998, cited by this paper)
Pattern recognition using a family of design algorithms based upon the generalized probabilistic descent method (1998, influential reference)
Minimum classification error rate methods for speech recognition (1997, influential reference)
Automatic accent classification of foreign accented Australian English speech (1996, cited by this paper)
Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models (1995, cited by this paper)
Foreign accent classification using source generator based prosodic features (1995, cited by this paper)
Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains (1994, cited by this paper)
A course in phonetics (1975, cited by this paper)

CITED BY

The Influence of Input Method and Chinese Character Complexity on the Elderly Using Smartphones to Input Information – A Study from China (2024, cites this paper)
Automatic Voice Query Service for Multi-Accented Mandarin Speech (2021, cites this paper)
Consumption behavior of eco-friendly products and applications of ICT innovation (2020, cites this paper)
A discrete hidden Markov model fault diagnosis strategy based on K-means clustering dedicated to PEM fuel cell systems of tramways (2018, cites this paper)
Improving BLSTM RNN based Mandarin speech recognition using accent dependent bottleneck features (2016, cites this paper)
CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition (2016, cites this paper)
Dynamic Pronunciation Modelling for Unsupervised Learning of ASR Systems (2016, cites this paper)
Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition (2015, cites this paper)
An Algorithm Study for Speech Emotion Recognition Based Speech Feature Analysis (2015, cites this paper)
Unsupervised adaptation of ASR systems: An application of dynamic programming in machine learning (2015, cites this paper)
Discriminative dynamic Gaussian mixture selection with enhanced robustness and performance for multi-accent speech recognition (2012, cites this paper)