Improving Phonetic Recognition with Sequence-length Standardized MFCC Features and Deep Bi-Directional LSTM

Toan Pham Van, Hau Nguyen Thanh, Ta Minh Thanh

Published 2018 in National Foundation for Science and Technology Development Conference on Information and Computer Science

ABSTRACT

Phonetic recognition is one of the most challenging problems in speech analysis. Its applications include dialect identification [1], mispronunciation detection [2], and spoken document retrieval [3]. Existing approaches improve feature selection on the input speech [4], apply deep learning techniques [5] [6] [7], or combine both [8]. Because phonetic data is sequential, architectures based on recurrent neural networks (RNNs) are an appropriate choice [9], and they become more powerful when combined with improved feature selection on the input data. In our approach, we combine Mel Frequency Cepstral Coefficients (MFCC) with sequence-length standardization to represent the acoustic features of speech, and we use several RNN models for phonetic classification. Our experiments are conducted on the Texas Instruments/Massachusetts Institute of Technology (TIMIT) [10] phone recognition dataset. Notably, our data processing and feature selection method gives consistently better results than other studies using the same neural network model. We achieve the lowest test error rate (13.05%) with a bidirectional LSTM, which is the best result on the TIMIT dataset, a reduction of about 3.5% over the previous best result [5] [6].
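The abstract does not spell out how sequence lengths are standardized. As a minimal sketch of one common interpretation (truncating or zero-padding each utterance's MFCC frame sequence to a fixed number of frames), the idea could look like the following. The function name `standardize_length`, the `target_len` and `pad_value` parameters, and the choice of zero-padding are assumptions made for illustration, not the authors' confirmed method.

```python
from typing import List

Frame = List[float]  # one MFCC vector (one time step)


def standardize_length(frames: List[Frame], target_len: int,
                       pad_value: float = 0.0) -> List[Frame]:
    """Truncate or pad a sequence of feature frames to exactly target_len frames.

    Hypothetical helper: padding with a constant at the end is an assumption,
    not something the abstract specifies.
    """
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    if not frames:
        raise ValueError("frames must contain at least one frame")

    n_coeffs = len(frames[0])
    # Keep at most target_len frames (truncation for long sequences).
    clipped = [list(frame) for frame in frames[:target_len]]
    # Append constant-valued frames until the sequence reaches target_len.
    padding = [[pad_value] * n_coeffs for _ in range(target_len - len(clipped))]
    return clipped + padding


# Toy sequences with 2 coefficients per frame (real MFCC vectors are longer).
short_seq = [[1.0, 2.0], [3.0, 4.0]]
long_seq = [[float(i), float(i)] for i in range(5)]

fixed_short = standardize_length(short_seq, target_len=3)
fixed_long = standardize_length(long_seq, target_len=3)
```

With every sequence brought to the same length, batches can be stacked into a single fixed-shape array before being passed to a recurrent model such as a bidirectional LSTM.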

PUBLICATION RECORD

Publication year
2018
Venue
National Foundation for Science and Technology Development Conference on Information and Computer Science
Publication date
2018-11-01
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.1109/NICS.2018.8606886

REFERENCES

Learning Filterbanks from Raw Speech for Phone Recognition
2017, cited by this paper
A Regularization Post Layer: An Additional Way How to Make Deep Neural Networks Robust
2017, cited by this paper
Segmental Recurrent Neural Networks for End-to-End Speech Recognition
2016, cited by this paper
Weakly Supervised Memory Networks
2015, cited by this paper
Attention-Based Models for Speech Recognition
2015, cited by this paper
Phone recognition with hierarchical convolutional deep maxout networks
2015, cited by this paper
Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition
2014, cited by this paper
Adam: A Method for Stochastic Optimization
2014, cited by this paper
Speech recognition with deep recurrent neural networks
2013, cited by this paper
Sequence Transduction with Recurrent Neural Networks
2012, cited by this paper
Phoneme Recognition on the TIMIT Database
2011, cited by this paper
Implementation of an extended recognition network for mispronunciation detection and diagnosis in computer-assisted pronunciation training
2009, cited by this paper
Deep Belief Networks for phone recognition
2009, cited by this paper
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
2006, cited by this paper
Phonetic recognition for spoken document retrieval
1998, cited by this paper
Transcription and Alignment of the TIMIT Database
1996, cited by this paper
Automatic dialect identification of extemporaneous conversational, Latin American Spanish speech
1996, cited by this paper
DARPA TIMIT: acoustic-phonetic continuous speech corpus CD-ROM, NIST speech disc 1-1.1
1993, influential reference
Speaker-independent phone recognition using hidden Markov models
1989, cited by this paper

CITED BY

NS-SVM: Bolstering Chicken Egg Harvesting Prediction with Normalization and Standardization
2023, cites this paper