Improving Phonetic Recognition with Sequence-length Standardized MFCC Features and Deep Bi-Directional LSTM
Toan Pham Van, Hau Nguyen Thanh, Ta Minh Thanh
Published 2018 in National Foundation for Science and Technology Development Conference on Information and Computer Science
ABSTRACT
Phonetic recognition is one of the most challenging problems in speech analysis. Its applications include dialect identification [1], mispronunciation detection [2], and spoken document retrieval [3]. Approaches to the problem include improving feature selection on the input speech [4], applying deep learning techniques [5] [6] [7], or combining the two [8]. Because phonetic data is sequential, architectures based on recurrent neural networks (RNNs) are an appropriate choice [9], and they become more powerful when combined with improved feature selection on the input data. In our approach, we combine Mel Frequency Cepstral Coefficients (MFCC) with sequence-length standardization to represent the acoustic features of speech, and use several RNN models for phonetic classification. Our experiments are conducted on the Texas Instruments/Massachusetts Institute of Technology (TIMIT) [10] phone recognition dataset. In particular, our data processing and feature selection method gives consistently better results than other studies using the same neural network model. We currently achieve the lowest test error rate (13.05%) using a Bidirectional LSTM, which is the best result on the TIMIT dataset, a reduction of about 3.5% over the previous best result [5] [6].
PUBLICATION RECORD
- Publication year
2018
- Venue
National Foundation for Science and Technology Development Conference on Information and Computer Science
- Publication date
2018-11-01
- Fields of study
Linguistics, Computer Science
- Source metadata
Semantic Scholar
REFERENCES
19 references
CITED BY
1 citing paper