Phoneme Recognition on the TIMIT Database

Published 2011 in Unknown venue

ABSTRACT

In the information age, computer applications have become part of modern life, which has in turn raised expectations of friendly interaction with them. Speech, as “the” communication mode, has seen the successful development of a number of applications using automatic speech recognition (ASR), including command and control, dictation, dialog systems for people with impairments, and translation. But the real challenge goes beyond using speech to control applications or to access information. The goal is to use speech as an information source, competing, for example, with text online. Since the technology supporting these applications depends heavily on the performance of the ASR system, research into ASR is still an active topic, as shown by the range of research directions suggested in (Baker et al., 2009a, 2009b).
Automatic speech recognition, defined as the recognition of the information embedded in a speech signal and its transcription in terms of a set of characters (Junqua & Haton, 1996), has been the object of intensive research for more than four decades, achieving notable results. Advances in speech recognition can be expected to make spoken language as convenient and accessible as online text once recognizers reach error rates near zero. But while digit recognition has already reached a recognition rate of 99.6% (Li, 2008), the same cannot be said of phone recognition, for which the best rates are still under 80% (Mohamed et al., 2011; Siniscalchi et al., 2007). Speech recognition based on phones is very attractive since it is inherently free from vocabulary limitations. The performance of Large Vocabulary ASR (LVASR) systems depends on the quality of the phone recognizer. That is why research teams continue to develop phone recognizers and to improve their performance as much as possible. Phone recognition is, in fact, a recurrent problem for the speech recognition community.
Phone recognition is used in a wide range of applications. In addition to typical LVASR systems such as (Morris & Fosler-Lussier, 2008; Scanlon et al., 2007; Schwarz, 2008), it appears in applications related to keyword detection (Schwarz, 2008), language recognition (Matejka, 2009; Schwarz, 2008), speaker identification (Furui, 2005), and music identification and translation (Fujihara & Goto, 2008; Gruhne et al., 2007). The challenge of building robust acoustic models involves applying good training algorithms to a suitable set of data. The database defines the units that can be trained and

PUBLICATION RECORD

Publication year: 2011
Venue: Unknown venue
Publication date: 2011-06-13
Fields of study: Computer Science
Identifiers: DOI 10.5772/17600

REFERENCES

Acoustic Modeling Using Deep Belief Networks (2012)
Discriminative training of HMMs for automatic speech recognition: A survey (2010)
Phone recognition using Restricted Boltzmann Machines (2010)
Updated MINDS Report on Speech Recognition and Understanding, Part 2 (2009)
Speech Recognition Using Augmented Conditional Random Fields (2009)
Speaker-independent phoneme alignment using transition-dependent states (2009)
An improved speech segmentation quality measure: the r-value (2009)
Research Developments and Directions in Speech Recognition and Understanding, Part 1 (2009)
Three techniques for improving automatic synchronization between music and lyrics: Fricative detection, filler model, and novel feature vectors for vocal activity detection (2008)
Conditional Random Fields for Integrating Local Discriminative Classifiers (2008)
Soft margin estimation for automatic speech recognition (2008)
Phoneme Recognition in Popular Music (2007)
Using Broad Phonetic Group Experts for Improved Speech Recognition (2007)
An overview on automatic speech attribute transcription (ASAT) (2007)
Integration of Multiple Feature Sets for Reducing Ambiguity in ASR (2007)
Use of Differential Cepstra as Acoustic Features in Hidden Trajectory Modeling for Phonetic Recognition (2007)
Detection-based ASR in the automatic speech attribute transcription project (2007)
Further Experiments with Detector-Based Conditional Random Fields in Phonetic Recognition (2007)
High-Accuracy Phone Recognition By Combining High-Performance Lattice Generation and Knowledge Based Rescoring (2007)
Hierarchical Structures of Neural Networks for Phoneme Recognition (2006)
Combining phonetic attributes using conditional random fields (2006)
A lattice search technique for a long-contextual-span hidden trajectory model of speech (2006)
A Generative Modeling Framework for Structured Hidden Speech Dynamics (2005)
Experiments in speech recognition using a modular MLP architecture for acoustic modelling (2003)
Heterogeneous acoustic measurements and multiple classifiers for speech recognition (1999)
Heterogeneous measurements and multiple classifiers for speech recognition (1998)
Progress in dynamic programming search for LVCSR (1997)
Transcription and Alignment of the TIMIT Database (1996)
The HTK book (1995)
Robustness in Automatic Speech Recognition: Fundamentals and Applications (1995)
An application of recurrent nets to phone probability estimation (1994)
Phonetic analyses of word and segment variation using the TIMIT corpus of American English (1994)
DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM (1993)
A neural network based, speaker independent, large vocabulary, continuous speech recognition system: the WERNICKE project (1993)
High performance speaker-independent phone recognition using CDHMM (1993)
The general use of tying in phoneme-based HMM speech recognisers (1992)
A recurrent error propagation network speech recognition system (1991)
Speech database development at MIT: Timit and beyond (1990)
Speaker-independent phone recognition using hidden Markov models (1989)
50 Years of Progress in Speech and Speaker Recognition Research (1970)
