Learning a better representation of speech soundwaves using restricted Boltzmann machines

Published 2011 in IEEE International Conference on Acoustics, Speech, and Signal Processing

ABSTRACT

State-of-the-art speech recognition systems rely on preprocessed speech features, such as Mel cepstrum or linear predictive coding coefficients, that collapse high-dimensional speech sound waves into low-dimensional encodings. While these have been successfully applied in speech recognition systems, such low-dimensional encodings may lose some relevant information and express other information in a way that makes it difficult to use for discrimination. Higher-dimensional encodings could both improve performance in recognition tasks and be applied to speech synthesis by better modeling the statistical structure of the sound waves. In this paper we present a novel approach for modeling speech sound waves using a restricted Boltzmann machine (RBM) with a novel type of hidden variable, and we report initial results demonstrating phoneme recognition performance better than the current state of the art for methods based on Mel cepstrum coefficients.

PUBLICATION RECORD

Publication year: 2011
Venue: IEEE International Conference on Acoustics, Speech, and Signal Processing
Publication date: 2011-05-01
Fields of study: Computer Science
Identifiers: DOI 10.1109/ICASSP.2011.5947700
Source metadata: Semantic Scholar

REFERENCES

A Practical Guide to Training Restricted Boltzmann Machines
2012 (cited by this paper)
Information theory: A signal take on speech
2010 (cited by this paper)
Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines
2010 (cited by this paper)
Rectified Linear Units Improve Restricted Boltzmann Machines
2010 (cited by this paper)
Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine
2010 (cited by this paper)
Deep Belief Networks for phone recognition
2009 (influential reference)
Speech Recognition Using Augmented Conditional Random Fields
2009 (influential reference)
CUDAMat: a CUDA-based matrix class for Python
2009 (cited by this paper)
Unsupervised feature learning for audio classification using convolutional deep belief networks
2009 (cited by this paper)
Use of Differential Cepstra as Acoustic Features in Hidden Trajectory Modeling for Phonetic Recognition
2007 (influential reference)
Large Margin Gaussian Mixture Modeling for Phonetic Classification and Recognition
2006 (influential reference)
Efficient coding of natural sounds
2002 (cited by this paper)
Training Products of Experts by Minimizing Contrastive Divergence
2002 (cited by this paper)
Speech feature extraction using independent component analysis
2000 (cited by this paper)
Improved phone recognition using Bayesian triphone models
1998 (influential reference)
Heterogeneous measurements and multiple classifiers for speech recognition
1998 (influential reference)
An information-maximization approach to blind separation and blind deconvolution
1996 (cited by this paper)
An Information-Maximization Approach to Blind Separation and Blind Deconvolution
1995 (cited by this paper)
An application of recurrent nets to phone probability estimation
1994 (cited by this paper)
Information processing in dynamical systems: foundations of harmony theory
1986 (cited by this paper)

CITED BY

Should Audio Front-Ends be Adaptive? Comparing Learnable and Adaptive Front-Ends
2025 (cites this paper)
A Noise-Robust End-to-End Framework for Amharic Speech Recognition
2025 (cites this paper)
Emotion Aware Speech Recognition System Using Deep Neural Networks
2025 (cites this paper)
Enhanced speech emotion recognition using averaged valence arousal dominance mapping and deep neural networks
2024 (cites this paper)
A hierarchical birdsong feature extraction architecture combining static and dynamic modeling
2023 (cites this paper)
Evaluating raw waveforms with deep learning frameworks for speech emotion recognition
2023 (cites this paper)
END-TO-END SPEECH RECOGNITION SYSTEMS FOR AGGLUTINATIVE LANGUAGES
2023 (cites this paper)
Multimodal Analysis of Acoustic and Linguistic Features in Entrepreneurial Pitches using Deep Learning
2023 (influential citation)
Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
2023 (cites this paper)
Underwater Acoustic Target Recognition Combining Multi-scale Features and Attention Mechanism
2023 (cites this paper)
SoundSynp: Sound Source Detection from Raw Waveforms with Multi-Scale Synperiodic Filterbanks
2023 (cites this paper)
Application of virtual human sign language translation based on speech recognition
2023 (cites this paper)
A survey on preprocessing and classification techniques for acoustic scene
2023 (cites this paper)
DL 101: Basic introduction to deep learning with its application in biomedical related fields
2022 (cites this paper)
Identifying the influence of transfer learning method in developing an end-to-end automatic speech recognition system with a low data level
2022 (cites this paper)
Spoken Utterance Classification Task of Arabic Numerals and Selected Isolated Words
2022 (cites this paper)
MRI-based radiomics analysis for differentiating phyllodes tumors of the breast from fibroadenomas
2022 (cites this paper)
Depth-Adaptive Deep Neural Network Based on Learning Layer Relevance Weights
2022 (cites this paper)
AST: Audio Spectrogram Transformer
2021 (cites this paper)
Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation
2021 (cites this paper)
Representation transfer learning from deep end-to-end speech recognition networks for the classification of health states from speech
2021 (cites this paper)
Distribution-Invariant Deep Belief Network for Intelligent Fault Diagnosis of Machines Under New Working Conditions
2021 (cites this paper)
LEAF: A Learnable Frontend for Audio Classification
2021 (cites this paper)
Development of Hybrid Methods for Prediction of Principal Mineral Resources
2021 (cites this paper)
Research on LSTM+Attention Model of Infant Cry Classification
2021 (cites this paper)
REVIEW ON APPLICATION AREAS OF DEEP LEARNING
2021 (cites this paper)
Back to Square One: Superhuman Performance in Chutes and Ladders Through Deep Neural Networks and Tree Search
2021 (cites this paper)
Paralinguistic Speech Processing: An Overview
2021 (cites this paper)
Speech Representations and Phoneme Classification for Preserving the Endangered Language of Ladin
2021 (cites this paper)
A Survey of Automatic Text Summarization: Progress, Process and Challenges
2021 (cites this paper)
Acoustic monitoring using PyzoFlex®: a novel printed sensor for smart consumer products
2021 (cites this paper)
Sequence-to-Sequence Acoustic-to-Phonetic Conversion Using Spectrograms and Deep Learning
2021 (cites this paper)
Beyond Lp clipping: Equalization-based Psychoacoustic Attacks against ASRs
2021 (cites this paper)
Synthesizing waveform sequence-to-sequence to augment training data for sequence-to-sequence speech recognition
2021 (cites this paper)
Trace Transform Feature Learning for Offline Jawi Handwritten Recognition
2021 (cites this paper)
Speech Recognition Using Enhanced Features with Deep Belief Network for Real Time Application
2021 (cites this paper)
Learning Deep Representation of The Emotion Speech Signal
2021 (cites this paper)
Machine Learning for Speaker Recognition
2020 (cites this paper)
Deep learning-based clustering approaches for bioinformatics
2020 (cites this paper)
Finding Quantum Critical Points with Neural-Network Quantum States
2020 (cites this paper)
Development of integral model of speech recognition system for Uzbek language
2020 (cites this paper)
Convergence of Markovian stochastic approximation for Markov random fields with hidden variables
2020 (cites this paper)
Machine learning in quantum computers via general Boltzmann Machines: Generative and Discriminative training through annealing
2020 (cites this paper)
Interval Type-2 Fuzzy Restricted Boltzmann Machine
2020 (cites this paper)
Measuring the Impact of Accurate Feature Selection on the Performance of RBM in Comparison to State of the Art Machine Learning Algorithms
2020 (cites this paper)
Speech Quality Classifier Model based on DBN that Considers Atmospheric Phenomena
2020 (cites this paper)
Machine learning-based unenhanced CT texture analysis for predicting BAP1 mutation status of clear cell renal cell carcinomas
2020 (cites this paper)
Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends
2020 (cites this paper)
SPEECH EMOTION RECOGNITION SURVEY
2020 (cites this paper)
Hindi speech recognition using time delay neural network acoustic modeling with i-vector adaptation
2020 (cites this paper)
Learning Restricted Boltzmann Machines with Sparse Latent Variables
2020 (cites this paper)
A comprehensive survey and analysis of generative models in machine learning
2020 (cites this paper)
A Review on Deep Learning Applications
2020 (cites this paper)
Learning Restricted Boltzmann Machines with Few Latent Variables
2020 (cites this paper)
A Speech Quality Classifier based on Tree-CNN Algorithm that Considers Network Degradations
2020 (cites this paper)
Multi-task learning DNN to improve gender identification from speech leveraging age information of the speaker
2020 (influential citation)
Deep Learning Models
2020 (cites this paper)
Deep Learning Approaches for Speech Emotion Recognition
2020 (cites this paper)
Intelligent Fault Diagnosis Method Based on Full 1-D Convolutional Generative Adversarial Network
2020 (cites this paper)
Generative and discriminative training of Boltzmann machine through quantum annealing
2020 (cites this paper)
End-to-End Speech Recognition in Agglutinative Languages
2020 (cites this paper)
Gamma Boltzmann Machine for Simultaneously Modeling Linear- and Log-amplitude Spectra
2020 (cites this paper)
Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling
2020 (cites this paper)
On the Robustness and Training Dynamics of Raw Waveform Models
2020 (cites this paper)
Deep Neural Baselines for Computational Paralinguistics
2019 (cites this paper)
Massive computational acceleration by using neural networks to emulate mechanism-based biological models
2019 (cites this paper)
Attention-Based Dense LSTM for Speech Emotion Recognition
2019 (cites this paper)
Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech
2019 (cites this paper)
End-to-end acoustic modeling using convolutional neural networks for HMM-based automatic speech recognition
2019 (cites this paper)
Unsupervised Speech Representation Learning Using WaveNet Autoencoders
2019 (cites this paper)
A Classical-Quantum Hybrid Approach for Unsupervised Probabilistic Machine Learning
2019 (cites this paper)
Machine learning in ultrasound-guided spinal anesthesia
2019 (cites this paper)
Deep Learning for Audio Signal Processing
2019 (cites this paper)
Speech Recognition Using Deep Neural Networks: A Systematic Review
2019 (cites this paper)
Deep Learning for Human Affect Recognition: Insights and New Developments
2019 (cites this paper)
Learning representations of speech from the raw waveform. (Apprentissage de représentations de la parole à partir du signal brut)
2019 (cites this paper)
Acoustic Model Adaptation from Raw Waveforms with Sincnet
2019 (cites this paper)
Meta-Learning of Structured Representation by Proximal Mapping
2019 (cites this paper)
3D CNN-Based Speech Emotion Recognition Using K-Means Clustering and Spectrograms
2019 (cites this paper)
Keyword Spotting using Time-Domain Features in a Temporal Convolutional Network
2019 (cites this paper)
Speech Emotion Classification Using Attention-Based LSTM
2019 (cites this paper)
Unsupervised Representation Learning for Robust Speech Recognition
2019 (cites this paper)
End-to-End Acoustic Modeling Using Convolutional Neural Networks
2019 (cites this paper)
Distinctive Phonetic Features Modeling and Extraction Using Deep Neural Networks
2019 (cites this paper)
Comparison and Analysis of SampleCNN Architectures for Audio Classification
2019 (cites this paper)
Unsupervised Raw Waveform Representation Learning for ASR
2019 (cites this paper)
ASM1D-GAN: An Intelligent Fault Diagnosis Method Based on Assembled 1D Convolutional Neural Network and Generative Adversarial Networks
2019 (cites this paper)
A Speech Quality Classifier based on Signal Information that Considers Wired and Wireless Degradations
2019 (cites this paper)
Sound Sharing and Retrieval
2018 (cites this paper)
Noise Robust Automatic Speech Recognition Based on Spectro-Temporal Techniques
2018 (cites this paper)
Deep Learning Methods for Underwater Target Feature Extraction and Recognition
2018 (cites this paper)
Continual deep learning via progressive learning
2018 (cites this paper)
Spam filtering using integrated distribution-based balancing approach and regularized deep neural networks
2018 (cites this paper)
Multi-level region-of-interest CNNs for end to end speech recognition
2018 (cites this paper)
An Ensemble Stacked Convolutional Neural Network Model for Environmental Event Sound Recognition
2018 (cites this paper)
Learning filter widths of spectral decompositions with wavelets
2018 (cites this paper)
Three-dimensional convolutional restricted Boltzmann machine for human behavior recognition from RGB-D video
2018 (cites this paper)
Avaliação da Qualidade da Voz em Serviços de Comunicação usando Deep Learning
2018 (cites this paper)
Voice Quality Assessment in Communication Services using Deep Learning
2018 (cites this paper)
An evaluation of the performance of Restricted Boltzmann Machines as a model for anomaly network intrusion detection
2018 (cites this paper)