End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results

J. Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

Published 2014 in arXiv.org

ABSTRACT

We replace the Hidden Markov Model (HMM), which is traditionally used in continuous speech recognition, with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context created with a subset of input symbols selected by the attention mechanism. We report initial results demonstrating that this new approach achieves phoneme error rates on the TIMIT dataset that are comparable to those of state-of-the-art HMM-based decoders.
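
As a rough illustration of the idea in the abstract, the sketch below computes one attention step: a decoder state is scored against each encoder state, the scores are normalized with a softmax, and the context is the weighted sum of encoder states. This is only a minimal sketch under stated assumptions. The dot-product scoring, the function names `softmax` and `attend`, and the toy vectors are all hypothetical choices made for brevity and are not taken from the paper, which may score alignments differently.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """One attention step (assumption: plain dot-product scoring).

    Returns the attention weights over input positions and the context
    vector, i.e. the weighted sum of encoder states.
    """
    # Score each encoder state against the current decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, h_t))
        for h_t in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context: weighted sum of encoder states, one component at a time.
    context = [
        sum(w * h_t[i] for w, h_t in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three encoder states (input frames) of dimension 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

In this toy setup the first and third input positions receive equal, larger weights than the second, so the context is dominated by the frames that match the decoder state. A real decoder would feed this context into the recurrent step that emits the next phoneme.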

PUBLICATION RECORD

Publication year
2014
Venue
arXiv.org
Publication date
2014-12-04
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1412.1602
Source metadata
Semantic Scholar

REFERENCES

Connectionist Temporal Classification: Labelling Unsegmented Sequences with Recurrent Neural Networks
2016 (influential reference)
Deep Convolutional Neural Networks for Large-scale Speech Tasks
2015 (cited by this paper)
Neural Machine Translation by Jointly Learning to Align and Translate
2014 (influential reference)
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
2014 (cited by this paper)
Hybrid speech recognition with Deep Bidirectional LSTM
2013 (influential reference)
Maxout Networks
2013 (influential reference)
Speech recognition with deep recurrent neural networks
2013 (influential reference)
Generating Sequences With Recurrent Neural Networks
2013 (cited by this paper)
Pylearn2: a machine learning research library
2013 (influential reference)
Sequence-discriminative training of deep neural networks
2013 (cited by this paper)
Improving neural networks by preventing co-adaptation of feature detectors
2012 (cited by this paper)
ADADELTA: An Adaptive Learning Rate Method
2012 (influential reference)
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
2012 (cited by this paper)
On the difficulty of training recurrent neural networks
2012 (cited by this paper)
Sequence Transduction with Recurrent Neural Networks
2012 (influential reference)
The Kaldi Speech Recognition Toolkit
2011 (cited by this paper)
Deep Belief Networks for phone recognition
2009 (cited by this paper)
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
2009 (cited by this paper)
Discriminative learning in sequential pattern recognition
2008 (cited by this paper)
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
2006 (influential reference)
Gradient-based learning applied to document recognition
1998 (influential reference)
Bidirectional recurrent neural networks
1997 (cited by this paper)
Neural networks for speech and sequence recognition
1996 (cited by this paper)
Connectionist Speech Recognition: A Hybrid Approach
1993 (cited by this paper)
Global optimization of a neural network-hidden Markov model hybrid
1991 (influential reference)

CITED BY

Signed Relation Graph Based Dynamical Interacting System Modeling for Multi-Agent Trajectory Prediction
2026 (cites this paper)
Whisper Has an Internal Word Aligner
2025 (cites this paper)
Convolutional Rectangular Attention Module
2025 (cites this paper)
Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios
2025 (cites this paper)
Improving endpoint detection in end-to-end streaming ASR for conversational speech
2025 (cites this paper)
A Review of Pedestrian Trajectory Prediction Methods Based on Deep Learning Technology
2025 (cites this paper)
CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition
2025 (cites this paper)
Improving Generalization of End-to-End ASR through Diversity and Independence Regularization
2025 (cites this paper)
Chain-of-Thought Distillation for ASR Error Correction with Multimodal Large Language Models
2025 (cites this paper)
MyanSpeech: Joint CTC-Attention and RNN Language Model for End-To-End Read Speech Recognition
2025 (cites this paper)
Generative Annotation for ASR Named Entity Correction
2025 (cites this paper)
Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling
2025 (cites this paper)
All-in-One ASR: Unifying Encoder-Decoder Models of CTC, Attention, and Transducer in Dual-Mode ASR
2025 (cites this paper)
CBATE-Net: An Accurate Battery Capacity and State-of-Health (SoH) Estimation Tool for Energy Storage Systems
2025 (cites this paper)
Experimental Study on Time Series Analysis of Lower Limb Rehabilitation Exercise Data Driven by Novel Model Architecture and Large Models
2025 (cites this paper)
Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition
2025 (cites this paper)
Pronunciation-Lexicon Free Training for Phoneme-Based Crosslingual ASR via Joint Stochastic Approximation
2025 (cites this paper)
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers
2025 (cites this paper)
An attention-augmented bidirectional LSTM-based encoder–decoder architecture for electrocardiogram heartbeat classification
2024 (cites this paper)
Q-LAtte: An Efficient and Versatile LSTM Model for Quantized Attention-Based Time Series Forecasting in Building Energy Applications
2024 (cites this paper)
A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
2024 (cites this paper)
Pedestrian trajectory prediction based on pedestrian velocity threshold constraints
2024 (cites this paper)
Artificial Intelligence-Enabled 5G Network Performance Evaluation With Fine Granularity and High Accuracy
2024 (cites this paper)
Learning Autoencoder Diffusion Models of Pedestrian Group Relationships for Multimodal Trajectory Prediction
2024 (cites this paper)
Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
2024 (cites this paper)
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
2024 (cites this paper)
Research on Person Re-Identification through Local and Global Attention Mechanisms and Combination Poolings
2024 (cites this paper)
Convolutional Neural Networks to Facilitate the Continuous Recognition of Arabic Speech with Independent Speakers
2024 (cites this paper)
CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer Based Streaming ASR
2024 (cites this paper)
Serialized Output Training by Learned Dominance
2024 (cites this paper)
A Survey of Deep Learning and Foundation Models for Time Series Forecasting
2024 (cites this paper)
Discovering Time-aware Hidden Dependencies with Personalized Graphical Structure in Electronic Health Records
2024 (cites this paper)
Ethical framework for AI education based on large language models
2024 (cites this paper)
Refining Synthesized Speech Using Speaker Information and Phone Masking for Data Augmentation of Speech Recognition
2024 (cites this paper)
An unsupervised medical image registration network for intelligent medical education
2024 (cites this paper)
Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition
2024 (cites this paper)
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
2024 (cites this paper)
Knowledge Enhanced Deep Learning: Application to Pandemic Prediction
2023 (cites this paper)
Improving End-to-End Modeling For Mandarin-English Code-Switching Using Lightweight Switch-Routing Mixture-of-Experts
2023 (cites this paper)
基于多尺度建模的端到端自动语音识别方法 (An End-to-End Automatic Speech Recognition Method Based on Multiscale Modeling)
2023 (influential citation)
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extracters
2023 (cites this paper)
Speech Recognition Method Based on Deep Learning of Artificial Intelligence: An example of BLSTM-CTC model
2023 (cites this paper)
IA-LSTM: Interaction-Aware LSTM for Pedestrian Trajectory Prediction
2023 (cites this paper)
Conformer Based End-to-End ASR System with a New Feature Fusion
2023 (cites this paper)
Flexible Evidence Model to Reduce Uncertainty Mismatch Between Speech Enhancement and ASR Based on Encoder-Decoder Architecture
2023 (cites this paper)
Hybrid Attention-Based Encoder-Decoder Model for Efficient Language Model Adaptation
2023 (cites this paper)
Multi Transcription-Style Speech Transcription Using Attention-Based Encoder-Decoder Model
2023 (cites this paper)
An attention-based domain spatial-temporal meta-learning (ADST-ML) approach for PM2.5 concentration dynamics prediction
2023 (cites this paper)
RAttSR: A Novel Low-Cost Reconstructed Attention-Based End-to-End Speech Recognizer
2023 (cites this paper)
Improving End-to-End Automatic Speech Recognition with Multi-Layer-Enriched Supervised Learning
2023 (cites this paper)
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
2023 (cites this paper)
Dialect Speech Recognition Modeling using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR
2023 (cites this paper)
A Data-Light and Trajectory-Based Machine Learning Approach for the Online Prediction of Flight Time of Arrival
2023 (cites this paper)
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
2023 (cites this paper)
Improving Sequence-to-sequence Tibetan Speech Synthesis with Prosodic Information
2023 (cites this paper)
Reproducibility is Nothing without Correctness: The Importance of Testing Code in NLP
2023 (cites this paper)
SKGACN: Social Knowledge-Guided Graph Attention Convolutional Network for Human Trajectory Prediction
2023 (cites this paper)
Multiple Surgical Instruments Tracking-By-Prediction With Graph Hierarchy
2023 (cites this paper)
A BiGRU joint optimized attention network for recognition of drilling conditions
2023 (cites this paper)
Improving Scheduled Sampling for Neural Transducer-Based ASR
2023 (cites this paper)
Masked Audio Text Encoders are Effective Multi-Modal Rescorers
2023 (cites this paper)
Deep Learning for Operating Performance Assessment of Industrial Processes with Layer Attention-Based Stacked Performance-Relevant Denoising Auto-Encoders
2023 (cites this paper)
Research on intelligent joint control model of outlet moisture of tobacco silk drying machine
2023 (cites this paper)
Hybridformer: Improving Squeezeformer with Hybrid Attention and NSR Mechanism
2023 (cites this paper)
A Truly Multilingual First Pass and Monolingual Second Pass Streaming on-Device ASR System
2023 (cites this paper)
Photoplethysmography Driven Hypertension Identification: A Pilot Study
2023 (cites this paper)
When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP
2023 (cites this paper)
LiteLSTM architecture based on weights sharing for recurrent neural networks
2023 (cites this paper)
On the Learning Dynamics of Attention Networks
2023 (cites this paper)
Factual Consistency Oriented Speech Recognition
2023 (cites this paper)
Hotel Sales Forecasting with LSTM and N-BEATS
2023 (cites this paper)
Unsupervised Model-Based Speaker Adaptation of End-To-End Lattice-Free MMI Model for Speech Recognition
2022 (cites this paper)
An Internal Waves Data Set From Sentinel‐1 Synthetic Aperture Radar Imagery and Preliminary Detection
2022 (cites this paper)
Deep-Learing based Recommendation System Survey Paper
2022 (cites this paper)
Monotonic Segmental Attention for Automatic Speech Recognition
2022 (cites this paper)
Prediction and Detection of Sewage Treatment Process Using N-BEATS Autoencoder Network
2022 (cites this paper)
Improving Semi-Supervised End-To-End Automatic Speech Recognition Using Cyclegan and Inter-Domain Losses
2022 (cites this paper)
Knowledge distillation for end-to-end speech recognition based on Conformer model
2022 (cites this paper)
A Context-Enhanced Transformer with Abbr-Recover Policy for Chinese Abbreviation Prediction
2022 (cites this paper)
Water Quality Prediction Based on LSTM and Attention Mechanism: A Case Study of the Burnett River, Australia
2022 (cites this paper)
融合外部语言知识的流式越南语语音识别 (Streaming Vietnamese Speech Recognition Based on Fusing External Vietnamese Language Knowledge)
2022 (cites this paper)
Hierarchical Multi-Attention Transfer for Knowledge Distillation
2022 (cites this paper)
Empirical Sampling from Latent Utterance-wise Evidence Model for Missing Data ASR based on Neural Encoder-Decoder Model
2022 (cites this paper)
Towards Efficiently Learning Monotonic Alignments for Attention-based End-to-End Speech Recognition
2022 (cites this paper)
HIGH DIMENSIONAL WEATHER DATA USED IN A DEEP GENERATIVE MODEL TO PREDICT TRAJECTORIES OF AIRCRAFT
2022 (cites this paper)
Pronunciation-Aware Unique Character Encoding for RNN Transducer-Based Mandarin Speech Recognition
2022 (cites this paper)
Expressing Multivariate Time Series as Graphs with Time Series Attention Transformer
2022 (cites this paper)
PoLyScriber: Integrated Training of Extractor and Lyrics Transcriber for Polyphonic Music
2022 (cites this paper)
LTPConstraint: a transfer learning based end-to-end method for RNA secondary structure prediction
2022 (cites this paper)
Transformer Model Compression for End-to-End Speech Recognition on Mobile Devices
2022 (cites this paper)
Computer-assisted Pronunciation Training - Speech synthesis is almost all you need
2022 (cites this paper)
TagSeq: Malicious behavior discovery using dynamic analysis
2022 (cites this paper)
Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language
2022 (cites this paper)
Auditory-Based Data Augmentation for end-to-end Automatic Speech Recognition
2022 (cites this paper)
A High-Accuracy Two-Stage Model for Automatic Speech Recognition
2022 (cites this paper)
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
2022 (cites this paper)
Dynamic Latency for CTC-Based Streaming Automatic Speech Recognition With Emformer
2022 (cites this paper)
Locality Matters: A Locality-Biased Linear Attention for Automatic Speech Recognition
2022 (cites this paper)
NAS-SCAE: Searching Compact Attention-based Encoders For End-to-end Automatic Speech Recognition
2022 (cites this paper)
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings
2022 (cites this paper)