Towards Better Decoding and Language Model Integration in Sequence to Sequence Models

Published 2016 in Interspeech

ABSTRACT

The recently proposed Sequence-to-Sequence (seq2seq) framework advocates replacing complex data processing pipelines, such as an entire automatic speech recognition system, with a single neural network trained in an end-to-end fashion. In this contribution, we analyse an attention-based seq2seq speech recognition system that directly transcribes recordings into characters. We observe two shortcomings: overconfidence in its predictions and a tendency to produce incomplete transcriptions when language models are used. We propose practical solutions to both problems, achieving competitive speaker-independent word error rates (WER) on the Wall Street Journal dataset: without separate language models we reach 10.6% WER, while together with a trigram language model we reach 6.7% WER.
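
The abstract reports its results as word error rates. As a reading aid, the metric can be sketched as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. This is a minimal illustration, not the scoring tool used in the paper: the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, which is an assumption of this sketch.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# A truncated hypothesis is charged one deletion per missing word.
truncated = word_error_rate("the cat sat on the mat", "the cat sat")  # 0.5
```

Under this definition an incomplete transcription, one of the two shortcomings named in the abstract, is penalized through deletions, so cutting a transcript short raises WER even when every emitted word is correct.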

PUBLICATION RECORD

Publication year: 2016
Venue: Interspeech
Publication date: 2016-12-08
Fields of study: Mathematics, Computer Science
Identifiers: DOI 10.21437/Interspeech.2017-343; arXiv 1612.02695
Source metadata: Semantic Scholar

REFERENCES

Top Downloads in IEEE Xplore [Reader's Choice] (2017)
Regularizing Neural Networks by Penalizing Confident Output Distributions (2017)
Lip Reading Sentences in the Wild (2016)
Modeling Coverage for Neural Machine Translation (2016)
Very deep convolutional networks for end-to-end speech recognition (2016)
Learning online alignments with continuous rewards policy gradient (2016)
Sequence-to-Sequence Learning as Beam-Search Optimization (2016)
Globally Normalized Transition-Based Neural Networks (2016)
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (2016)
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (2016)
Asynchronous Methods for Deep Reinforcement Learning (2016)
SoftTarget Regularization: An Effective Technique to Reduce Over-Fitting in Neural Networks (2016)
Latent Sequence Decompositions (2016)
DisturbLabel: Regularizing CNN on the Loss Layer (2016)
A Coverage Embedding Model for Neural Machine Translation (2016)
Attention-Based Models for Speech Recognition (2015)
Effective Approaches to Attention-based Neural Machine Translation (2015)
End-to-end attention-based large vocabulary speech recognition (2015, influential reference)
Rethinking the Inception Architecture for Computer Vision (2015, influential reference)
EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding (2015)
On Using Monolingual Corpora in Neural Machine Translation (2015)
Deep Learning (2015)
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin (2015)
Sequence to Sequence Learning with Neural Networks (2014)
Towards End-To-End Speech Recognition with Recurrent Neural Networks (2014)
Neural Machine Translation by Jointly Learning to Align and Translate (2014)
Deep Speech: Scaling up end-to-end speech recognition (2014, influential reference)
Adam: A Method for Stochastic Optimization (2014)
ImageNet classification with deep convolutional neural networks (2012)
The Kaldi Speech Recognition Toolkit (2011)
Discriminative learning in sequential pattern recognition (2008)
OpenFst: A General and Efficient Weighted Finite-State Transducer Library (2007)
Function Optimization using Connectionist Reinforcement Learning Algorithms (1991)

CITED BY

IKFST: IOO and KOO Algorithms for Accelerated and Precise WFST-based End-to-End Automatic Speech Recognition (2026)
Chinese speech recognition based on improved end-to-end transformer learning model (2026)
Speech Recognition Using Deep Learning Techniques: A Comparative Study (2025)
Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition (2025)
SQUIREDL: Sparse Sequence-to-Sequence Uncertainty Estimation in Evidential Deep Learning (2025)
Dynamic Search for Inference-Time Alignment in Diffusion Models (2025)
Whisper Has an Internal Word Aligner (2025)
Improving Contextual ASR via Multi-grained Fusion with Large Language Models (2025)
Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions (2025)
Graph Neural Network-Based Attribute Auxiliary Structured Grouping for Person Re-Identification (2025)
Non-Intrusive Automatic Speech Recognition Refinement: A Survey (2025)
Regularization of ML models for Earth systems by using longer model timesteps (2025)
LS2: Boosting Hidden Separation for Backdoor Defense With Learning Speed-Driven Label Smoothing (2025)
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers (2025)
MTLM: Incorporating Bidirectional Text Information to Enhance Language Model Training in Speech Recognition Systems (2025)
Label Smoothing is a Pragmatic Information Bottleneck (2025)
Mixture of LoRA Experts With Multi-Modal and Multi-Granularity LLM Generative Error Correction for Accented Speech Recognition (2025)
Improving End-to-End Speech Recognition Through Conditional Cross-Modal Knowledge Distillation with Language Model (2024)
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction (2024)
Meshed Context-Aware Beam Search for Image Captioning (2024)
An efficient text augmentation approach for contextualized Mandarin speech recognition (2024)
Convolutional Neural Networks to Facilitate the Continuous Recognition of Arabic Speech with Independent Speakers (2024)
Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding (2024)
Label Augmentation as Inter-class Data Augmentation for Conditional Image Synthesis with Imbalanced Data (2024)
Language Model Personalization for Speech Recognition: A Clustered Federated Learning Approach With Adaptive Weight Average (2024)
Keep Decoding Parallel With Effective Knowledge Distillation From Language Models To End-To-End Speech Recognisers (2024)
Principled Comparisons for End-to-End Speech Recognition: Attention vs Hybrid at the 1000-Hour Scale (2024)
Massive End-to-end Speech Recognition Models with Time Reduction (2024)
Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia (2024)
Label Smoothing Improves Machine Unlearning (2024)
Automatic Authorship Analysis in Human-AI Collaborative Writing (2024)
Bilingual Road Text Recognition Based on a Hybrid Model of CTC and Attention (2024)
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition (2024)
Large Language Models for Dysfluency Detection in Stuttered Speech (2024)
End-to-End Speech Recognition with Pre-trained Masked Language Model (2024)
Proper Error Estimation and Calibration for Attention-Based Encoder-Decoder Models (2024, influential citation)
Coarse-to-Fine Nutrition Prediction (2024)
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration (2024)
The Complexity of Sequential Prediction in Dynamical Systems (2024)
Self-Relaxed Joint Training: Sample Selection for Severity Estimation with Ordinal Noisy Labels (2024)
ReFixMatch-LS: reusing pseudo-labels for semi-supervised skin lesion classification (2023)
Transformer-Based Lip-Reading with Regularized Dropout and Relaxed Attention (2023)
Confidence-aware calibration and scoring functions for curriculum learning (2023)
Improving Multilingual and Code-Switching ASR Using Large Language Model Generated Text (2023)
Adaptive Spatiotemporal InceptionNet for Traffic Flow Forecasting (2023)
Contextual Spelling Correction with Large Language Models (2023)
Combining multiple end-to-end speech recognition models based on density ratio approach (2023)
The NTNU Super Monster Team (SPMT) system for the Formosa Speech Recognition Challenge 2023 - Hakka ASR (2023)
What Kind of Multi- or Cross-lingual Pre-training is the most Effective for a Spontaneous, Less-resourced ASR Task? (2023, influential citation)
Language modeling for spontaneous speech recognition based on disfluency labeling and generation of disfluent text (2023)
Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm (2023)
A Formalism and Approach for Improving Robustness of Large Language Models Using Risk-Adjusted Confidence Scores (2023)
Be Careful What You Smooth For: Label Smoothing Can Be a Privacy Shield but Also a Catalyst for Model Inversion Attacks (2023)
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models (2023)
A Semi-Supervised Complementary Joint Training Approach for Low-Resource Speech Recognition (2023)
Chatbot Development Through the Ages: A Survey (2023)
Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition (2023)
Stay on topic with Classifier-Free Guidance (2023)
Blank-regularized CTC for Frame Skipping in Neural Transducer (2023)
External Language Model Integration for Factorized Neural Transducers (2023)
Predicting Customer Satisfaction with Soft Labels for Ordinal Classification (2023)
Large-Scale Language Model Rescoring on Long-Form Data (2023)
Measurement and Real-Time Recognition of Driver Trust in Conditionally Automated Vehicles: Using Multimodal Feature Fusions Network (2023)
A Comparison Study on AI Language Detector (2023)
LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization (2023)
FixMatch-LS: Semi-supervised skin lesion classification with label smoothing (2023)
Learning Category Distribution for Text Classification (2023)
On-the-Fly Text Retrieval for end-to-end ASR Adaptation (2023)
Iterative Shallow Fusion of Backward Language Model for End-To-End Speech Recognition (2023)
Cumulative Attention Based Streaming Transformer ASR with Internal Language Model Joint Training and Rescoring (2023, influential citation)
End-to-End Speech Recognition: A Survey (2023, influential citation)
CopyNE: Better Contextual ASR by Copying Named Entities (2023)
Perception and Semantic Aware Regularization for Sequential Confidence Calibration (2023)
Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition (2023)
Aerial image recognition in discriminative bi-transformer (2023)
Rethinking Label Refurbishment: Model Robustness under Label Noise (2023)
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition (2023)
Massively Multilingual Shallow Fusion with Large Language Models (2023)
An Overview on Language Models: Recent Developments and Outlook (2023)
Label Smoothing is Robustification against Model Misspecification (2023)
Massive End-to-end Models for Short Search Queries (2023)
In and Out-of-Domain Text Adversarial Robustness via Label Smoothing (2022)
Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data (2022)
Rethinking Label Smoothing on Multi-hop Question Answering (2022)
Robust & Compact End-to-End Hindi Language ASR System (2022)
A Soft Label Deep Learning to Assist Breast Cancer Target Therapy and Thyroid Cancer Diagnosis (2022)
A Gift from Label Smoothing: Robust Training with Adaptive Label Smoothing via Auxiliary Classifier under Label Noise (2022)
DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding (2022)
InterMPL: Momentum Pseudo-Labeling With Intermediate CTC Loss (2022)
A comparative study on neural networks for paroxysmal atrial fibrillation events detection from electrocardiography (2022)
Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation (2022)
Concept-Based Label Distribution Learning for Text Classification (2022)
JOIST: A Joint Speech and Text Streaming Model for ASR (2022)
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model (2022)
Improving Hybrid CTC/Attention Architecture for Agglutinative Language Speech Recognition (2022)
Streaming Vietnamese Speech Recognition Based on Fusing External Vietnamese Language Knowledge [融合外部语言知识的流式越南语语音识别] (2022)
Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition (2022)
The Health ChatBots in Telemedicine: Intelligent Dialog System for Remote Support (2022)
Momentum Pseudo-Labeling: Semi-Supervised ASR With Continuously Improving Pseudo-Labels (2022)
Autoregressive Predictive Coding: A Comprehensive Study (2022)