Finding consensus in speech recognition: word error minimization and other applications of confusion networks

Published 2000 in Computer Speech and Language

ABSTRACT

We describe a new framework for distilling information from word lattices to improve the accuracy of the speech recognition output and obtain a more perspicuous representation of a set of alternative hypotheses. In the standard MAP decoding approach the recognizer outputs the string of words corresponding to the path with the highest posterior probability given the acoustics and a language model. However, even given optimal models, the MAP decoder does not necessarily minimize the commonly used performance metric, word error rate (WER). We describe a method for explicitly minimizing WER by extracting word hypotheses with the highest posterior probabilities from word lattices. We change the standard problem formulation by replacing global search over a large set of sentence hypotheses with local search over a small set of word candidates. In addition to improving the accuracy of the recognizer, our method produces a new representation of a set of candidate hypotheses that specifies the sequence of word-level confusions in a compact lattice format. We study the properties of confusion networks and examine their use for other tasks, such as lattice compression, word spotting, confidence annotation, and reevaluation of recognition hypotheses using higher-level knowledge sources.
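
The contrast the abstract draws between MAP decoding and word-level posterior decoding can be illustrated with a minimal sketch. It is not the paper's algorithm: the paper operates on word lattices and derives the word alignment itself, whereas this sketch assumes a handful of equal-length hypotheses that are already aligned position by position. The function names and the toy posteriors are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_hypothesis(nbest):
    """Return the word sequence with the highest sentence-level posterior."""
    best_words, _ = max(nbest, key=lambda pair: pair[1])
    return list(best_words)


def build_slots(nbest):
    """Sum sentence posteriors per word at each position.

    Assumption: all hypotheses have equal length and are already aligned,
    so position i of every hypothesis competes in the same slot.
    """
    length = len(nbest[0][0])
    slots = [defaultdict(float) for _ in range(length)]
    for words, posterior in nbest:
        if len(words) != length:
            raise ValueError("sketch assumes equal-length, pre-aligned hypotheses")
        for slot, word in zip(slots, words):
            slot[word] += posterior
    return slots


def consensus_hypothesis(slots):
    """Pick the word with the highest posterior in each slot (local search)."""
    return [max(slot, key=slot.get) for slot in slots]


# Toy data (not from the paper): three hypotheses with sentence posteriors.
nbest = [
    (("we", "move", "fast"), 0.4),
    (("he", "move", "last"), 0.3),
    (("he", "moved", "last"), 0.3),
]

print(map_hypothesis(nbest))                     # ['we', 'move', 'fast']
print(consensus_hypothesis(build_slots(nbest)))  # ['he', 'move', 'last']
```

With these toy numbers the MAP string has the highest sentence posterior (0.4), yet each of its first and last words is less probable than its competitor once posteriors are summed per slot. Counting errors position by position, the expected number of word errors is 1.5 for the MAP string and 1.1 for the slot-wise choice, which is the effect the abstract describes.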

PUBLICATION RECORD

Publication year: 2000
Venue: Computer Speech and Language
Publication date: 2000-10-01
Fields of study: Computer Science
Identifiers: DOI 10.1006/csla.2000.0152; arXiv cs/0010012
Source metadata: Semantic Scholar

CITED BY

Children’s Speech Recognition in Slovak
2026cites this paper
V-APA: A Voice-driven Agentic Process Automation System
2026cites this paper
CHSER: A Dataset and Case Study on Generative Speech Error Correction for Child ASR
2025cites this paper
Advancing Language Diversity and Inclusion: Towards a Neural Network-based Spell Checker and Correction for Wolof
2024cites this paper
Towards Automatic Evaluation of Task-Oriented Dialogue Flows
2024cites this paper
AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost
2024cites this paper
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
2024influential citation
Probability-Aware Word-Confusion-Network-To-Text Alignment Approach for Intent Classification
2024cites this paper
Lightweight reranking for language model generations
2023cites this paper
Self-consistency for open-ended generations
2023cites this paper
Streaming Speech-to-Confusion Network Speech Recognition
2023cites this paper
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems
2023cites this paper
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States
2022cites this paper
An Effective Artificial Intelligence-Enabled Error Detection and Accuracy Estimation Technique for English Speech Recognition System
2022cites this paper
Ensemble And Re-Ranking Based On Language Models To Improve ASR
2022cites this paper
Toward Zero Oracle Word Error Rate on the Switchboard Benchmark
2022cites this paper
Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble
2022cites this paper
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition
2022cites this paper
Unsupervised Model-Based Speaker Adaptation of End-To-End Lattice-Free MMI Model for Speech Recognition
2022cites this paper
End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting
2022cites this paper
Increasing Context for Estimating Confidence Scores in Automatic Speech Recognition
2022cites this paper
Linguistically Informed Post-processing for ASR Error correction in Sanskrit
2022cites this paper
Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings
2022cites this paper
SoftCTC—semi-supervised learning for text recognition using soft pseudo-labels
2022cites this paper
Improved Data Selection for Domain Adaptation in ASR
2021cites this paper
Speech recognition based on concatenated acoustic feature and lightGBM model
2021cites this paper
Word-Level Confidence Estimation for RNN Transducers
2021cites this paper
L2RS: A Learning-to-Rescore Mechanism for Hybrid Speech Recognition
2021cites this paper
Learning Word-Level Confidence for Subword End-To-End ASR
2021cites this paper
The architecture of a system for full-text search by speech data based on a global search index
2021cites this paper
Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition
2021cites this paper
TranSmart: A Practical Interactive Machine Translation System
2021cites this paper
On Addressing Practical Challenges for RNN-Transducer
2021cites this paper
Learning to Organize a Bag of Words into Sentences with Neural Networks: An Empirical Study
2021cites this paper
Ensemble Combination between Different Time Segmentations
2021cites this paper
Correcting Automated and Manual Speech Transcription Errors using Warped Language Models
2021cites this paper
Uncertainty-Aware Representations for Spoken Question Answering
2021cites this paper
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition
2021cites this paper
Spoken Term Detection Methods for Sparse Transcription in Very Low-resource Settings
2021cites this paper
Probabilistic Text Entry—Case Study 3
2021cites this paper
Text Entry in Virtual Environments using Speech and a Midair Keyboard
2021cites this paper
Neural Machine Translation Improvement by Acoustic Embedding
2020cites this paper
Using Sub-Word Units for Low-Resource Language Keyword Searching
2020cites this paper
Combination of End-to-End and Hybrid Models for Speech Recognition
2020influential citation
Language Model Data Augmentation Based on Text Domain Transfer
2020cites this paper
Identification and authentication of user voice using DNN features and i-vector
2020cites this paper
Tight Integrated End-to-End Training for Cascaded Speech Translation
2020cites this paper
Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition
2020cites this paper
Modeling ASR Ambiguity for Dialogue State Tracking Using Word Confusion Networks
2020cites this paper
Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models
2020cites this paper
Modeling ASR Ambiguity for Neural Dialogue State Tracking
2020cites this paper
Innovative Pretrained-based Reranking Language Models for N-best Speech Recognition Lists
2020cites this paper
Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition
2019cites this paper
On combining features for single-channel robust speech recognition in reverberant environments
2019cites this paper
Multi-Graph Decoding for Code-Switching ASR
2019cites this paper
Automated Testing of Basic Recognition Capability for Speech Recognition Systems
2019cites this paper
Robust spoken term detection using partial search and re-scoring hypothesized detections techniques
2019cites this paper
Low Resource Keyword Search With Synthesized Crosslingual Exemplars
2019cites this paper
Recurrent out-of-vocabulary word detection based on distribution of features
2019cites this paper
Better Document-Level Machine Translation with Bayes’ Rule
2019cites this paper
Ensemble generation and compression for speech recognition
2019influential citation
Incorporating label dependency for ASR error detection via RNN
2019cites this paper
Neural Machine Translation with Acoustic Embedding
2019cites this paper
Optimisation methods for training deep neural networks in speech recognition
2019cites this paper
ILP-based Compressive Speech Summarization with Content Word Coverage Maximization and Its Oracle Performance Analysis
2019cites this paper
Réseaux de neurones profonds appliqués à la compréhension de la parole. (Deep learning applied to spoken langage understanding)
2019cites this paper
An Empirical Evaluation of DTW Subsampling Methods for Keyword Search
2019cites this paper
Improving ASR Confidence Scores for Alexa Using Acoustic and Hypothesis Embeddings
2019cites this paper
Information Extraction in Handwritten Marriage Licenses Books
2019cites this paper
Applications Of Large Vocabulary Continuous Speech Recognition To Fatigue Detection
2019influential citation
Acoustic Model Bootstrapping Using Semi-Supervised Learning
2019cites this paper
Improved Deep Duel Model for Rescoring N-Best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders
2019cites this paper
Large Margin Training for Attention Based End-to-End Speech Recognition
2019cites this paper
L2RS: A Learning-to-Rescore Mechanism for Automatic Speech Recognition
2019cites this paper
System-independent ASR error detection and classification using Recurrent Neural Network
2019cites this paper
Graph Based Translation Memory for Neural Machine Translation
2019cites this paper
Context-aware speech synthesis: A human-inspired model for monitoring and adapting synthetic speech
2019cites this paper
Towards Better Understanding of Spontaneous Conversations: Overcoming Automatic Speech Recognition Errors With Intent Recognition
2019influential citation
A Neural Network Based Ranking Framework to Improve ASR with NLU Related Knowledge Deployed
2019cites this paper
Speech-recognition cloud harvesting for improving the navigation of cyber-physical wheelchairs for disabled persons
2019cites this paper
Direct Neuron-Wise Fusion of Cognate Neural Networks
2019cites this paper
Contributions to Efficient Automatic Transcription of Video Lectures
2019cites this paper
Forward-Backward Attention Decoder
2018cites this paper
Rescoring N-Best Speech Recognition List Based on One-on-One Hypothesis Comparison Using Encoder-Classifier Model
2018cites this paper
Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems
2018cites this paper
Application of Progressive Neural Networks for Multi-Stream WFST Combination in One-Pass Decoding
2018cites this paper
A Hand-Held Multimedia Translation and Interpretation System with Application to Diet Management
2018cites this paper
Device-directed Utterance Detection
2018cites this paper
Lecture 12
2018cites this paper
Bi-directional Lattice Recurrent Neural Networks for Confidence Estimation
2018cites this paper
Speaker-Adapted Confidence Measures for ASR Using Deep Bidirectional Recurrent Neural Networks
2018cites this paper
Structured deep neural networks for speech recognition
2018cites this paper
Practical Application of Domain Dependent Confidence Measurement for Spoken Language Understanding Systems
2018cites this paper
A Decade of Discriminative Language Modeling
2018cites this paper
Classification of Multi-class Daily Human Motion using Discriminative Body Parts and Sentence Descriptions
2018cites this paper
Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces
2018cites this paper
Improved TDNNs Using Deep Kernels and Frequency Dependent Grid-RNNs
2018influential citation
Features, Representations, and Matching Techniques for Audio Search
2018cites this paper
Reinforcement learning and reward estimation for dialogue policy optimisation
2018cites this paper