Explicit word error minimization in n-best list rescoring

Published 1997 in EUROSPEECH

ABSTRACT

We show that the standard hypothesis scoring paradigm used in maximum-likelihood-based speech recognition systems is not optimal with regard to minimizing the word error rate, the commonly used performance metric in speech recognition. This can lead to sub-optimal performance, especially in high-error-rate environments where word error and sentence error are not necessarily monotonically related. To address this discrepancy, we developed a new algorithm that explicitly minimizes expected word error for recognition hypotheses. First, we approximate the posterior hypothesis probabilities using N-best lists. We then compute the expected word error for each hypothesis with respect to the posterior distribution, and choose the hypothesis with the lowest error. Experiments show improved recognition rates on two spontaneous speech corpora.
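
The procedure outlined in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `scale` parameter for flattening the scores, the use of plain word-level Levenshtein distance as the error measure, and the example N-best list with its scores are all assumptions made here for demonstration.

```python
import math
from typing import List, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def nbest_posteriors(log_scores: Sequence[float], scale: float = 1.0) -> List[float]:
    """Approximate posteriors by normalizing scaled log scores over the N-best list.

    `scale` is a hypothetical tuning knob, not something stated in the abstract.
    """
    scaled = [s * scale for s in log_scores]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def min_expected_word_error(
    nbest: Sequence[Tuple[str, float]], scale: float = 1.0
) -> Tuple[str, float]:
    """Return the hypothesis with the lowest expected word error, and that error."""
    hyps = [text.split() for text, _ in nbest]
    post = nbest_posteriors([score for _, score in nbest], scale)
    expected = [
        sum(p * word_edit_distance(other, cand) for other, p in zip(hyps, post))
        for cand in hyps
    ]
    best = min(range(len(hyps)), key=expected.__getitem__)
    return nbest[best][0], expected[best]


# Illustrative N-best list: (hypothesis, log score). The values are made up.
example_nbest = [
    ("lights in bosnia", math.log(0.40)),   # highest-scoring single hypothesis
    ("flights to boston", math.log(0.35)),
    ("flights to austin", math.log(0.25)),
]
best_hyp, best_err = min_expected_word_error(example_nbest)
print(best_hyp, round(best_err, 2))
```

In this toy list the highest-scoring hypothesis disagrees with the other two on every word, while those two differ from each other in a single word, so the expected-word-error criterion selects "flights to boston" rather than the top-scoring entry.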

PUBLICATION RECORD

Publication year: 1997
Venue: EUROSPEECH
Publication date: 1997-09-22
Fields of study: Computer Science
Identifiers: DOI 10.21437/Eurospeech.1997-68
Source metadata: Semantic Scholar

REFERENCES

Neural-network based measures of confidence for word recognition
1997, cited by this paper
LVCSR log-likelihood ratio scoring for keyword spotting
1995, cited by this paper
Combining Linguistic and Statistical Knowledge Sources in Natural-Language Processing for ATIS
1995, cited by this paper
SWITCHBOARD: telephone speech corpus for research and development
1992, cited by this paper
Integration of Diverse Recognition Methodologies Through Reevaluation of N-Best Sentence Hypotheses
1991, cited by this paper
A Maximum Likelihood Approach to Continuous Speech Recognition
1983, cited by this paper
Pattern classification and scene analysis
1974, cited by this paper

CITED BY

Minimum Bayes Risk Decoding for Error Span Detection in Reference-Free Automatic Machine Translation Evaluation
2025, cites this paper
Large Language Models for Dysfluency Detection in Stuttered Speech
2024, cites this paper
Unveiling the Power of Source: Source-based Minimum Bayes Risk Decoding for Neural Machine Translation
2024, cites this paper
Arabic Speech Recognition: Advancement and Challenges
2024, cites this paper
Dialect Adaptation and Data Augmentation for Low-Resource ASR: Taltech Systems for the Madasr 2023 Challenge
2023, cites this paper
Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation
2023, cites this paper
Modelling Inter-Rater Uncertainty in Spoken Language Assessment
2023, cites this paper
It’s MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk
2023, cites this paper
On The Diversity of ASR Hypotheses In Spoken Language Understanding
2022, cites this paper
LET-Decoder: A WFST-Based Lazy-Evaluation Token-Group Decoder With Exact Lattice Generation
2021, cites this paper
High Quality Rather than High Model Probability: Minimum Bayes Risk Decoding with Neural Metrics
2021, cites this paper
Minimum Bayes Risk Decoding with Neural Metrics of Translation Quality
2021, cites this paper
Combining Hybrid and End-to-End Approaches for the OpenASR20 Challenge
2021, cites this paper
Sampling-Based Minimum Bayes Risk Decoding for Neural Machine Translation
2021, cites this paper
Class-Selective Mini-Batching and Multitask Learning for Visual Relationship Recognition
2021, cites this paper
An Asynchronous WFST-Based Decoder for Automatic Speech Recognition
2021, cites this paper
Sampling-Based Approximations to Minimum Bayes Risk Decoding for Neural Machine Translation
2021, cites this paper
ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition
2020, cites this paper
Large Margin Training for Attention Based End-to-End Speech Recognition
2019, cites this paper
Word Importance Modeling to Enhance Captions Generated by Automatic Speech Recognition for Deaf and Hard of Hearing Users
2019, influential citation
Contextual language understanding Thoughts on Machine Learning in Natural Language Processing
2019, cites this paper
Image-speech combination for interactive computer assisted transcription of handwritten documents
2019, influential citation
Adversarial training and decoding strategies for end-to-end neural conversation models
2019, cites this paper
Ensemble generation and compression for speech recognition
2019, cites this paper
Watch, Listen and Tell: Multi-Modal Weakly Supervised Dense Event Captioning
2019, cites this paper
Rescoring of N-Best Hypotheses Using Top-Down Selective Attention for Automatic Speech Recognition
2018, cites this paper
Automatic Speech Assessment for People with Aphasia Using TDNN-BLSTM with Multi-Task Learning
2018, cites this paper
Advances on the Transcription of Historical Manuscripts based on Multimodality, Interactivity and Crowdsourcing
2018, cites this paper
Domain adaptation for statistical machine translation and neural machine translation
2017, cites this paper
Use of Knowledge Graph in Rescoring the N-Best List in Automatic Speech Recognition
2017, cites this paper
Early and late integration of audio features for automatic video description
2017, cites this paper
Étude sur les représentations continues de mots appliquées à la détection automatique des erreurs de reconnaissance de la parole. (A study of continuous word representations applied to the automatic detection of speech recognition errors)
2017, cites this paper
Sequence Adversarial Training and Minimum Bayes Risk Decoding for End-to-end Neural Conversation Models
2017, cites this paper
Étude sur les représentations continues de mots appliquées à la détection automatique des erreurs de reconnaissance de la parole
2017, cites this paper
Evaluation of Language Models over Croatian Newspaper Texts
2017, cites this paper
Effect of Speech Recognition Errors on Text Understandability for People who are Deaf or Hard of Hearing
2016, influential citation
Computing the Expected Edit Distance from a String to a PFA
2016, cites this paper
Adaptive Boosting for Automatic Speech Recognition
2016, cites this paper
Improvements in language and translation modeling
2016, cites this paper
Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription
2016, cites this paper
Information Fusion Approaches for Distant Speech Recognition in a Multi-microphone Setting
2016, cites this paper
Word segmentation and pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment
2016, cites this paper
Graphical Models for Peptide Identification of Tandem Mass Spectra
2016, cites this paper
A survey on the application of recurrent neural networks to statistical language modeling
2015, cites this paper
Hypotheses ranking and state tracking for a multi-domain dialog system using multiple ASR alternates
2015, cites this paper
Modeling of Slovak Language for Broadcast News Transcription
2015, cites this paper
Improving Speech Recognizer Performance in a Dialog System Using N-best Hypotheses Reranking
2015, cites this paper
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
2014, cites this paper
Towards automatic speech recognition without pronunciation dictionary, transcribed speech and text resources in the target language using cross-lingual word-to-phoneme alignment
2014, cites this paper
Ensemble Methods for Historical Machine-Printed Document Recognition
2014, cites this paper
Language Models With Meta-information
2014, cites this paper
Robust ASR in Reverberant Environments using Temporal Cepstrum Smoothing for Speech Enhancement and an Amplitude Modulation Filterbank for Feature Extraction
2014, cites this paper
Recent advances in the statistical modeling of the Slovak language
2014, cites this paper
Channel selection and reverberation-robust automatic speech recognition
2013, cites this paper
Pronunciation Extraction from Phoneme Sequences through Cross-Lingual Word-to-Phoneme Alignment
2013, cites this paper
Channel selection using n-best hypothesis for multi-microphone ASR
2013, cites this paper
Revisiting hybrid and GMM-HMM system combination techniques
2013, cites this paper
Exploiting the succeeding words in recurrent neural network language models
2013, cites this paper
Semantic parsing using word confusion networks with conditional random fields
2013, cites this paper
WFST Enabled Solutions to ASR Problems: Beyond HMM Decoding
2012, cites this paper
Integration of multiple acoustic and language models for improved Hindi speech recognition system
2012, cites this paper
Improving the Readability of ASR Results for Lectures Using Multiple Hypotheses and Sentence-Level Knowledge
2012, cites this paper
音声ドキュメントの音声認識,整形,要約に関する研究
2012, cites this paper
Aportaciones al modelado conexionista de lenguaje y su aplicación al reconocimiento de secuencias y traducción automática
2012, cites this paper
Bidirectional Language Model for Handwriting Recognition
2012, cites this paper
Topic Mining based on Word Posterior Probability in Spoken Document
2011, cites this paper
An iterative approach to Bayes risk decoding and system combination
2011, cites this paper
Optimal Search for Minimum Error Rate Training
2011, cites this paper
Bayes risk decoding and its application to system combination
2011, cites this paper
Investigation of improved approaches to bayes risk decoding
2011, cites this paper
Minimum Bayes risk discriminative language models for Arabic speech recognition
2011, cites this paper
Minimum Bayes Risk decoding and system combination based on a recursion for edit distance
2011, cites this paper
Using N-Best Lists and Confusion Networks for Meeting Summarization
2011, cites this paper
Search and decoding strategies for complex lexical modeling in lvcsr
2011, cites this paper
Fusion de données audio-visuelles pour l'interaction Homme-Robot
2010, cites this paper
Different Evaluation Approaches of Confusion Network in Chinese Spoken Classification
2010, cites this paper
An improved minimum word error approach to lattice rescoring and system combination
2010, cites this paper
An improved consensus-like method for Minimum Bayes Risk decoding and lattice combination
2010, cites this paper
Morpho-syntactic post-processing of N-best lists for improved French automatic speech recognition
2010, cites this paper
Improving the Readability of Class Lecture Automatic Speech Recognition Results using Multiple Hypotheses
2010, cites this paper
Using Confusion Networks for Speech Summarization
2010, cites this paper
Robust machine translation for multi-domain tasks
2010, cites this paper
Multimodal Processing and Interaction, Audio, Video, Text
2010, cites this paper
Automatic extractive summarization on meeting corpus
2010, cites this paper
Pseudo Conditional Random Fields: Joint Training Approach to Segmenting and Labeling Sequence Data
2010, cites this paper
Minimum hypothesis phone error as a decoding method for speech recognition
2009, influential citation
A study on integrating acoustic-phonetic information into lattice rescoring for automatic speech recognition
2009, cites this paper
A Feature Selection Approach for Automatic Music Genre Classification
2009, cites this paper
Iterative decoding: A novel re-scoring framework for confusion networks
2009, cites this paper
Morphosyntactic Resources for Automatic Speech Recognition
2008, cites this paper
Minimum Bayes-Risk decoding with presumed word significance for speech based information retrieval
2008, cites this paper
Development of SRI’s 1997 Broadcast News Transcription System
2008, cites this paper
Word/sub-word lattices decomposition and combination for speech recognition
2008, cites this paper
Efficient Error Correction for Speech Recognition Systems using Constrained Re-recognition
2008, cites this paper
Construction et stratégie d'exploitation des réseaux de confusion en lien avec le contexte applicatif de la compréhension de la parole. (Confusion networks : construction algorithms and Spoken Language Understanding decision strategies in real applications)
2008, cites this paper
Efficient error correction for speech systems using constrained re-recognition
2008, cites this paper
Exploiting non-linear probabilistic models in natural language parsing and reranking
2008, cites this paper
Automatic sentence structure annotation for spoken language processing
2008, cites this paper
Toward the Integration of Natural Language Processing and Automatic Speech Recognition: Using Morpho-Syntax and Pragmatics for Transcription
2008, cites this paper
Confidence based multiple classifier fusion in speaker verification
2008, cites this paper