Generative Annotation for ASR Named Entity Correction

Yuanchang Luo, Daimeng Wei, Shaojun Li, Hengchao Shang, Jiaxin Guo, Zongyao Li, Zhanglin Wu, Xiaoyu Chen, Zhiqiang Rao, Jinlong Yang, Hao Yang

Published 2025 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

End-to-end automatic speech recognition (ASR) systems often fail to transcribe domain-specific named entities, causing catastrophic failures in downstream tasks. Numerous fast and lightweight named entity correction (NEC) models have been proposed in recent years. These models, which mainly rely on phonetic-level edit distance algorithms, have shown impressive performance. However, when the wrongly transcribed word(s) and the ground-truth entity differ significantly in form, these methods often fail to locate the wrongly transcribed words in the hypothesis, which limits their usefulness. We propose a novel NEC method that uses speech sound features to retrieve candidate entities. Given the speech sound features and the candidate entities, we design a generative method that annotates entity errors in ASR transcripts and replaces the annotated text with the correct entities. This method remains effective when the word forms differ. We test our method on open-source and self-constructed test sets. The results demonstrate that our NEC method brings significant improvement to entity accuracy. The self-constructed training data and test set are publicly available at github.com/L6-NLP/Generative-Annotation-NEC.
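
For orientation, the phonetic-level edit distance approach that the abstract contrasts against can be sketched as follows. This is a minimal illustration of that general family of methods, not the method proposed in the paper and not any specific published baseline. The pronunciation lexicon (`TOY_LEXICON`), the helper names (`phonemes`, `locate_entity`), and the `max_span` parameter are all hypothetical and exist only for this example.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical toy pronunciation lexicon (assumption for this sketch only).
TOY_LEXICON = {
    "call": ["K", "AO", "L"],
    "jon": ["JH", "AA", "N"],
    "john": ["JH", "AA", "N"],
    "smyth": ["S", "M", "IH", "TH"],
    "smith": ["S", "M", "IH", "TH"],
    "now": ["N", "AW"],
}


def phonemes(words: List[str]) -> List[str]:
    """Concatenate the toy phoneme sequences of the given words."""
    out: List[str] = []
    for word in words:
        out.extend(TOY_LEXICON[word.lower()])
    return out


def locate_entity(hypothesis: List[str], entity: List[str],
                  max_span: int = 3) -> Tuple[int, int, float]:
    """Return (start, end, score) of the hypothesis span whose phonemes
    are closest to the entity; a lower normalized score is a better match."""
    target = phonemes(entity)
    best: Tuple[int, int, float] = (0, 0, float("inf"))
    for start in range(len(hypothesis)):
        last = min(start + max_span, len(hypothesis))
        for end in range(start + 1, last + 1):
            span = phonemes(hypothesis[start:end])
            score = edit_distance(span, target) / max(len(span), len(target))
            if score < best[2]:
                best = (start, end, score)
    return best


hypothesis = ["call", "jon", "smyth", "now"]
start, end, score = locate_entity(hypothesis, ["john", "smith"])
corrected = hypothesis[:start] + ["John", "Smith"] + hypothesis[end:]
```

In this toy case the misspelled span sounds identical to the entity, so the search finds it with a score of zero. When the transcribed words and the entity differ strongly in form, every span scores poorly and the lowest-scoring span may be the wrong one, which is the localization failure the abstract describes.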

PUBLICATION RECORD

Publication year
2025
Venue
Conference on Empirical Methods in Natural Language Processing
Publication date
2025-08-28
Fields of study
Computer Science
Identifiers
DOI: 10.48550/arXiv.2508.20700
arXiv: 2508.20700
Source metadata
Semantic Scholar

CITED BY

Non-Intrusive Automatic Speech Recognition Refinement: A Survey
2025, cites this paper