Improving Speech Recognition with Prompt-based Contextualized ASR and LLM-based Re-predictor
Nguyen Manh Tien Anh, Thach Ho Sy
Published 2024 in Interspeech
ABSTRACT
In recent years, advances in automatic speech recognition (ASR) have led to its widespread use in applications such as call center bots and virtual assistants. However, these systems still struggle with adverse speech conditions, missing contextual information, and rare words. In this paper, we propose a novel architecture that addresses these limitations by integrating Large Language Models (LLMs) and prompt mechanisms to improve ASR accuracy. By using a pre-trained text encoder with a text adapter for task-specific adaptation, together with an efficient LLM-based re-prediction mechanism, our method achieves strong results in various real-world scenarios. Compared to a baseline ASR system on multiple public datasets, the proposed system achieves an average relative word error rate improvement of 27% for conventional tasks, 30% for utterance-level contextual tasks, and 33% for word-level biasing tasks.
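The abstract reports its gains as relative word error rate (WER) improvement over a baseline. As a minimal sketch of how that metric is conventionally computed (not the authors' evaluation code), the Python below derives WER from a word-level edit distance and then the relative improvement between two systems. The function names and the example WER values (0.10 and 0.073) are hypothetical illustrations and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, system_wer: float) -> float:
    """Fraction by which system_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical numbers: a baseline at 10% WER and a system at 7.3% WER
# correspond to a relative improvement of about 27%.
improvement = relative_wer_improvement(0.10, 0.073)
```

A relative figure depends on the baseline: the same 27% relative improvement is a much smaller absolute change when the baseline WER is already low.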
PUBLICATION RECORD
- Publication year
2024
- Venue
Interspeech
- Publication date
2024-09-01
- Fields of study
Computer Science
- Source metadata
Semantic Scholar