Getting More Mileage from Web Text Sources for Conversational Speech Language Modeling using Class-Dependent Mixtures

Published 2003 in North American Chapter of the Association for Computational Linguistics

ABSTRACT

Sources of training data suitable for language modeling of conversational speech are limited. In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, and that larger performance gains can be obtained from this data by using class-dependent interpolation of N-grams.
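
The abstract names class-dependent interpolation of N-grams without giving a formula, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the mixture weights are selected by the class of the previous word, i.e. p(w | h) = sum over sources s of lambda_s(c(h)) * p_s(w | h). The function name, the class labels, the toy unigram component models, and all probabilities and weights are hypothetical values chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a component LM returns P(word | history).
ComponentLM = Callable[[str, Sequence[str]], float]


def class_dependent_mixture(
    word: str,
    history: Sequence[str],
    components: List[ComponentLM],
    class_weights: Dict[str, List[float]],
    word_class: Dict[str, str],
    default_class: str = "OTHER",
) -> float:
    """Interpolate component LMs with weights chosen by the class of the previous word."""
    # Assumption: the class is a function of the last history word only.
    prev = history[-1] if history else None
    cls = word_class.get(prev, default_class)
    lambdas = class_weights.get(cls, class_weights[default_class])
    if len(lambdas) != len(components):
        raise ValueError("need one weight per component model")
    if abs(sum(lambdas) - 1.0) > 1e-9:
        raise ValueError("weights for a class must sum to 1")
    return sum(lam * lm(word, history) for lam, lm in zip(lambdas, components))


def make_unigram(probs: Dict[str, float], floor: float = 1e-6) -> ComponentLM:
    """Toy stand-in for a real N-gram model: ignores the history entirely."""
    return lambda word, history: probs.get(word, floor)


# Toy component models (made-up probabilities): in-domain transcripts vs. web text.
in_domain = make_unigram({"yeah": 0.2, "meeting": 0.01})
web = make_unigram({"yeah": 0.05, "meeting": 0.03})

# Made-up weights: trust in-domain data more after a filler word.
weights = {"FILLER": [0.9, 0.1], "OTHER": [0.5, 0.5]}
classes = {"uh": "FILLER", "um": "FILLER"}

p_after_filler = class_dependent_mixture("yeah", ["uh"], [in_domain, web], weights, classes)
p_after_other = class_dependent_mixture("yeah", ["the"], [in_domain, web], weights, classes)
print(p_after_filler, p_after_other)
```

With a single class this reduces to ordinary linear interpolation with one global weight per source. In practice the per-class weights would be estimated on held-out data rather than set by hand.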

PUBLICATION RECORD

Publication year
2003
Venue
North American Chapter of the Association for Computational Linguistics
Publication date
2003-05-27
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.3115/1073483.1073486
Source metadata
Semantic Scholar
