Empirically combining unnormalized NNLM and back-off N-gram for fast N-best rescoring in speech recognition

Yongzhe Shi, Weiqiang Zhang, Meng Cai, Jia Liu

Published 2014 in EURASIP Journal on Audio, Speech, and Music Processing

ABSTRACT

Neural network language models (NNLM), including feed-forward NNLM (FNNLM) and recurrent NNLM (RNNLM), have proved to be quite powerful for sequence modeling. One main concern for NNLMs is the heavy computational burden of the output layer, where the output needs to be probabilistically normalized and computing the normalizing factors is expensive. Fast rescoring of the N-best list or lattice with an NNLM is therefore of great interest for large-scale applications. In this paper, the statistical characteristics of the normalizing factors are investigated on the N-best list. Based on these observations, we propose to approximate the normalizing factor for each hypothesis as a constant proportional to the number of words in the hypothesis. The unnormalized NNLM is then investigated and combined with a back-off N-gram for fast rescoring; it can be computed very quickly because no normalization is performed in the output layer, reducing the complexity significantly. We apply the proposed method to a well-tuned context-dependent deep neural network hidden Markov model (CD-DNN-HMM) speech recognition system on the English Switchboard phone-call speech-to-text task, where both an FNNLM and an RNNLM are trained to demonstrate our method. Experimental results show that the unnormalized probability of the NNLM is quite complementary to that of the back-off N-gram, and combining the unnormalized NNLM with the back-off N-gram further reduces the word error rate at little additional computational cost.
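The abstract describes rescoring each N-best hypothesis with an unnormalized NNLM score, where the softmax normalizing factor is approximated by a per-word constant, interpolated with a back-off N-gram score. The following is a minimal sketch of that idea, not the authors' implementation; the callables `unnormalized_nnlm_logit` and `ngram_logprob`, the interpolation weight `lam`, and the constant `log_norm_const` are all assumed interfaces and parameters for illustration.

```python
def rescore_nbest(nbest, unnormalized_nnlm_logit, ngram_logprob,
                  lam=0.5, log_norm_const=0.0, acoustic_weight=1.0):
    """Re-rank an N-best list with an unnormalized NNLM plus a back-off N-gram.

    nbest: list of (first_pass_score, words) pairs from the first decoding pass.
    unnormalized_nnlm_logit(history, word): raw output-layer activation (no softmax).
    ngram_logprob(history, word): back-off N-gram log-probability.
    lam: interpolation weight between the NNLM and N-gram log-scores.
    log_norm_const: per-word constant standing in for the log normalizing factor,
        assumed roughly constant across words and hypotheses.
    """
    rescored = []
    for first_pass_score, words in nbest:
        lm_score = 0.0
        history = ["<s>"]
        for w in words:
            # Unnormalized NNLM log-score: raw logit minus the assumed constant
            # log normalizer, so no softmax over the full vocabulary is needed.
            nn = unnormalized_nnlm_logit(history, w) - log_norm_const
            ng = ngram_logprob(history, w)
            lm_score += lam * nn + (1.0 - lam) * ng
            history.append(w)
        total = acoustic_weight * first_pass_score + lm_score
        rescored.append((total, words))
    # Highest combined score first.
    return sorted(rescored, key=lambda x: x[0], reverse=True)
```

Because the normalizing factor contributes (approximately) the same constant per word to every hypothesis of the same length, it can be folded into a single scalar rather than recomputed over the full vocabulary at each word position, which is the source of the claimed speed-up.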
