We replace the Hidden Markov Model (HMM), which is traditionally used in continuous speech recognition, with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context created from a subset of input symbols selected by the attention mechanism. We report initial results demonstrating that this new approach achieves phoneme error rates comparable to those of state-of-the-art HMM-based decoders on the TIMIT dataset.
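The attention step described in the abstract can be illustrated with a minimal sketch. It assumes a plain dot-product score between a decoder state and each encoder state, followed by a softmax; that scoring rule, the function names `softmax` and `attend`, and the toy vectors are illustrative assumptions, not the scoring function or configuration used in the paper.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Assumption: scores are dot products, chosen only for brevity.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context is the attention-weighted sum of the encoder states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical toy data: three encoder frames of dimension 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

In this toy case the first and third frames receive equal weight and the second receives less, so the context is dominated by the frames that match the decoder state. A decoder would consume `context` together with its previous state to emit the next phoneme.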
End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results
J. Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
Published 2014 in arXiv.org
PUBLICATION RECORD
- Publication year
2014
- Venue
arXiv.org
- Publication date
2014-12-04
- Fields of study
Mathematics, Computer Science
- Source metadata
Semantic Scholar