Latent Sequence Decompositions
William Chan, Yu Zhang, Quoc V. Le, N. Jaitly
Published 2016 in International Conference on Learning Representations

ABSTRACT
We present the Latent Sequence Decompositions (LSD) framework. LSD decomposes sequences into variable-length output units as a function of both the input sequence and the output sequence. We present a training algorithm that samples valid extensions, and an approximate decoding algorithm. We evaluate on the Wall Street Journal speech recognition task, where our LSD model achieves 12.9% WER compared to a character baseline of 14.8% WER. When combined with a convolutional network on the encoder, we achieve 9.6% WER.
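The training procedure mentioned in the abstract repeatedly samples a "valid extension": an output unit from the word-piece vocabulary that matches the next characters of the target transcript. The sketch below illustrates that sampling step under simplifying assumptions: the toy vocabulary, the uniform sampling over valid extensions, and the function names are illustrative only (the paper's actual algorithm samples from the model's distribution over extensions).

```python
import random

def valid_extensions(target, pos, vocab):
    """Vocabulary units that match the target transcript starting at `pos`."""
    return [w for w in vocab if target.startswith(w, pos)]

def sample_decomposition(target, vocab, rng=random.Random(0)):
    """Sample one decomposition of `target` into variable-length units by
    repeatedly choosing a valid extension (uniformly here for illustration).
    Assumes `vocab` contains every single character, so a step always exists."""
    pos, units = 0, []
    while pos < len(target):
        w = rng.choice(valid_extensions(target, pos, vocab))
        units.append(w)
        pos += len(w)
    return units

# Toy example: decompose "hello" with a small character/word-piece vocabulary.
vocab = ["h", "e", "l", "o", "he", "ll", "lo", "hell"]
units = sample_decomposition("hello", vocab)
```

Concatenating the sampled units always reconstructs the original transcript; different random seeds yield different decompositions of the same target, which is the latent variable the LSD framework marginalizes over.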
PUBLICATION RECORD
- Publication year: 2016
- Venue: International Conference on Learning Representations
- Publication date: 2016-10-10
- Fields of study: Mathematics, Computer Science
- Source metadata: Semantic Scholar
REFERENCES
39 references (not listed here)
CITED BY
62 citing papers (not listed here)