Sequence Transduction with Recurrent Neural Networks

Published 2012 in arXiv.org

ABSTRACT

Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech, to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However, RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation since finding the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.

PUBLICATION RECORD

Publication year
2012
Venue
arXiv.org
Publication date
2012-11-14
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1211.3711
Source metadata
Semantic Scholar

REFERENCES

Generating Text with Recurrent Neural Networks
2011, cited by this paper
Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine
2010, influential reference
Recurrent neural network based language model
2010, cited by this paper
Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks
2008, cited by this paper
Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks
2007, cited by this paper
Unconstrained Online Handwriting Recognition with Recurrent Neural Networks
2007, cited by this paper
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
2006, influential reference
Rejection strategies for offline handwritten text line recognition
2006, cited by this paper
2005 Special Issue: Framewise phoneme classification with bidirectional LSTM and other neural network architectures
2005, cited by this paper
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
2001, cited by this paper
Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies
2001, cited by this paper
Long short-term memory in recurrent neural networks
2001, cited by this paper
Gradient-based learning applied to document recognition
1998, cited by this paper
Global training of document processing systems using graph transformer networks
1997, cited by this paper
Bidirectional recurrent neural networks
1997, cited by this paper
Long Short-Term Memory
1997, cited by this paper
An analysis of noise in recurrent neural networks: convergence and generalization
1996, influential reference
Gradient-based learning algorithms for recurrent networks and their computational complexity
1995, cited by this paper
Speaker-independent phone recognition using hidden Markov models
1989, cited by this paper

CITED BY

Do we really need Self-Attention for Streaming Automatic Speech Recognition?
2026, cites this paper
ViSpeechFormer: A Phonemic Approach for Vietnamese Automatic Speech Recognition
2026, cites this paper
Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text
2026, cites this paper
Beyond Prompting: Efficient and Robust Contextual Biasing for Speech LLMs via Logit-Space Integration (LOGIC)
2026, cites this paper
Align-Consistency: Improving Non-autoregressive and Semi-supervised ASR with Consistency Regularization
2026, cites this paper
A Unified Perspective on CTC and Soft-DTW Using Differentiable DTW
2026, influential citation
Voxtral Realtime
2026, cites this paper
VietSuperSpeech: A Large-Scale Vietnamese Conversational Speech Dataset for ASR Fine-Tuning in Chatbot, Customer Support, and Call Center Applications
2026, cites this paper
NRR-Phi: Text-to-State Mapping for Ambiguity Preservation in LLM Inference
2026, cites this paper
Categorize Early, Integrate Late: Divergent Processing Strategies in Automatic Speech Recognition
2026, cites this paper
823-OLT @ BUET DL Sprint 4.0: Context-Aware Windowing for ASR and Fine-Tuned Speaker Diarization in Bengali Long Form Audio
2026, cites this paper
Suffix-Constrained Greedy Search Algorithms for Causal Language Models
2026, cites this paper
Qwen3-ASR Technical Report
2026, cites this paper
Towards Comprehensive Semantic Speech Embeddings for Chinese Dialects
2026, cites this paper
Leveraging Beam Search Information for Confidence Estimation in E2E ASR
2026, cites this paper
Frame-Level Internal Tool Use for Temporal Grounding in Audio LMs
2026, cites this paper
Fine-tuning Whisper for speech recognition in aquatic product inspection tasks
2026, cites this paper
Efficient Sparse Selective-Update RNNs for Long-Range Sequence Modeling
2026, cites this paper
Moonshine v2: Ergodic Streaming Encoder ASR for Latency-Critical Speech Applications
2026, cites this paper
Relaxing Positional Alignment in Masked Diffusion Language Models
2026, cites this paper
Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition
2026, cites this paper
IKFST: IOO and KOO Algorithms for Accelerated and Precise WFST-based End-to-End Automatic Speech Recognition
2026, cites this paper
Sylber 2.0: A Universal Syllable Embedding
2026, influential citation
TC-BiMamba: Trans-Chunk bidirectionally within BiMamba for unified streaming and non-streaming ASR
2026, cites this paper
Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing
2026, cites this paper
18.2 A 22nm 1.87ms/Frame Streaming Multi-Speaker ASR Accelerator Leveraging Contextual-Aware Redundancy Skipping with 2D-Writable Microscaling Compute-in-Memory and Similarity-Aware TCAM Design
2026, cites this paper
Fast and General Automatic Differentiation for Finite-State Methods
2026, cites this paper
Continual test-time dynamic speech recognition via adaptive threshold
2025, cites this paper
Cautious Next Token Prediction
2025, cites this paper
Energy-Guided Decoding for Object Hallucination Mitigation
2025, cites this paper
A Survey of LLM Inference Systems
2025, cites this paper
Early Attentive Sparsification Accelerates Neural Speech Transcription
2025, cites this paper
Recent Trends in Distant Conversational Speech Recognition: A Review of CHiME-7 and 8 DASR Challenges
2025, cites this paper
Autoregressive Speech Enhancement via Acoustic Tokens
2025, cites this paper
DYNAC: Dynamic Vocabulary based Non-Autoregressive Contextualization for Speech Recognition
2025, cites this paper
BIMA: Bijective Maximum Likelihood Learning Approach to Hallucination Prediction and Mitigation in Large Vision-Language Models
2025, cites this paper
Exploring Contextual Knowledge-Enhanced Speech Recognition in Air Traffic Control Communication: A Comparative Study
2025, cites this paper
Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
2025, cites this paper
MFA-KWS: Effective Keyword Spotting With Multi-Head Frame-Asynchronous Decoding
2025, cites this paper
NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding
2025, cites this paper
HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation
2025, cites this paper
Multi-Hypothesis Distillation of Multilingual Neural Translation Models for Low-Resource Languages
2025, cites this paper
Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors
2025, cites this paper
Encoder-Aware Sequence-Level Knowledge Distillation for Low-Resource Neural Machine Translation
2025, cites this paper
Cross-modal Knowledge Transfer Learning as Graph Matching Based on Optimal Transport for ASR
2025, cites this paper
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
2025, cites this paper
A study on phonemes recognition method for Mandarin pronunciation based on improved Zipformer-RNN-T(Pruned) modeling
2025, cites this paper
DuRep: Dual-Mode Speech Representation Learning via ASR-Aware Distillation
2025, cites this paper
PSRB: A Comprehensive Benchmark for Evaluating Persian ASR Systems
2025, influential citation
VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining
2025, cites this paper
Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation
2025, cites this paper
Running Conventional Automatic Speech Recognition on Memristor Hardware: A Simulated Approach
2025, cites this paper
WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing
2025, cites this paper
Pushing the Limits of Beam Search Decoding for Transducer-based ASR models
2025, cites this paper
Label-Context-Dependent Internal Language Model Estimation for CTC
2025, cites this paper
Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding
2025, cites this paper
Accurate, fast, cheap: Choose three. Replacing Multi-Head-Attention with Bidirectional Recurrent Attention for Long-Form ASR
2025, cites this paper
Enhanced Hybrid Transducer and Attention Encoder Decoder with Text Data
2025, cites this paper
ASR-Synchronized Speaker-Role Diarization
2025, cites this paper
INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
2025, cites this paper
Pronunciation-Lexicon Free Training for Phoneme-Based Crosslingual ASR via Joint Stochastic Approximation
2025, cites this paper
Mixture of LoRA Experts With Multi-Modal and Multi-Granularity LLM Generative Error Correction for Accented Speech Recognition
2025, cites this paper
Forecasting Information Operations with Hybrid Transformer Architecture
2025, cites this paper
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
2025, cites this paper
WIND: Accelerated RNN-T Decoding with Windowed Inference for Non-blank Detection
2025, cites this paper
Multi-modal Streaming ASR in Cross-talk Scenario for Smart Glasses
2025, cites this paper
Self-Information Guided Speech Segmentation for Efficient Streaming ASR
2025, cites this paper
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
2025, cites this paper
Regarding the Existence of the Internal Language Model in CTC-Based E2E ASR
2025, cites this paper
Open Automatic Speech Recognition Models for Classical and Modern Standard Arabic
2025, cites this paper
Towards Bringing Parity in Pretraining Datasets for Low-resource Indian Languages
2025, cites this paper
Toward Real-Time Recognition of Continuous Indian Sign Language: A Multi-Modal Approach Using RGB and Pose
2025, cites this paper
Generating Long Semantic IDs in Parallel for Recommendation
2025, cites this paper
Combining multilingual resources to enhance end-to-end speech recognition systems for Scandinavian languages
2025, cites this paper
LLM Post-Training: A Deep Dive into Reasoning Large Language Models
2025, cites this paper
Training and Inference Efficiency of Encoder-Decoder Speech Models
2025, cites this paper
METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling
2025, cites this paper
SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
2025, cites this paper
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
2025, cites this paper
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
2025, cites this paper
Globally Normalizing the Transducer for Streaming Speech Recognition
2025, influential citation
Transcribing and Translating, Fast and Slow: Joint Speech Translation and Recognition
2025, cites this paper
Token-Level Contextual Network with Ladder-Shaped Attention for End-to-End ASR
2025, cites this paper
Advancing Streaming ASR with Chunk-wise Attention and Trans-chunk Selective State Spaces
2025, cites this paper
M-MoE: Mixture of Mixture-of-Expert Model for CTC-based Streaming Multilingual ASR
2025, cites this paper
AMuSE: Attentive Multilingual Speech Encoding for Zero-Prior ASR
2025, cites this paper
A streaming brain-to-voice neuroprosthesis to restore naturalistic communication
2025, influential citation
Rate–Distortion–Perception Trade-Off in Information Theory, Generative Models, and Intelligent Communications
2025, cites this paper
Unsupervised End-to-End Accented Speech Recognition Under Low-Resource Conditions
2025, cites this paper
LassoRNet: Accurate dim-light melatonin onset time prediction from multiple blood tissue samples
2025, influential citation
Evaluating Evaluation Metrics - The Mirage of Hallucination Detection
2025, cites this paper
SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
2025, cites this paper
Beyond the Mode: Sequence-Level Distillation of Multilingual Translation Models for Low-Resource Language Pairs
2025, cites this paper
Speech-Based Phonetic Transcript Metrics
2025, cites this paper
Differentiable K-means for Fully-optimized Discrete Token-based ASR
2025, cites this paper
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
2025, cites this paper
Improving endpoint detection in end-to-end streaming ASR for conversational speech
2025, cites this paper
Optimal Policy Minimum Bayesian Risk
2025, cites this paper
Large Language Models for Planning: A Comprehensive and Systematic Survey
2025, cites this paper
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
2025, cites this paper