A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

Published 2015 in arXiv.org

ABSTRACT

Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.
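The initialization named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the zero bias, the small Gaussian input weights (std 0.001), and the toy sizes are all assumptions chosen for illustration. The sketch only shows why an identity recurrent matrix combined with rectified linear units carries a non-negative hidden state forward unchanged when no input arrives, and how a scaled identity makes that state decay geometrically.

```python
import random


def identity_recurrent_weights(n, scale=1.0):
    """Return an n x n recurrent matrix equal to scale * I (hypothetical helper)."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def small_random_weights(rows, cols, std=0.001, seed=0):
    """Input-to-hidden weights drawn from a small Gaussian (std is an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_relu_step(h, x, w_hh, w_xh, bias):
    """One recurrent step: h_next = relu(W_hh h + W_xh x + b)."""
    recurrent = matvec(w_hh, h)
    driven = matvec(w_xh, x)
    return [max(0.0, r + d + b) for r, d, b in zip(recurrent, driven, bias)]


def run(h0, inputs, w_hh, w_xh, bias):
    """Apply rnn_relu_step over a sequence of input vectors and return the final state."""
    h = list(h0)
    for x in inputs:
        h = rnn_relu_step(h, x, w_hh, w_xh, bias)
    return h


n_hidden, n_in = 3, 2
w_xh = small_random_weights(n_hidden, n_in)
bias = [0.0] * n_hidden          # zero bias is an assumption of this sketch
h0 = [1.0, 2.0, 0.5]             # non-negative starting state
silent = [[0.0] * n_in for _ in range(100)]  # 100 steps with no input

# Identity recurrent matrix: the state is copied forward unchanged.
h_identity = run(h0, silent, identity_recurrent_weights(n_hidden), w_xh, bias)

# Scaled identity (0.5): the state halves at every step.
h_scaled = run(h0, silent[:2], identity_recurrent_weights(n_hidden, 0.5), w_xh, bias)
```

With the identity matrix the state after 100 silent steps equals the starting state, whereas a scale of 0.5 shrinks it by a factor of four after two steps. In this toy setting the scale therefore acts as a knob on how long information is retained at initialization.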

PUBLICATION RECORD

Publication year: 2015
Venue: arXiv.org
Publication date: 2015-04-03
Fields of study: Computer Science
Identifiers: arXiv 1504.00941
Source metadata: Semantic Scholar
