Training Very Deep Networks

R. Srivastava, Klaus Greff, J. Schmidhuber

Published 2015 in Neural Information Processing Systems

ABSTRACT

Theoretical and empirical evidence indicates that the depth of neural networks is crucial for their success. However, training becomes more difficult as depth increases, and training of very deep networks remains an open problem. Here we introduce a new architecture designed to overcome this. Our so-called highway networks allow unimpeded information flow across many layers on information highways. They are inspired by Long Short-Term Memory recurrent networks and use adaptive gating units to regulate the information flow. Even with hundreds of layers, highway networks can be trained directly through simple gradient descent. This enables the study of extremely deep and efficient architectures.
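The gating idea in the abstract can be illustrated with a minimal sketch. It assumes the usual highway formulation y = H(x) * T(x) + x * (1 - T(x)), where H is a nonlinear transform and T is a sigmoid "transform gate"; the abstract itself does not spell this formula out. The helper names (`affine`, `highway_layer`, `init_layer`), the tanh nonlinearity, the weight scale, and the exaggerated gate bias of -20 are illustrative assumptions, not the authors' code or settings.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Return W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x))."""
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]  # candidate transform H(x)
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]    # transform gate T(x) in (0, 1)
    # Where t is near 0 the input is carried through unchanged;
    # where t is near 1 the layer behaves like an ordinary nonlinear layer.
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


def init_layer(dim, gate_bias, rng):
    """Hypothetical initializer: small random weights, constant gate bias."""
    w_h = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
    w_t = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
    return w_h, [0.0] * dim, w_t, [gate_bias] * dim


rng = random.Random(0)
x = [0.5, -1.0, 2.0]
y = x
# Stack 100 layers whose gates are biased strongly toward "carry"
# (-20 is deliberately extreme, chosen only to make the effect obvious).
for _ in range(100):
    y = highway_layer(y, *init_layer(3, -20.0, rng))
```

With the gates biased toward carrying, the signal passes through all 100 layers almost unchanged, which is the "unimpeded information flow" the abstract refers to. Training would then adjust the gates so that individual layers transform their input where that helps.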

PUBLICATION RECORD

Publication year
2015
Venue
Neural Information Processing Systems
Publication date
2015-07-22
Fields of study
Computer Science
Identifiers
arXiv 1507.06228
Source metadata
Semantic Scholar

REFERENCES

Highway Networks
2015cited by this paper
Binding via Reconstruction Clustering
2015cited by this paper
Grid Long Short-Term Memory
2015cited by this paper
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
2015cited by this paper
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
2014cited by this paper
Deeply-Supervised Nets
2014cited by this paper
Caffe: Convolutional Architecture for Fast Feature Embedding
2014cited by this paper
On the Complexity of Neural Network Classifiers: A Comparison Between Shallow and Deep Architectures
2014cited by this paper
Spatially-sparse convolutional neural networks
2014cited by this paper
Very Deep Convolutional Networks for Large-Scale Image Recognition
2014cited by this paper
FitNets: Hints for Thin Deep Nets
2014influential reference
On the Number of Linear Regions of Deep Neural Networks
2014cited by this paper
On the Expressive Efficiency of Sum Product Networks
2014cited by this paper
Striving for Simplicity: The All Convolutional Net
2014cited by this paper
Deep Networks with Internal Selective Attention through Feedback Connections
2014cited by this paper
Going deeper with convolutions
2014cited by this paper
Understanding Locally Competitive Networks
2014cited by this paper
Random Walk Initialization for Training Very Deep Feedforward Networks
2014cited by this paper
On the importance of initialization and momentum in deep learning
2013cited by this paper
Compete to Compute
2013cited by this paper
Generating Sequences With Recurrent Neural Networks
2013cited by this paper
Feature Learning in Deep Neural Networks - A Study on Speech Recognition Tasks
2013cited by this paper
Maxout Networks
2013cited by this paper
Network In Network
2013cited by this paper
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
2013cited by this paper
Training Deep and Recurrent Networks with Hessian-Free Optimization
2012cited by this paper
Deep Learning Made Easier by Linear Transformations in Perceptrons
2012cited by this paper
Multi-column deep neural networks for image classification
2012cited by this paper
ImageNet classification with deep convolutional neural networks
2012cited by this paper
Flexible, High Performance Convolutional Neural Networks for Image Classification
2011cited by this paper
Understanding the difficulty of training deep feedforward neural networks
2010cited by this paper
Networks
2007influential reference
A Fast Learning Algorithm for Deep Belief Nets
2006cited by this paper
Learning to Forget: Continual Prediction with LSTM
2000cited by this paper
Long Short-Term Memory
1997cited by this paper
Bridging Long Time Lags by Weight Guessing and "Long Short Term Memory"
1996cited by this paper
Learning Complex, Extended Sequences Using the Principle of History Compression
1992cited by this paper
Untersuchungen zu dynamischen neuronalen Netzen
1991cited by this paper
On the power of small-depth threshold circuits
1990cited by this paper
Computational limitations of small-depth circuits
1987cited by this paper

CITED BY

MG-LDM: A multimodal guided latent diffusion model with Mamba-based temporal encoding for inverse topological design of tissue engineering skin substitutes
2026cites this paper
Fine-Grained Traceability for Transparent ML Pipelines
2026cites this paper
KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices
2026cites this paper
LLM-Guided Knowledge Distillation for Temporal Knowledge Graph Reasoning
2026cites this paper
Lipschitz Multiscale Deep Equilibrium Models: A Theoretically Guaranteed and Accelerated Approach
2026cites this paper
Gradient Residual Connections
2026cites this paper
Knowledge Distillation for Temporal Knowledge Graph Reasoning with Large Language Models
2026cites this paper
Enhanced financial market forecasting using a hybrid deep learning prediction model with encoder-decoder architecture
2026cites this paper
Intelligent fault detection in seismic data using U-shaped residual network-temporal-spatial attention mechanism with fourier forward-inverse transform constraints
2026cites this paper
A Hybrid Approach to Physical and Deep Learning Models for Radar-Based Precipitation Nowcasting
2025cites this paper
Hadamax Encoding: Elevating Performance in Model-Free Atari
2025cites this paper
TOFFNet: A Texture Orientation-based Feature Fusion Network for contactless multimodal finger recognition
2025cites this paper
DC-CLIP: Multilingual CLIP Compression via vision-language distillation and vision-language alignment
2025cites this paper
Refined linguistic deliberation for video captioning via cascade transformer and LSTM
2025cites this paper
A camouflage target classification method based on spectral difference enhancement and pixel-pair features in land-based hyperspectral images
2025cites this paper
Deep Learning and Signal Processing Integration in Automatic Speech Recognition Framework Using Hidden Markov Models
2025cites this paper
Accelerated Training through Iterative Gradient Propagation Along the Residual Path
2025cites this paper
Navigation of autonomous mobile robots in dynamic unknown environments based on dueling double deep q networks
2025cites this paper
Reconstruct Multiscale Features for Lightweight Small Object Detection in Remote Sensing Images
2025cites this paper
mHC: Manifold-Constrained Hyper-Connections
2025cites this paper
IResNets: Iterative Residual Neural Networks
2025cites this paper
Advancing video self-supervised learning via image foundation models
2025cites this paper
A Survey On Neural Network Quantization
2025cites this paper
Lookup Table-based Multiplication-free All-digital DNN Accelerator Featuring Self-Synchronous Pipeline Accumulation
2025cites this paper
DATC-STP: Towards Accurate yet Efficient Spatiotemporal Prediction With Transformer-Style CNN
2025cites this paper
Residual-time gated recurrent unit
2025cites this paper
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
2025cites this paper
High-to-low flow dynamics learning with deep convolutional encoder–decoder networks for randomly packed pebble-bed geometry
2025cites this paper
VARIUM: Variational Autoencoder for Multi-Interest Representation with Inter-User Memory
2025cites this paper
Improving Deep Random Vector Functional Link Networks through computational optimization of regularization parameters
2025cites this paper
ResNet: Enabling Deep Convolutional Neural Networks through Residual Learning
2025cites this paper
ImprovDML: Improved Trade-off in Private Byzantine-Resilient Distributed Machine Learning
2025cites this paper
SLVR: Super-Light Visual Reconstruction via Blueprint Controllable Convolutions and Exploring Feature Diversity Representation
2025cites this paper
An Additively Preconditioned Trust Region Strategy for Machine Learning
2025cites this paper
TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning
2025cites this paper
GRAF-IDS: graph-based clustering as aggregation for federated intrusion detection system in IoT network
2025cites this paper
Neural Proxies for Sound Synthesizers: Learning Perceptually Informed Preset Representations
2025cites this paper
A dual-regressor adversarial framework with staged-wise training strategy for cross-domain remaining useful life prediction
2025cites this paper
Compactly Hardware Implementation of a High-Speed Activation Function Based on Phase-Change Memory in Neural Network
2025cites this paper
Comparative Analysis of Brain Tumor Classification Using DNN
2025cites this paper
A survey of IPv6 address scanning techniques
2025cites this paper
Explainable deeply-fused nets electricity demand prediction model: Factoring climate predictors for accuracy and deeper insights with probabilistic confidence interval and point-based forecasts
2025cites this paper
PNN: A Novel Progressive Neural Network for Fault Classification in Rotating Machinery under Small Dataset Constraint
2025cites this paper
Hadamard Product in Deep Learning: Introduction, Advances and Challenges
2025cites this paper
Deep learning inference of miRNA expression from bulk and single-cell mRNA expression
2025cites this paper
A Global Spatial-Temporal Attention method based on motion decomposition for precipitation nowcasting
2025cites this paper
Optimal carbon emission path under uncertainty: Physical risks and transition risks
2025cites this paper
StO2 Stress Recognition Based on Deformable Convolution and Multimodal Feature Blender
2025cites this paper
Combining Transformers and CNNs for Efficient Object Detection in High-Resolution Satellite Imagery
2025cites this paper
Incorporating Deep Learning Into Hydrogeological Modeling: Advancements, Challenges, and Future Directions
2025cites this paper
The Use of Long Short-Term Memory Models to Estimate Soybean Pricing: A Regional Climate Data Evaluation From Brazil
2025cites this paper
A new feed-forward deep neural network: mobile dense neural network
2025cites this paper
GCCNet: A Novel Network Leveraging Gated Cross-Correlation for Multi-View Classification
2025cites this paper
Masked hybrid attention with Laplacian query fusion and tripartite sequence matching for medical image segmentation
2025cites this paper
Adaptive Complex Wavelet Informed Transformer Operator
2025cites this paper
Adaptive physics-informed CNN based thermal performance reliability index estimation for directional heat transfer C/C composite structure in UAVs
2025cites this paper
Flopping for FLOPs: Leveraging equivariance for computational efficiency
2025cites this paper
DeepCrossAttention: Supercharging Transformer Residual Connections
2025cites this paper
Character-Level Encoding based Neural Machine Translation for Hindi language
2025cites this paper
AlexCapsNet: An Integrated Architecture for Image Classification With Background Noise
2025cites this paper
Beyond Data Augmentations: Generalization Abilities of Few-Shot Segmentation Models
2025cites this paper
ADR-SALD: Attention-Based Deep Residual Sign Agnostic Learning With Derivatives for Implicit Surface Reconstruction
2025cites this paper
Hybrid loss-based convolutional encoder–decoder neural networks for the robust topology optimization of porous heat exchangers
2025cites this paper
Geometric Reconstruction in Modern Remote Sensing Techniques and Applications
2024cites this paper
Point HorNet: Higher-Order Spatial Interaction Network for Point Clouds
2024cites this paper
Weakly Contrastive Learning via Batch Instance Discrimination and Feature Clustering for Small Sample SAR ATR
2024cites this paper
A Shortcut Enhanced LSTM-GCN Network for Multi-Sensor Based Human Motion Tracking
2024cites this paper
Depth cue fusion for event-based stereo depth estimation
2024cites this paper
Anomaly-Free Prior Guided Knowledge Distillation for Industrial Anomaly Detection
2024cites this paper
Sky-Image-Based Sun-Blocking Index and PredRNN++ for Accurate Short-Term Solar Irradiance Forecasting
2024cites this paper
LAuReL: Learned Augmented Residual Layer
2024cites this paper
Residual connections improve click-through rate and conversion rate prediction performance
2024cites this paper
Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences
2024cites this paper
Toward Compact and Robust Model Learning Under Dynamically Perturbed Environments
2024cites this paper
Extended Features Based Random Vector Functional Link Network for Classification Problem
2024cites this paper
A simulated two-stream network via multilevel distillation of reviewed features and decoupled logits for video action recognition
2024cites this paper
TR-Net: Token Relation Inspired Table Filling Network for Joint Entity and Relation Extraction
2024cites this paper
Rapid prediction of mechanical properties during composite curing using artificial neural network and multi-objective genetic algorithms
2024cites this paper
EFAM-Net: A Multi-Class Skin Lesion Classification Model Utilizing Enhanced Feature Fusion and Attention Mechanisms
2024cites this paper
Deep Learning Frontiers in 3D Object Detection: A Comprehensive Review for Autonomous Driving
2024cites this paper
Augmenting DenseNet: Leveraging Multi-Scale Skip Connections for Effective Early-Layer Information Incorporation
2024cites this paper
VulCatch: Enhancing Binary Vulnerability Detection through CodeT5 Decompilation and KAN Advanced Feature Extraction
2024cites this paper
Highway Networks for Improved Surface Reconstruction: The Role of Residuals and Weight Updates
2024influential citation
DDSNet: Deep Dual-Branch Networks for Surface Defect Segmentation
2024cites this paper
VisionNet: An efficient vision transformer-based hybrid adaptive networks for eye cancer detection with enhanced cheetah optimizer
2024cites this paper
A Seesaw Model Attack Algorithm for Distributed Learning
2024cites this paper
GlobalTomo: A global dataset for physics-ML seismic wavefield modeling and FWI
2024cites this paper
BEAVP: A Bidirectional Enhanced Adversarial Model for Video Prediction
2024cites this paper
The Role of Temporal Hierarchy in Spiking Neural Networks
2024cites this paper
Real-Time Anomaly Detection in Smart Grid Networks Using Deep Learning with Cross-Domain Generalization and Multi-Task Learning
2024cites this paper
Photonics-aided D-band 64-QAM MMW transmission utilizing modified multi-symbol output neural network equalization
2024cites this paper
Review on Deep Learning Network Architectures for Image Reconstruction
2024cites this paper
Dynamical Targeted Ensemble Learning for Streaming Data With Concept Drift
2024cites this paper
Medical Image Segmentation of Liver Tumors with Multi-phase Deficiency Based on Hierarchical Knowledge Distillation Network
2024cites this paper
Strengthening Layer Interaction via Dynamic Layer Attention
2024cites this paper
Stacking algorithm based on naive Bayes
2024cites this paper
FACTS: A Factored State-Space Framework For World Modelling
2024cites this paper
An efficient and real-time steel surface defect detection method based on single-stage detection algorithm
2024cites this paper
Hadamard Representations: Augmenting Hyperbolic Tangents in RL
2024cites this paper
Neural Residual Diffusion Models for Deep Scalable Vision Generation
2024cites this paper