Accelerating Deep Neural Network Training with Inconsistent Stochastic Gradient Descent

Linnan Wang, Yi Yang, Martin Renqiang Min, S. Chakradhar

Published 2016 in Neural Networks

ABSTRACT

Stochastic Gradient Descent (SGD) updates a Convolutional Neural Network (CNN) with a noisy gradient computed from a random batch, and each batch updates the network exactly once per epoch. This scheme applies the same training effort to every batch, but it overlooks the fact that gradient variance, induced by Sampling Bias and Intrinsic Image Difference, produces different training dynamics across batches. In this paper, we develop a new training strategy for SGD, referred to as Inconsistent Stochastic Gradient Descent (ISGD), to address this problem. The core concept of ISGD is inconsistent training, which dynamically adjusts the training effort with respect to the loss. ISGD models training as a stochastic process that gradually reduces the mean of the batch loss, and it uses a dynamic upper control limit to identify a large-loss batch on the fly. ISGD stays on the identified batch to accelerate training with additional gradient updates, and it imposes a constraint that penalizes drastic parameter changes. ISGD is straightforward and computationally efficient, and it requires no auxiliary memory. A series of empirical evaluations on real-world datasets and networks demonstrates the promising performance of inconsistent training.
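The inconsistent-training idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact control limit or penalty, so the code below assumes a limit of "mean plus k standard deviations" over a sliding window of recent batch losses, and a simple quadratic penalty that pulls extra updates back toward the weights at the start of the extra pass. The function name `isgd_train`, the parameters `k`, `window`, `extra_steps`, and `prox`, and the toy least-squares model are all hypothetical choices made for illustration.

```python
import numpy as np


def isgd_train(X, y, batch_size=20, epochs=20, lr=0.1, k=3.0,
               window=10, extra_steps=5, prox=0.1, seed=0):
    """Toy ISGD-style loop on least-squares regression (illustrative only)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    losses = []        # batch losses observed so far
    extra_updates = 0  # number of additional gradient updates taken

    def loss_and_grad(w, Xb, yb):
        r = Xb @ w - yb
        return 0.5 * np.mean(r ** 2), Xb.T @ r / len(yb)

    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            loss, grad = loss_and_grad(w, Xb, yb)
            w = w - lr * grad  # ordinary SGD step on this batch

            recent = losses[-window:]
            if len(recent) >= 2:
                # Assumed control limit: mean + k * std of recent batch losses.
                limit = np.mean(recent) + k * np.std(recent)
                if loss > limit:
                    # Stay on the large-loss batch for a few extra updates.
                    w_anchor = w.copy()
                    for _ in range(extra_steps):
                        loss_b, grad_b = loss_and_grad(w, Xb, yb)
                        if loss_b <= limit:
                            break
                        # Assumed penalty: quadratic pull toward w_anchor
                        # to discourage drastic parameter changes.
                        w = w - lr * (grad_b + prox * (w - w_anchor))
                        extra_updates += 1
            losses.append(loss)
    return w, extra_updates
```

Calling `isgd_train(X, y)` on a small synthetic regression problem returns the fitted weights together with the number of extra updates triggered by the control limit. How often the limit fires depends heavily on `k` and `window`, which is why both are exposed as parameters here.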

PUBLICATION RECORD

Publication year: 2016
Venue: Neural Networks
Publication date: 2016-03-17
Fields of study: Medicine, Computer Science, Mathematics
Identifiers: DOI 10.1016/j.neunet.2017.06.003; arXiv 1603.05544; PMID 28668660
Source metadata: Semantic Scholar, PubMed

CITED BY

Enhancing CT image segmentation accuracy through ensemble loss function optimization
2025cites this paper
Advancing debris flow detection based on deep learning model and high-resolution images
2025cites this paper
Intelligent Hyperparameter Optimization of Convolutional Neural Networks for Robust Multimodal Classification
2025cites this paper
Exploring optimizer efficiency for facial expression recognition with convolutional neural networks
2025cites this paper
High-speed tunable generation of random number distributions using actuated perpendicular magnetic tunnel junctions
2025cites this paper
Batch-FPM: Random Batch-Update Multi-Parameter Physical Fourier Ptychography Neural Network
2024cites this paper
Deep Learning-Driven Prediction of Mechanical Properties of 316L Stainless Steel Metallographic by Laser Powder Bed Fusion
2024cites this paper
Adaptive Stochastic Gradient Descent (SGD) for erratic datasets
2024cites this paper
Deep learning models integrating multi-sensor and -temporal remote sensing to monitor landslide traces in Vietnam
2024cites this paper
Advancing e-commerce user purchase prediction: Integration of time-series attention with event-based timestamp encoding and Graph Neural Network-Enhanced user profiling
2024cites this paper
Driver Drowsiness Detection Using Vision Transformer
2024cites this paper
Multiple importance sampling for stochastic gradient estimation
2024cites this paper
PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
2023cites this paper
Prediction of Heart Disease Based on Robust Artificial Intelligence Techniques
2023cites this paper
Online Importance Sampling for Stochastic Gradient Optimization
2023cites this paper
Adaptive learning nonsynchronous control of nonlinear hidden Markov jump systems with limited mode information
2023cites this paper
Training Artificial Neural Networks Using a Global Optimization Method That Utilizes Neural Networks
2023cites this paper
Mitigating the Burden of Redundant Datasets via Batch-Wise Unique Samples and Frequency-Aware Losses
2023cites this paper
NeSSA: Near-Storage Data Selection for Accelerated Machine Learning Training
2023cites this paper
Convolutional neural network-based quantitative structure-activity relationship and fingerprint analysis against inhibitors of anthrax lethal factor.
2023cites this paper
Importance Sampling for Stochastic Gradient Descent in Deep Neural Networks
2023influential citation
Coastal landscape classification using convolutional neural network and remote sensing data in Vietnam.
2023cites this paper
Rolling bearing fault diagnosis algorithm using overlapping group sparse-deep complex convolutional neural network
2022cites this paper
Deep Learning Model Development for Detecting Coffee Tree Changes Based on Sentinel-2 Imagery in Vietnam
2022cites this paper
The digital asset value and currency supervision under deep learning and blockchain technology
2022cites this paper
An Effective Forest Fire Detection Framework Using Heterogeneous Wireless Multimedia Sensor Networks
2022cites this paper
Removal of Autogenous Fat Filling in Double Eyelid Operation by Artificial Intelligence (AI) Algorithm-Based Computerized Tomography (CT) Image Features
2022cites this paper
Vectorial surrogate modeling approach for multi-failure correlated probabilistic evaluation of turbine rotor
2022cites this paper
Why Is Everyone Training Very Deep Neural Network With Skip Connections?
2022cites this paper
Multivariate Statistical Analysis for Training Process Optimization in Neural Networks-Based Forecasting Models
2021cites this paper
Fractional-order gradient descent with momentum for RBF neural network-based AIS trajectory restoration
2021cites this paper
An improved residual-based convolutional neural network for very short-term wind power forecasting
2021cites this paper
Research on Prediction of Investment Fund's Performance before and after Investment Based on Improved Neural Network Algorithm
2021cites this paper
Novel and versatile artificial intelligence algorithms for investigating possible GHSR1α and DRD1 agonists for Alzheimer's disease
2021cites this paper
Low light image enhancement based on modified Retinex optimized by fractional order gradient descent with momentum RBF neural network
2021cites this paper
Precise whole liver automatic segmentation and quantification of PDFF and R2* on MR images
2021cites this paper
Machine learning in orthodontics: Challenges and perspectives.
2021cites this paper
Large Batch Experience Replay
2021influential citation
Deep Learning and Autoregressive Approach for Prediction of Time Series Data
2021cites this paper
Edge Learning
2021cites this paper
Rolling Bearing Fault Diagnosis Algorithm Based on Overlapping Group Sparse Model-Deep Complex Convolutional Neural Network
2021cites this paper
Determining the invasiveness of ground-glass nodules using a 3D multi-task network
2021cites this paper
A Nonlinear Gradient Domain-Guided Filter Optimized by Fractional-Order Gradient Descent with Momentum RBF Neural Network for Ship Image Dehazing
2021cites this paper
Advances in Deep Learning through Gradient Amplification and Applications
2020cites this paper
Integrating Artificial and Human Intelligence: A Partnership for Responsible Innovation in Biomedical Engineering and Medicine
2020cites this paper
Convolutional neural network approach for automatic tympanic membrane detection and classification
2020cites this paper
A Convolutional Neural Network for Coastal Classification Based on ALOS and NOAA Satellite Data
2020cites this paper
Accelerating Sparse Recovery by Reducing Chatter
2020cites this paper
A Novel Learning Rate Function and Its Application on the SVD++ Recommendation Algorithm
2020cites this paper
A Weight Initialization Method Associated with Samples for Deep Feedforward Neural Network
2020cites this paper
Neural Network Retraining for Model Serving
2020cites this paper
Accurate classification of cherry fruit using deep CNN based on hybrid pooling approach
2020cites this paper
An AI-based intelligent system for healthcare analysis using Ridge-Adaline Stochastic Gradient Descent Classifier
2020cites this paper
Stochastic recurrent wavelet neural network with EEMD method on energy price prediction
2020cites this paper
Block-term tensor neural networks
2020cites this paper
FFT-based Gradient Sparsification for the Distributed Training of Deep Neural Networks
2020influential citation
Coastal Wetland Classification with Deep U-Net Convolutional Networks and Sentinel-2 Imagery: A Case Study at the Tien Yen Estuary of Vietnam
2020cites this paper
U-Net Convolutional Networks for Mining Land Cover Classification Based on High-Resolution UAV Imagery
2020cites this paper
Forcasting of energy futures market and synchronization based on stochastic gated recurrent unit model
2020cites this paper
Data classification based on fractional order gradient descent with momentum for RBF neural network
2020cites this paper
Electricity Load Forecasting via ANN Approach in Turkish Electricity Markets
2020cites this paper
Machine learning models and cost-sensitive decision trees for bond rating prediction
2019cites this paper
Gradient-based optimal control of open quantum systems using quantum trajectories and automatic differentiation
2019cites this paper
Analysis and Improvement for Fingerprinting-Based Localization Algorithm Based on Neural Network
2019cites this paper
Machine learning facilitated business intelligence (Part I)
2019cites this paper
Machine learning facilitated business intelligence (Part II)
2019cites this paper
Fast Deep Learning Training through Intelligently Freezing Layers
2019cites this paper
Hierarchical attributes learning for pedestrian re-identification via parallel stochastic gradient descent combined with momentum correction and adaptive learning rate
2019cites this paper
Dynamic Stale Synchronous Parallel Distributed Training for Deep Learning
2019cites this paper
Short-term free parking berths prediction based on multitask – DBN neural network
2019cites this paper
Residual convolutional neural network for predicting response of transarterial chemoembolization in hepatocellular carcinoma from CT imaging
2019cites this paper
A cascaded dual-pathway residual network for lung nodule segmentation in CT images.
2019cites this paper
Dual-branch residual network for lung nodule segmentation
2019cites this paper
SketchDLC
2019cites this paper
Not All Samples Are Created Equal: Deep Learning with Importance Sampling
2018cites this paper
Application of Doc2vec and Stochastic Gradient Descent algorithms for Text Categorization
2018cites this paper
Pedestrian Re-identification Based on Hierarchical Attributes Learning via Parallel Stochastic Gradient Descent
2018cites this paper
An Efficient Neural Network with Performance-Based Switching of Candidate Optimizers for Point Cloud Matching
2018cites this paper
SuperNeurons: FFT-based Gradient Sparsification in the Distributed Training of Deep Neural Networks
2018influential citation
Intelligent Data Engineering and Automated Learning – IDEAL 2018: 19th International Conference, Madrid, Spain, November 21–23, 2018, Proceedings, Part II
2018cites this paper
Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent
2018cites this paper
Compositional Stochastic Average Gradient for Machine Learning and Related Applications
2018cites this paper
Learning a convolutional neural network for propagation-based stereo image segmentation
2018cites this paper
Superneurons: dynamic GPU memory management for training deep neural networks
2018cites this paper
Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition
2017influential citation
Radial effect in stochastic diagonal approximate greatest descent
2017cites this paper
Biased Importance Sampling for Deep Neural Network Training
2017cites this paper
Hyperspectral Image Superresolution by Transfer Learning
2017cites this paper
Impact of Training Set Batch Size on the Performance of Convolutional Neural Networks for Diverse Datasets
2017cites this paper
Convolutional Neural Networks for Comprehending Geographical Features of International Important Ramsar Wetland Ecological Habitat Scenes in China
2017cites this paper
A Multi-Candidate Electronic Voting Scheme with Unlimited Participants
2017cites this paper
Efficient Communications in Training Large Scale Neural Networks
2016cites this paper
BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing
2015cites this paper
Sentiment Analysis of Chinese Weibo Trending Topics based on the BERT Model
year unknowncites this paper