Local Critic Training for Model-Parallel Learning of Deep Neural Networks
Hojung Lee, Cho-Jui Hsieh, Jong-Seok Lee
Published 2021 in IEEE Transactions on Neural Networks and Learning Systems
ABSTRACT
In this article, we propose a novel model-parallel learning method, called local critic training, which trains neural networks using additional modules called local critic networks. The main network is divided into several layer groups, and each layer group is updated through error gradients estimated by its corresponding local critic network. We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs). In addition, we demonstrate that the proposed method is guaranteed to converge to a critical point. We also show that networks trained by the proposed method can be used for structural optimization. Experimental results show that our method achieves satisfactory performance, greatly reduces training time, and decreases memory consumption per machine. Code is available at https://github.com/hjdw2/Local-critic-training.
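The core idea in the abstract can be illustrated with a minimal sketch: a layer group is followed by a small local critic that maps its activations directly to the label space, so an estimated gradient for the group is available without waiting for the rest of the network's backward pass. The sketch below is a simplified NumPy illustration under assumed toy dimensions (one layer group, one linear critic, cross-entropy loss); it is not the authors' implementation, which is available at the repository linked above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 samples, 8 features, 3 classes (hypothetical sizes).
X = rng.normal(size=(4, 8))
y = np.eye(3)[rng.integers(0, 3, size=4)]

W1 = rng.normal(scale=0.1, size=(8, 16))  # weights of layer group 1
C1 = rng.normal(scale=0.1, size=(16, 3))  # weights of its local critic

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Forward pass through layer group 1 only.
h1 = np.tanh(X @ W1)

# The local critic estimates the final loss from h1 alone, so this group
# can be updated without waiting for the layers that come after it.
p = softmax(h1 @ C1)
loss = -np.mean(np.sum(y * np.log(p + 1e-12), axis=1))

# Backpropagate only through the critic to obtain an estimated gradient
# for the layer group's weights.
dlogits = (p - y) / len(X)          # d(loss)/d(logits) for softmax + CE
dh1 = dlogits @ C1.T                # estimated error signal at h1
dW1 = X.T @ (dh1 * (1 - h1 ** 2))   # chain rule through tanh

# Local (decoupled) update of layer group 1.
W1 -= 0.1 * dW1
```

In the full method, each of the several layer groups has such a critic, so their updates can proceed on different machines in a model-parallel fashion.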
PUBLICATION RECORD
- Publication year: 2021
- Venue: IEEE Transactions on Neural Networks and Learning Systems
- Publication date: 2021-02-03
- Fields of study: Computer Science, Medicine
- Source metadata: Semantic Scholar, PubMed