Permute to Train: A New Dimension to Training Deep Neural Networks

Yushi Qiu, R. Suda

Published 2020 on arXiv.org

ABSTRACT

We show that Deep Neural Networks (DNNs) can be efficiently trained by permuting neuron connections. We introduce a new family of methods for training DNNs called Permute to Train (P2T). Two implementations of P2T are presented: Stochastic Gradient Permutation, which computes permutations from gradients, and Lookahead Permutation, which relies on another optimizer to derive the permutation. We empirically show that our proposed method, despite only swapping randomly weighted connections, achieves accuracy comparable to that of Adam on the MNIST, Fashion-MNIST, and CIFAR-10 datasets. This opens up possibilities for new ways to train and regularize DNNs.
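The abstract does not spell out either algorithm, but the core idea (learning by rearranging a fixed multiset of weight values rather than changing their magnitudes) can be sketched on a toy separable quadratic. This is an illustrative sketch only, not the paper's Stochastic Gradient Permutation: the target `w_star`, the loss, and the greedy assignment rule here are assumptions chosen so that a gradient-guided permutation visibly reduces the loss.

```python
# Toy sketch (NOT the paper's exact algorithm): train by permuting a fixed
# multiset of weight values. On the separable loss ||w - w_star||^2, the
# gradient indicates which positions "want" larger values; we permute the
# existing values accordingly instead of updating their magnitudes.

def loss(w, w_star):
    return sum((a - b) ** 2 for a, b in zip(w, w_star))

def gradient(w, w_star):
    return [2 * (a - b) for a, b in zip(w, w_star)]

def permute_step(w, w_star):
    """One gradient-guided permutation: positions with the most negative
    gradient component receive the largest available weight values."""
    g = gradient(w, w_star)
    order = sorted(range(len(w)), key=lambda i: g[i])  # most negative first
    values = sorted(w, reverse=True)                   # largest value first
    new_w = list(w)
    for pos, val in zip(order, values):
        new_w[pos] = val
    return new_w

w_star = [3.0, -1.0, 2.0, 0.5]   # hypothetical target weights
w = [-1.0, 3.0, 0.5, 2.0]        # same values, badly arranged

before = loss(w, w_star)          # 42.5
w = permute_step(w, w_star)
after = loss(w, w_star)           # 0.0 here: the values match w_star exactly
```

On this hand-picked example a single permutation step recovers `w_star` exactly, because the available values coincide with the target's; in general only the arrangement, not the magnitudes, can be improved, which is the constraint that makes P2T-style training interesting.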

PUBLICATION RECORD

  • Publication year

    2020

  • Venue

    arXiv.org

  • Publication date

    2020-03-05

  • Fields of study

    Mathematics, Computer Science


  • Source metadata

    Semantic Scholar

