Efficient batchwise dropout training using submatrices
Benjamin Graham, J. Reizenstein, Leigh Robinson
Published 2015 in arXiv.org

ABSTRACT
Dropout is a popular technique for regularizing artificial neural networks. Dropout networks are generally trained by minibatch gradient descent with a dropout mask that turns off some of the units; a different pattern of dropout is applied to every sample in the minibatch. We explore a very simple alternative to the dropout mask. Instead of masking dropped-out units by setting them to zero, we perform the matrix multiplication using a submatrix of the weight matrix, so unneeded hidden units are never calculated. By performing dropout batchwise, so that a single dropout pattern is shared by every sample in a minibatch, we can substantially reduce training times. Batchwise dropout can be used with fully-connected and convolutional neural networks.
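To make the contrast concrete, the following is a minimal NumPy sketch of the two schemes for a single fully-connected layer. It is not code from the paper; all dimensions and names are illustrative, and scaling by the keep probability and the backward pass are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: minibatch b, layer with n_in inputs and n_out
# hidden units, dropout probability p.
b, n_in, n_out, p = 32, 512, 256, 0.5

X = rng.standard_normal((b, n_in))   # minibatch of inputs
W = rng.standard_normal((n_in, n_out))  # weight matrix

# Samplewise dropout: a different binary mask per sample. All n_out
# units are computed by the full matrix product, then some are zeroed.
mask = rng.random((b, n_out)) > p
H_samplewise = (X @ W) * mask

# Batchwise dropout via submatrices: one dropout pattern for the whole
# minibatch, so the surviving columns of W are selected up front and the
# dropped hidden units are never computed at all.
keep_out = rng.random(n_out) > p
H_batchwise = X @ W[:, keep_out]     # multiply by a submatrix of W

# Dropped input units are handled the same way, by also selecting rows:
keep_in = rng.random(n_in) > p
H_both = X[:, keep_in] @ W[np.ix_(keep_in, keep_out)]

print(H_samplewise.shape, H_batchwise.shape, H_both.shape)
```

Because the submatrix product touches only the kept rows and columns, its cost shrinks roughly with the product of the input and output keep fractions, which is where the training-time savings come from.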
PUBLICATION RECORD
- Publication year: 2015
- Venue: arXiv.org
- Publication date: 2015-02-09
- Fields of study: Computer Science
- Source metadata: Semantic Scholar