Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches

Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin

Published 2018 in arXiv.org

ABSTRACT

As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches by normalizing the distribution of the internal representation of each hidden layer. However, the effectiveness of BN diminishes in the micro-batch scenario (e.g., fewer than 10 samples in a mini-batch), since statistics estimated from so few samples are unreliable. In this paper, we present a novel normalization method, called Batch Kalman Normalization (BKN), for improving and accelerating the training of DNNs, particularly in the context of micro-batches. Specifically, unlike existing solutions that treat each hidden layer as an isolated system, BKN treats all the layers in a network as a whole system and estimates the statistics of a given layer by considering the distributions of all its preceding layers, mimicking the merits of Kalman filtering. BKN has two appealing properties. First, it enables more stable training and faster convergence compared to previous works. Second, DNNs trained with BKN perform substantially better than those using BN and its variants, especially when very small mini-batches are used. On the ImageNet image classification benchmark, BKN-powered networks improve upon the best published model-zoo results, reaching 74.0% top-1 validation accuracy for InceptionV2. More importantly, BKN achieves comparable accuracy with much smaller batch sizes, such as 64 times smaller on CIFAR-10/100 and 8 times smaller on ImageNet.
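The idea of estimating a layer's statistics from both its own mini-batch and its preceding layer can be sketched as below. This is a minimal illustration, not the authors' implementation: the abstract gives no update rule, so the fixed `gain` weight, the identity transition between layers (both layers are assumed to have the same width), and all function names are assumptions made for this example.

```python
import numpy as np


def batch_stats(x):
    """Per-feature mean and variance over the batch axis (plain BN statistics)."""
    return x.mean(axis=0), x.var(axis=0)


def kalman_fused_stats(prev_mean, prev_var, obs_mean, obs_var, gain=0.5):
    """Blend a prior carried over from the preceding layer with the current
    layer's observed mini-batch statistics.

    `gain` is a hypothetical fixed weight in [0, 1]: gain=1 trusts only the
    current mini-batch (ordinary BN), gain=0 trusts only the prior.
    """
    mean = (1.0 - gain) * prev_mean + gain * obs_mean
    # Variance of a two-component mixture: weighted variances plus a term
    # for the disagreement between the two means.
    var = (
        (1.0 - gain) * prev_var
        + gain * obs_var
        + gain * (1.0 - gain) * (obs_mean - prev_mean) ** 2
    )
    return mean, var


def normalize(x, mean, var, eps=1e-5):
    """Standardize activations with the supplied statistics."""
    return (x - mean) / np.sqrt(var + eps)


# Micro-batch of 4 samples with 8 features per layer (illustrative sizes).
rng = np.random.default_rng(0)
h_prev = rng.normal(loc=1.0, scale=2.0, size=(4, 8))   # preceding layer
h_curr = h_prev + 0.1 * rng.normal(size=(4, 8))        # current layer

prev_mean, prev_var = batch_stats(h_prev)
obs_mean, obs_var = batch_stats(h_curr)
mean, var = kalman_fused_stats(prev_mean, prev_var, obs_mean, obs_var, gain=0.5)
h_norm = normalize(h_curr, mean, var)
```

With `gain=1.0` the sketch reduces to ordinary batch normalization of the current layer; smaller values lean more heavily on the estimate inherited from the preceding layer, which is the intuition behind using earlier layers to stabilize statistics when the mini-batch is very small.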

PUBLICATION RECORD

Publication year
2018
Venue
arXiv.org
Publication date
2018-02-09
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1802.03133
Source metadata
Semantic Scholar

REFERENCES

EigenNet: Towards Fast and Structural Learning of Deep Neural Networks
2017, cited by this paper
Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
2017, cited by this paper
Learning Deep Architectures via Generalized Whitened Neural Networks
2017, cited by this paper
Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
2016, cited by this paper
Recurrent Batch Normalization
2016, cited by this paper
Layer Normalization
2016, cited by this paper
Improved Techniques for Training GANs
2016, cited by this paper
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
2016, cited by this paper
Natural Neural Networks
2015, cited by this paper
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
2015, cited by this paper
Deep Residual Learning for Image Recognition
2015, influential reference
ImageNet Large Scale Visual Recognition Challenge
2014, influential reference
Mean-normalized stochastic gradient for large-scale deep learning
2014, cited by this paper
Going deeper with convolutions
2014, influential reference
Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging
2014, cited by this paper
Efficient BackProp
2012, cited by this paper
Deep Learning Made Easier by Linear Transformations in Perceptrons
2012, cited by this paper
Rectified Linear Units Improve Restricted Boltzmann Machines
2010, cited by this paper
Learning Multiple Layers of Features from Tiny Images
2009, cited by this paper
Et al
2008, cited by this paper
A New Approach to Linear Filtering and Prediction Problems
2002, cited by this paper
A New Approach to Linear Filtering and Prediction Problems
2001, cited by this paper

CITED BY

Adversarial Attacks and Batch Normalization: A Batch Statistics Perspective
2023, cites this paper
OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search
2022, cites this paper
Instance Segmentation Based on Improved Self-Adaptive Normalization
2022, cites this paper
Effective and Efficient Batch Normalization Using a Few Uncorrelated Data for Statistics Estimation
2020, cites this paper
Kalman meets Bellman: Improving Policy Evaluation through Value Tracking
2020, cites this paper
Exemplar Normalization for Learning Deep Representation
2020, cites this paper
3D shape instantiation for intra-operative navigation from a single 2D projection
2020, cites this paper
The Foundation and Advances of Deep Learning
2019, cites this paper
Deep Learning at the Interface of Agricultural Insurance Risk and Spatio-Temporal Uncertainty in Weather Extremes
2019, cites this paper
Trust Region Value Optimization using Kalman Filtering
2019, cites this paper
Adaptively Connected Neural Networks
2019, cites this paper
Switchable Whitening for Deep Representation Learning
2019, cites this paper
Switchable Normalization for Learning-to-Normalize Deep Representation
2019, cites this paper
U-Net Training with Instance-Layer Normalization
2019, cites this paper
TRADI: Tracking deep neural network weight distributions
2019, cites this paper
Differentiable Learning-to-Normalize via Switchable Normalization
2018, influential citation
Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct?
2018, cites this paper
Batch Normalization Sampling
2018, cites this paper