Stochastic Normalizations as Bayesian Learning

Published 2018 in Asian Conference on Computer Vision

ABSTRACT

In this work we investigate why Batch Normalization (BN) improves the generalization performance of deep networks. We argue that one major reason, which distinguishes BN from data-independent normalization methods, is the randomness of batch statistics. This randomness appears in the parameters rather than in the activations and admits an interpretation as a practical form of Bayesian learning. We apply this idea to other (deterministic) normalization techniques that are oblivious to the batch size, and show that their generalization performance can be improved significantly by Bayesian learning of the same form. We obtain test performance comparable to BN and, at the same time, better validation losses, which makes the models suitable for subsequent output uncertainty estimation through an approximate Bayesian posterior.
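The claim that batch statistics act as noise in the parameters can be illustrated with a toy sketch. It is not the paper's method or code: it assumes scalar activations, synthetic Gaussian data, and hypothetical helper names (`batch_affine_params`, `normalized_output`). Normalizing by batch statistics is written as an affine map whose scale and shift depend on the randomly drawn mini-batch, so the same input receives a different output each time, and the spread shrinks as the batch grows.

```python
import random
import statistics


def batch_affine_params(batch, eps=1e-5):
    """Return (scale, shift) such that scale * x + shift equals
    (x - batch_mean) / sqrt(batch_var + eps)."""
    mean = statistics.fmean(batch)
    var = statistics.pvariance(batch, mu=mean)
    scale = 1.0 / (var + eps) ** 0.5
    shift = -mean * scale
    return scale, shift


def normalized_output(x, data, batch_size, rng):
    """Normalize x inside a mini-batch of randomly drawn companions."""
    companions = rng.sample(data, batch_size - 1)
    scale, shift = batch_affine_params([x] + companions)
    # The randomness enters only through the affine parameters.
    return scale * x + shift


rng = random.Random(0)
# Toy activations; the mean and standard deviation are arbitrary choices.
data = [rng.gauss(2.0, 3.0) for _ in range(1000)]
x = 5.0

small = [normalized_output(x, data, 8, rng) for _ in range(200)]
large = [normalized_output(x, data, 256, rng) for _ in range(200)]

spread_small_batch = statistics.pstdev(small)
spread_large_batch = statistics.pstdev(large)
print("output spread, batch size 8:  ", spread_small_batch)
print("output spread, batch size 256:", spread_large_batch)
```

With a small batch the affine parameters fluctuate noticeably between draws, while a large batch makes them nearly deterministic. In this toy setting, that is the difference between batch-dependent normalization and the deterministic techniques the abstract mentions.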

PUBLICATION RECORD

Publication year
2018
Venue
Asian Conference on Computer Vision
Publication date
2018-11-01
Fields of study
Mathematics, Computer Science
Identifiers
DOI 10.1007/978-3-030-20890-5_30; arXiv 1811.00639
Source metadata
Semantic Scholar

CITED BY

An Investigation of Batch Normalization in Off-Policy Actor-Critic Algorithms (2025)
Hybrid Deep Learning Models for Mapping Surface NO2 Across China: One Complicated Model, Many Simple Models, or Many Complicated Models? (2022)
Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay (2022)
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction (2022)
IMPROVE Visiolinguistic Performance with Re-Query (2022)
Radial Basis Function Networks for Image Restoration with Stochastic Normalizations as Bayesian Learning in Deep Conventional Neural Network (2022)
Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence (2021)
Initialization and Transfer Learning of Stochastic Binary Networks from Real-Valued Ones (2021)
Stochastic Normalization (2020)
Appendix for: How Good is the Bayes Posterior in Deep Neural Networks Really? (2020)
A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges (2020)
Momentum Batch Normalization for Deep Learning with Small Batch Size (2020)
Normalization Techniques in Training DNNs: Methodology, Analysis and Application (2020)
Group Whitening: Balancing Learning Efficiency and Representational Capacity (2020)
An Investigation Into the Stochasticity of Batch Whitening (2020)
Instance Enhancement Batch Normalization: an Adaptive Regulator of Batch Noise (2019)