Dropout Training as Adaptive Regularization

Published 2013 in Neural Information Processing Systems

ABSTRACT

Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
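The adaptive-regularization view in the abstract can be made concrete for logistic regression. The sketch below is illustrative only and is not taken from this record: it assumes the usual second-order Taylor expansion of the dropout penalty, R(beta) ~ (1/2) * delta/(1 - delta) * sum_j beta_j^2 * sum_i p_i(1 - p_i) x_ij^2, where delta is the dropout probability and the inner sum is the diagonal of the estimated Fisher information X^T V X. The function names, the Monte Carlo comparison, and the toy inputs are hypothetical choices made for this example.

```python
import numpy as np


def quadratic_dropout_penalty(X, beta, delta):
    """Second-order approximation to the dropout penalty (logistic regression).

    Assumed form: 0.5 * delta / (1 - delta) * beta^T diag(X^T V X) beta,
    with V = diag(p_i * (1 - p_i)) evaluated at the current beta.
    """
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))        # predicted probabilities
    curvature = p * (1.0 - p)                     # A''(x_i . beta) for the logistic model
    fisher_diag = (curvature[:, None] * X**2).sum(axis=0)  # diag(X^T V X)
    return 0.5 * delta / (1.0 - delta) * np.sum(fisher_diag * beta**2)


def monte_carlo_dropout_penalty(X, beta, delta, n_samples=20000, seed=0):
    """Monte Carlo estimate of the exact penalty sum_i E[A(x~_i . beta)] - A(x_i . beta).

    Each feature is zeroed with probability delta and otherwise rescaled by
    1 / (1 - delta), so the noised features are unbiased for the originals.
    """
    rng = np.random.default_rng(seed)
    keep = rng.random((n_samples,) + X.shape) >= delta   # True where a feature survives
    X_noised = X * keep / (1.0 - delta)
    # A(z) = log(1 + exp(z)) is the logistic log-partition function.
    noisy = np.logaddexp(0.0, X_noised @ beta).mean(axis=0)
    clean = np.logaddexp(0.0, X @ beta)
    return float(np.sum(noisy - clean))


if __name__ == "__main__":
    X = np.array([[1.0, 1.0], [0.5, -2.0], [3.0, 0.1]])
    beta = np.array([0.3, -0.2])
    print("quadratic approximation:", quadratic_dropout_penalty(X, beta, delta=0.5))
    print("Monte Carlo estimate:   ", monte_carlo_dropout_penalty(X, beta, delta=0.5))
```

Under these assumptions the penalty is an L2 penalty whose per-feature weight grows with that feature's Fisher information, so rescaling each feature by the inverse square root of its diagonal Fisher entry would turn it into a plain ridge penalty. The quadratic form is only an approximation; with heavy noise (large delta or large weights) it can differ noticeably from the Monte Carlo value.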

PUBLICATION RECORD

Publication year: 2013
Venue: Neural Information Processing Systems
Publication date: 2013-07-04
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1307.1493
Source metadata: Semantic Scholar

REFERENCES

Feature Noising for Log-Linear Structured Prediction
2013, cited by this paper
Learning with Marginalized Corrupted Features
2013, cited by this paper
Maxout Networks
2013, cited by this paper
Fast dropout training
2013, cited by this paper
Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
2012, cited by this paper
Improving neural networks by preventing co-adaptation of feature detectors
2012, cited by this paper
Learning Word Vectors for Sentiment Analysis
2011, cited by this paper
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
2011, cited by this paper
Large Scale Text Classification using Semisupervised Multinomial Naive Bayes
2011, cited by this paper
Adding noise to the input of a model trained with a regularized objective
2011, cited by this paper
The Manifold Tangent Classifier
2011, cited by this paper
Regularization Paths for Generalized Linear Models via Coordinate Descent
2010, cited by this paper
Semi-Supervised Structured Output Learning Based on a Hybrid Generative and Discriminative Approach
2007, cited by this paper
Entropy Regularization
2006, cited by this paper
Semi-supervised Learning by Entropy Minimization
2004, cited by this paper
The Tradeoff Between Generative and Discriminative Classifiers
2004, cited by this paper
Classification with Hybrid Generative/Discriminative Models
2003, cited by this paper
Transformation invariance in pattern recognition: Tangent distance and propagation
2000, cited by this paper
Text Classification from Labeled and Unlabeled Documents using EM
2000, cited by this paper
Transductive Inference for Text Classification using Support Vector Machines
1999, cited by this paper
Improving the Accuracy and Speed of Support Vector Machines
1996, cited by this paper
Noise injection into inputs in back-propagation learning
1992, cited by this paper
Learning from hints in neural networks
1990, cited by this paper
Theory of point estimation
1950, cited by this paper
Adaptive Regularization of Weight Vectors
year unknown, influential reference

CITED BY

DropoutTS: Sample-Adaptive Dropout for Robust Time Series Forecasting
2026, cites this paper
Intelligent O-RAN Optimization: AI/ML-Enabled Dynamic Prediction for Adaptive Rate Control
2026, cites this paper
Quantum-Assisted Design of Space-Terrestrial Integrated Networks
2026, cites this paper
Ciao: Cross-architecture IoT Malware Family Classification with Code Reuse
2026, cites this paper
Trapped by simplicity: When Transformers fail to learn from noisy features
2026, cites this paper
Phase Diagram of Dropout for Two-Layer Neural Networks in the Mean-Field Regime
2025, cites this paper
PerNodeDrop: A Method Balancing Specialized Subnets and Regularization in Deep Neural Networks
2025, cites this paper
A Combinatorial Theory of Dropout: Subnetworks, Graph Geometry, and Generalization
2025, cites this paper
Neuroplasticity in Artificial Intelligence - An Overview and Inspirations on Drop In & Out Learning
2025, cites this paper
On the generalization ability of probabilistic neural networks for hyperspectral remote sensing of absorption properties across optically complex waters
2025, cites this paper
Deepphysio: detecting deepFake with non-personalized feature of physiological signal
2025, influential citation
LiGNN: Accelerating GNN Training Through Locality-Aware Dropout
2025, cites this paper
Revisiting human activity recognition using smaller DNN
2025, cites this paper
End-to-end closed-loop optoelectronic computing breaking precision–accuracy coupling
2025, cites this paper
Shedding light on uncertainties in machine learning: formal derivation and optimal model selection
2025, cites this paper
Quantile deep learning models for multi-step ahead time series prediction
2025, cites this paper
Accurate, fast, cheap: Choose three. Replacing Multi-Head-Attention with Bidirectional Recurrent Attention for Long-Form ASR
2025, cites this paper
Dropout Neural Network Training Viewed from a Percolation Perspective
2025, cites this paper
Accelerating GNN Training through Locality-aware Dropout and Merge
2025, cites this paper
In-Context Learning Enhanced Credibility Transformer
2025, cites this paper
Drop Dropout on Single-Epoch Language Model Pretraining
2025, cites this paper
Beyond Random Masking: When Dropout meets Graph Convolutional Networks
2025, cites this paper
Domain-Generalized Gesture Recognition via mmWave Radar Signal Multiview Learning
2025, influential citation
Ali-U-Net: A Convolutional Transformer Neural Net for Multiple Sequence Alignment of DNA Sequences. A proof of concept
2025, cites this paper
A unified gradient regularization method for heterogeneous graph neural networks
2025, cites this paper
Analytic theory of dropout regularization
2025, cites this paper
Dual Dependency Disentangling for Defending Model Inversion Attacks in Split Federated Learning
2025, cites this paper
Deep Learning-Based Hybrid Scenario for Classification of Periapical Lesions in Cone Beam Computed Tomography
2025, cites this paper
Reward Dimension Reduction for Scalable Multi-Objective Reinforcement Learning
2025, cites this paper
Ethic-BERT: An Enhanced Deep Learning Model for Ethical and Non-Ethical Content Classification
2025, cites this paper
Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation
2025, influential citation
DARN: Dynamic Adaptive Regularization Networks for Efficient and Robust Foundation Model Adaptation
2025, cites this paper
Prior knowledge of layer-specific pruning numbers guarantees effective random pruning at initialization
2025, cites this paper
Spike-based neuromorphic computing: An overview from bio-inspiration to hardware architectures and learning mechanisms
2025, cites this paper
Random Forest Autoencoders for Guided Representation Learning
2025, cites this paper
Revisiting Randomization in Greedy Model Search
2025, cites this paper
Experience Rating in Insurance Pricing
2024, cites this paper
Singular-limit analysis of gradient descent with noise injection
2024, influential citation
Unraveling generalized parton distributions through Lorentz symmetry and partial DGLAP knowledge
2024, cites this paper
Review-based recommendation under preference uncertainty: An asymmetric deep learning framework
2024, cites this paper
Graph Convolutional Networks With Adaptive Neighborhood Awareness
2024, cites this paper
Improving Discharge Predictions in Ungauged Basins: Harnessing the Power of Disaggregated Data Modeling and Machine Learning
2024, cites this paper
Implicit Regularization Paths of Weighted Neural Representations
2024, cites this paper
Stacked deep learning approach for efficient SARS-CoV-2 detection in blood samples
2024, cites this paper
Mask-Shift-Inference: A novel paradigm for domain generalization
2024, cites this paper
Quantile deep learning models for multi-step ahead time series prediction
2024, cites this paper
Generalized zero-shot action recognition through reservation-based gate and semantic-enhanced contrastive learning
2024, cites this paper
Convergence Analysis for Federated Dropout
2024, cites this paper
Beyond Self-Consistency: Loss-Balanced Perturbation-Based Regularization Improves Industrial-Scale Ads Ranking
2024, cites this paper
Robust gradient aware and reliable entropy minimization for stable test-time adaptation in dynamic scenarios
2024, cites this paper
Road Pothole Detection Model Based on AlexNet Convolutional Neural Network and Confusion Matrix
2024, cites this paper
A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation
2024, cites this paper
Deep learning and data augmentation for robust battery state of charge estimation in electric vehicles
2024, cites this paper
Dropout Regularization Versus l2-Penalization in the Linear Model
2024, influential citation
ACFed: Communication-Efficient & Class-Balancing Federated Learning with Adaptive Consensus Dropout & Model Quantization
2024, cites this paper
Damage explains function in spiking neural networks representing central pattern generator
2024, cites this paper
ALLoRA: Adaptive Learning Rate Mitigates LoRA Fatal Flaws
2024, cites this paper
Measuring Orthogonality as the Blind-Spot of Uncertainty Disentanglement
2024, influential citation
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression
2024, cites this paper
ASLRing: American Sign Language Recognition with Meta-Learning on Wearables
2024, cites this paper
A Study on Tooth Decay Detection Using a Simple Convolutional Neural Network Model
2024, cites this paper
Efficient Stagewise Pretraining via Progressive Subnetworks
2024, cites this paper
Improving Generalization in Aerial and Terrestrial Mobile Robots Control Through Delayed Policy Learning
2024, cites this paper
On Mitigating Performance Disparities in Multilingual Speech Recognition
2024, cites this paper
Comparison of autoencoder architectures for fault detection in industrial processes
2024, cites this paper
A Generative AI approach to improve in-situ vision tool wear monitoring with scarce data
2024, cites this paper
Experimental investigation and AutoML prediction of the resilient behaviour of coarse-grained waste rocks
2024, cites this paper
Deterministic Convergence of Backpropagation Algorithm with Cyclic Dropconnect for Linear Output Neural Networks
2023, cites this paper
Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators
2023, cites this paper
The utility of machine learning for predicting donor discard in abdominal transplantation
2023, cites this paper
Nature-Inspired DBN based Optimization Techniques for Image De-noising
2023, cites this paper
Super-resolution and uncertainty estimation from sparse sensors of dynamical physical systems
2023, cites this paper
Mineral Texture Classification Using Deep Convolutional Neural Networks: An Application to Zircons From Porphyry Copper Deposits
2023, cites this paper
Multi-view subspace clustering using drop out technique on points
2023, cites this paper
Uncertainty Estimation for Complex Text Detection in Spanish
2023, cites this paper
A Practical System for 3-D Hand Pose Tracking Using EMG Wearables With Applications to Prosthetics and User Interfaces
2023, cites this paper
Flat Minima in Linear Estimation and an Extended Gauss Markov Theorem
2023, cites this paper
To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets
2023, cites this paper
Towards a Deeper Understanding of Global Covariance Pooling in Deep Learning: An Optimization Perspective
2023, cites this paper
AdaFlood: Adaptive Flood Regularization
2023, cites this paper
Hessian regularization of deep neural networks: A novel approach based on stochastic estimators of Hessian trace
2023, cites this paper
SignQuery: A Natural User Interface and Search Engine for Sign Languages with Wearable Sensors
2023, cites this paper
A holistic approach for improving milling machine cutting tool wear prediction
2023, cites this paper
Knowledge-Driven Online Multimodal Automated Phenotyping System
2023, cites this paper
Quadratic Neural Networks for Solving Inverse Problems
2023, influential citation
A mathematical and neural network-based hybrid technique for detecting the prostate contour from medical image data
2023, cites this paper
Dropout Training is Distributionally Robust Optimal
2023, cites this paper
Vessel Delineation Using U-Net: A Sparse Labeled Deep Learning Approach for Semantic Segmentation of Histological Images
2023, cites this paper
MMP Net: A feedforward neural network model with sequential inputs for representing continuous multistage manufacturing processes without intermediate outputs
2023, cites this paper
Neural optimization for quantum architectures: graph embedding problems with Distance Encoder Networks
2023, cites this paper
Machine learning can be as good as maximum likelihood when reconstructing phylogenetic trees and determining the best evolutionary model on four taxon alignments
2023, cites this paper
Dropout Ensemble Kalman inversion for high dimensional inverse problems
2023, cites this paper
USDNL: Uncertainty-Based Single Dropout in Noisy Label Learning
2023, cites this paper
I Am an Earphone and I Can Hear My User’s Face: Facial Landmark Tracking Using Smart Earphones
2023, influential citation
Robot Motion Prediction by Channel State Information
2023, cites this paper
Predictive overfitting in immunological applications: Pitfalls and solutions
2023, cites this paper
Knowledge Distillation for Efficient Audio-Visual Video Captioning
2023, cites this paper
Exploration and Exploitation of Unlabeled Data for Open-Set Semi-supervised Learning
2023, cites this paper
GeNNius: an ultrafast drug–target interaction inference method based on graph neural networks
2023, influential citation
Generalized equivalences between subsampling and ridge regularization
2023, cites this paper