Snapshot Ensembles: Train 1, get M for free

Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, J. Hopcroft, Kilian Q. Weinberger

Published 2017 in International Conference on Learning Representations

ABSTRACT

Ensembles of neural networks are known to be much more robust and accurate than individual networks. However, training multiple deep networks for model averaging is computationally expensive. In this paper, we propose a method to obtain the seemingly contradictory goal of ensembling multiple neural networks at no additional training cost. We achieve this goal by training a single neural network, converging to several local minima along its optimization path and saving the model parameters. To obtain repeated rapid convergence, we leverage recent work on cyclic learning rate schedules. The resulting technique, which we refer to as Snapshot Ensembling, is simple, yet surprisingly effective. We show in a series of experiments that our approach is compatible with diverse network architectures and learning tasks. It consistently yields lower error rates than state-of-the-art single models at no additional training cost, and compares favorably with traditional network ensembles. On CIFAR-10 and CIFAR-100 our DenseNet Snapshot Ensembles obtain error rates of 3.4% and 17.4% respectively.
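
The abstract describes the procedure only in words: one training run, a cyclic learning rate that repeatedly drives the model toward a minimum, a saved snapshot at each of those points, and model averaging at test time. The sketch below is one way to picture it. It assumes a cosine-annealed rate that restarts every ceil(T / M) iterations, in the spirit of the SGDR work listed under REFERENCES, and it assumes the ensemble prediction is a plain average of the snapshots' class probabilities. All function names, the iteration budget, and the learning rate are illustrative choices, not values taken from this record.

```python
import math


def cyclic_cosine_lr(iteration, total_iterations, num_cycles, initial_lr):
    """Learning rate at a 0-indexed iteration under an assumed cyclic cosine schedule.

    The rate starts at initial_lr, decays toward zero over one cycle,
    then jumps back to initial_lr at the start of the next cycle.
    """
    cycle_length = math.ceil(total_iterations / num_cycles)
    position = iteration % cycle_length
    return 0.5 * initial_lr * (math.cos(math.pi * position / cycle_length) + 1.0)


def is_snapshot_step(iteration, total_iterations, num_cycles):
    """True on the last iteration of each cycle, where the rate is at its lowest."""
    cycle_length = math.ceil(total_iterations / num_cycles)
    return (iteration + 1) % cycle_length == 0


def average_predictions(snapshot_probs):
    """Average the class-probability vectors produced by the saved snapshots."""
    count = len(snapshot_probs)
    return [sum(column) / count for column in zip(*snapshot_probs)]


if __name__ == "__main__":
    # Hypothetical budget: 300 iterations split into 6 cycles.
    total, cycles, base_lr = 300, 6, 0.1
    for step in range(total):
        lr = cyclic_cosine_lr(step, total, cycles, base_lr)
        # A real training loop would take one SGD step with `lr` here.
        if is_snapshot_step(step, total, cycles):
            print(f"save snapshot after iteration {step}, lr={lr:.6f}")
```

Because every snapshot comes from the same run, the total number of training iterations matches that of a single model; the extra cost appears only at test time, when each saved snapshot must run a forward pass before the outputs are averaged.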

PUBLICATION RECORD

Publication year: 2017
Venue: International Conference on Learning Representations
Publication date: 2017-04-01
Fields of study: Computer Science
Identifiers: arXiv 1704.00109
Source metadata: Semantic Scholar

REFERENCES

Simulating Action Dynamics with Neural Process Networks
2018cited by this paper
Deep Networks with Stochastic Depth
2016influential reference
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
2016cited by this paper
Boosted Convolutional Neural Networks
2016cited by this paper
SGDR: Stochastic Gradient Descent with Restarts
2016influential reference
Temporal Ensembling for Semi-Supervised Learning
2016cited by this paper
SGDR: Stochastic Gradient Descent with Warm Restarts
2016influential reference
Wide Residual Networks
2016influential reference
FractalNet: Ultra-Deep Neural Networks without Residuals
2016cited by this paper
Densely Connected Convolutional Networks
2016influential reference
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
2016cited by this paper
Edinburgh Neural Machine Translation Systems for WMT 16
2016cited by this paper
Swapout: Learning an ensemble of deep architectures
2016cited by this paper
Deep Learning without Poor Local Minima
2016cited by this paper
Identity Mappings in Deep Residual Networks
2016influential reference
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
2015cited by this paper
No More Pesky Learning Rate Guessing Games
2015cited by this paper
Distilling the Knowledge in a Neural Network
2015cited by this paper
Highway Networks
2015cited by this paper
Deep Residual Learning for Image Recognition
2015influential reference
Qualitatively characterizing neural network optimization problems
2014cited by this paper
Deeply-Supervised Nets
2014cited by this paper
FitNets: Hints for Thin Deep Nets
2014cited by this paper
On Using Very Large Target Vocabulary for Neural Machine Translation
2014cited by this paper
Adam: A Method for Stochastic Optimization
2014cited by this paper
Dropout: a simple way to prevent neural networks from overfitting
2014influential reference
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
2014cited by this paper
Striving for Simplicity: The All Convolutional Net
2014cited by this paper
Regularization of Neural Networks using DropConnect
2013influential reference
Horizontal and Vertical Ensemble with Deep Representation for Classification
2013cited by this paper
Network In Network
2013cited by this paper
ImageNet classification with deep convolutional neural networks
2012cited by this paper
Convolutional neural networks applied to house numbers digit classification
2012cited by this paper
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
2011cited by this paper
Torch7: A Matlab-like Environment for Machine Learning
2011cited by this paper
Reading Digits in Natural Images with Unsupervised Feature Learning
2011cited by this paper
The conference paper
2011cited by this paper
Neural network with ensembles
2010cited by this paper
Large-Scale Machine Learning with Stochastic Gradient Descent
2010influential reference
Learning Multiple Layers of Features from Tiny Images
2009cited by this paper
ImageNet: A large-scale hierarchical image database
2009influential reference
Ensemble selection from libraries of models
2004cited by this paper
Fast committee learning: preliminary results
1998cited by this paper
Neural Network Ensembles, Cross Validation, and Active Learning
1994cited by this paper

CITED BY

Efficient Ensemble Learning with Curriculum-Based Masked Autoencoders for Retinal OCT Classification
2026influential citation
The Mean is the Mirage: Entropy-Adaptive Model Merging under Heterogeneous Domain Shifts in Medical Imaging
2026cites this paper
Trustworthy Data-Driven Wildfire Risk Prediction and Understanding in Western Canada
2026cites this paper
Graph Neural Networks are Heuristics
2026cites this paper
Deep residual networks with convolutional feature extraction for short-term load forecasting
2026cites this paper
Bayesian-LoRA: Probabilistic Low-Rank Adaptation of Large Language Models
2026influential citation
An End-to-End Ensemble Learning Approach for Enhancing Wind Power Forecasting
2026cites this paper
A novel parametric scaled exponential linear unit activation function for deep residual networks in short-term load forecasting
2026cites this paper
UAT-LITE: Inference-Time Uncertainty-Aware Attention for Pretrained Transformers
2026cites this paper
Richer Bayesian Last Layers with Subsampled NTK Features
2026cites this paper
SEFFNet: snapshot ensemble-based feature fusion network for skin cancer classification
2026cites this paper
Variance-Gated Ensembles: An Epistemic-Aware Framework for Uncertainty Estimation
2026cites this paper
When Are Two Scores Better Than One? Investigating Ensembles of Diffusion Models
2026cites this paper
Image Style Transfer–Based High- and Low-Frequency Component Synthesis Neural Architecture for Low-Dose Computed Tomography Denoising
2026cites this paper
A Kernel Approach for Semi-implicit Variational Inference
2026cites this paper
Refining Brain Image Interpretation Using Recursive Convolutional Networks: A Robust RCRN-Based Framework
2025cites this paper
Multi-class Gastrointestinal Diseases improved diagnosis based on Ensemble and Transfer Learning
2025cites this paper
GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training
2025cites this paper
What Does It Take to Build a Performant Selective Classifier?
2025cites this paper
Uncertainty-Aware Retinal Vessel Segmentation via Ensemble Distillation
2025cites this paper
A Deep Evidential Recognition Method With Geometry-Aware Features for Radar Target Rejection and Classification
2025cites this paper
PerNodeDrop: A Method Balancing Specialized Subnets and Regularization in Deep Neural Networks
2025cites this paper
LATTICE: Efficient In-Memory DNN Model Versioning
2025cites this paper
Smart Tourism Landmark Recognition: A Multi-Threshold Enhancement and Selective Ensemble Approach Using YOLO11
2025cites this paper
Pre-training under infinite compute
2025cites this paper
A Predictive Electromagnetic Spectrum Framework for Intelligent Incident Analysis and Real-Time Communication Assurance
2025cites this paper
Beyond Local Minima: Enhancing Deep Models Through Bezier Curve Interpolation
2025cites this paper
CrowdNet: Adaptive Collaborative Inference for Dynamic Mobile Intelligent Service
2025cites this paper
An efficient model training framework for green AI
2025cites this paper
Towards reliable deepfake detection from uncertainty calibration perspective
2025cites this paper
A vision transformer ensemble and mobile augmented reality solution for mushroom toxicity classification
2025cites this paper
ReMem: Mutual Information-Aware Fine-tuning of Pretrained Vision Transformers for Effective Knowledge Distillation
2025cites this paper
Online Bayesian Approximation Based Uncertainty Aware Model for Ophthalmic Image Segmentation
2025cites this paper
Robust emotion recognition in thermal imaging with convolutional neural networks and grey wolf optimization
2025cites this paper
D-semble: Efficient Diversity-Guided Search for Resilient ML Ensembles
2025cites this paper
Evaluation of Post Hoc Uncertainty Quantification Approaches for Flood Detection From SAR Imagery
2025cites this paper
Self-Error Adjustment: Theory and Practice of Balancing Individual Performance and Diversity in Ensemble Learning
2025cites this paper
NAPER: Fault Protection for Real-Time Resource-Constrained Deep Neural Networks
2025cites this paper
PiercingEye: Dual-Space Video Violence Detection With Hyperbolic Vision-Language Guidance
2025cites this paper
Agree to Disagree: Demystifying Homogeneous Deep Ensembles through Distributional Equivalence
2025cites this paper
Asymmetric Duos: Sidekicks Improve Uncertainty
2025cites this paper
Emotion recognition with multiple physiological parameters based on ensemble learning
2025cites this paper
Tripartite Weight-Space Ensemble for Few-Shot Class-Incremental Learning
2025cites this paper
Uncertainty in Deep Learning for EEG under Dataset Shifts
2025cites this paper
Towards Robust Surrogate Models: Benchmarking Machine Learning Approaches to Expediting Phase Field Simulations of Brittle Fracture
2025cites this paper
Designing metamaterials with programmable nonlinear responses and geometric constraints in graph space
2025influential citation
Ensemble-Based Fish Species Recognition in Challenging Underwater Environments
2025cites this paper
Exploring the role of layer variations in ANN Crowd behaviour and prediction accuracy
2025cites this paper
Structured Basis Function Networks: Loss-Centric Multi-Hypothesis Ensembles with Controllable Diversity
2025cites this paper
Optimizing Deep Residual Networks for Short-Term Load Forecasting With Multidimensional Weather Data and Principal Component Analysis
2025cites this paper
Distillation-Based Domain Generalization for Cross-Dataset EEG-Based Emotion Recognition
2025cites this paper
Uncertainty-Aware Hourly Air Temperature Mapping at 2 km Resolution via Physics-Guided Deep Learning
2025cites this paper
A Unified Noise-Curvature View of Loss of Trainability
2025cites this paper
Benchmarking noisy label detection methods
2025cites this paper
Symbolic Snapshot Ensembles
2025influential citation
Parameter Averaging in Link Prediction
2025cites this paper
Expertfuse: A huffman tree-based gradual expert integration framework for MoE models
2025cites this paper
JUHCCR-v1: a database for hand-drawn electrical and electronics circuit component recognition
2025cites this paper
Explainable Artificial Intelligence Approach Using Low-Dimensional Visualization and Ensembling Uncertainty Quantification for Rare Chromosomal Aberration Detection in Cytogenetic Imaging
2025cites this paper
SAMS-GNN: Self-Adaptive Multi-Scale Graph Neural Network for Multi-Band Spectrum Prediction
2025cites this paper
LT-Soups: Bridging Head and Tail Classes via Subsampled Model Soups
2025cites this paper
RF-Based 3D SLAM Rivaling Vision Approaches
2025cites this paper
Deep cross entropy fusion for pulmonary nodule classification based on ultrasound Imagery
2025influential citation
Metropolis-Hastings Captioning Game: Knowledge Fusion of Vision Language Models via Decentralized Bayesian Inference
2025cites this paper
Bezier Distillation
2025cites this paper
End-to-end acoustic-articulatory dysarthric speech recognition leveraging large-scale pretrained acoustic features
2025cites this paper
E-ViM3: Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
2025cites this paper
Pre-Training and Ensembling of Deep Neural Networks for Target Gene Expression Prediction From Landmark Genes
2025cites this paper
A Novel Non-iterative Training Method for CNN Classifiers Using Gram–Schmidt Process
2025cites this paper
MS-NET-v2: modular selective network optimized by systematic generation of expert modules
2025influential citation
Revisiting semi-supervised learning in the era of foundation models
2025cites this paper
OCCUQ: Exploring Efficient Uncertainty Quantification for 3D Occupancy Prediction
2025cites this paper
Generalize Audio Deepfake Algorithm Recognition via Attribution Enhancement
2025cites this paper
The impact of multi-modality fusion and deep learning on adult age estimation based on bone mineral density
2025cites this paper
Beyond a Single Mode: GAN Ensembles for Diverse Medical Data Generation
2025cites this paper
Neural Network Pruning for Invariance Learning
2025cites this paper
DyTTP: Trajectory Prediction with Normalization-Free Transformers
2025cites this paper
Noisy Deep Ensemble: Accelerating Deep Ensemble Learning via Noise Injection
2025influential citation
Artery segmentation and atherosclerotic plaque quantification using AI for murine whole slide images stained with oil red O
2025cites this paper
Last-layer committee machines for uncertainty estimations of benthic imagery
2025cites this paper
Performance Evaluation of Activation Functions in Deep Residual Networks for Short-Term Load Forecasting
2025cites this paper
Virtual neural networks: hundreds of souls in a body
2025cites this paper
CSCN: an efficient snapshot ensemble learning based sparse transformer model for long-range spatial-temporal traffic flow prediction
2025influential citation
SzegedAI at GenAI Detection Task 1: Beyond Binary - Soft-Voting Multi-Class Classification for Binary Machine-Generated Text Detection Across Diverse Language Models
2025influential citation
Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization
2025cites this paper
Frequentist uncertainties on neural density ratios with $w_i f_i$ ensembles
2025cites this paper
Model-Agnostic, Temperature-Informed Sampling Enhances Cross-Year Crop Mapping with Deep Learning
2025cites this paper
The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions
2025cites this paper
Hierarchical and Heterogeneous Federated Learning via a Learning-on-Model Paradigm
2025cites this paper
Partially Supervised Unpaired Multi-modal Learning for Label-Efficient Medical Image Segmentation
2025cites this paper
CYFLOD: Cyclic Filtering and Loss Damping for Alleviating Noisy Labels in Fine-Grained Visual Classification
2025cites this paper
A deep ensemble framework for human essential gene prediction by integrating multi-omics data
2025cites this paper
CLID-MU: Cross-Layer Information Divergence Based Meta Update Strategy for Learning with Noisy Labels
2025cites this paper
Forget Me Not: Fighting Local Overfitting with Knowledge Fusion and Distillation
2025influential citation
Comparative study of ensemble-based uncertainty quantification methods for neural network interatomic potentials
2025cites this paper
HybridNDiff-UQ: Uncertainty Quantification for Hybrid Neural Differentiable Modeling
2025cites this paper
Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models
2025cites this paper
Integrating snapshot ensemble learning into masked autoencoders for efficient self-supervised pretraining in medical imaging
2025cites this paper
Regime-Switching Langevin Monte Carlo Algorithms
2025cites this paper
Uncertainty Quantification for Safe and Reliable Autonomous Vehicles: A Review of Methods and Applications
2025cites this paper