Not All Samples Are Created Equal: Deep Learning with Importance Sampling

Published 2018 in International Conference on Machine Learning

ABSTRACT

Deep neural network training spends most of the computation on examples that are properly handled and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on "informative" examples and reduces the variance of the stochastic gradients during training. Our contribution is twofold: first, we derive a tractable upper bound to the per-sample gradient norm, and second, we derive an estimator of the variance reduction achieved with importance sampling, which enables us to switch it on when it will result in an actual speedup. The resulting scheme can be used by changing a few lines of code in a standard SGD procedure, and we demonstrate experimentally, on image classification, CNN fine-tuning, and RNN training, that for a fixed wall-clock time budget, it provides a reduction of the training loss of up to an order of magnitude and a relative improvement of test errors between 5% and 17%.
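The abstract describes the mechanism only at a high level. What follows is a minimal, hypothetical Python sketch of generic importance-sampled SGD bookkeeping, not the authors' implementation. It assumes that a larger candidate batch of B examples has already been drawn uniformly and that each candidate carries a non-negative score standing in for its per-sample gradient norm (the paper's tractable upper bound is not reproduced here). All function names and the 0.2 threshold are illustrative assumptions. Drawing example i with probability p_i proportional to its score and weighting the draw by 1/(B * p_i) keeps the gradient estimate unbiased. For a single draw, the trace of the estimator's variance is (1/B^2) * sum(||G_i||^2 / p_i) - ||mean gradient||^2, so the mean-gradient term cancels when compared against uniform sampling, and the reduction can be computed from norms alone. The switching rule is a simple stand-in for the paper's variance-reduction estimator.

```python
import random
from typing import List, Sequence, Tuple


def importance_probabilities(scores: Sequence[float]) -> List[float]:
    """Map non-negative per-sample scores to sampling probabilities p_i proportional to score_i."""
    total = sum(scores)
    if total <= 0:
        # Degenerate case (all scores zero): fall back to uniform sampling.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def sample_with_weights(
    probs: Sequence[float], b: int, rng: random.Random
) -> Tuple[List[int], List[float]]:
    """Draw b candidate indices with replacement and return their unbiasing weights.

    With B candidates, the weight 1 / (B * p_i) makes the weighted average of the
    selected per-sample gradients an unbiased estimate of the plain average over
    all B candidates.
    """
    big_b = len(probs)
    idx = rng.choices(range(big_b), weights=probs, k=b)
    weights = [1.0 / (big_b * probs[i]) for i in idx]
    return idx, weights


def variance_reduction(norms: Sequence[float], probs: Sequence[float]) -> float:
    """Drop in the trace of a single-draw gradient estimator's variance when
    sampling from probs instead of uniformly, given per-sample gradient norms.

    uniform:    (1/B)   * sum(n_i^2)       - ||mean gradient||^2
    importance: (1/B^2) * sum(n_i^2 / p_i) - ||mean gradient||^2
    The ||mean gradient||^2 term cancels, so only the norms are needed.
    """
    big_b = len(norms)
    uniform = sum(n * n for n in norms) / big_b
    # Samples with zero norm contribute nothing; a positive norm with p_i == 0
    # would make the estimator biased and raises ZeroDivisionError here.
    weighted = sum(n * n / p for n, p in zip(norms, probs) if n > 0) / big_b**2
    return uniform - weighted


def should_use_importance(
    norms: Sequence[float], probs: Sequence[float], threshold: float = 0.2
) -> bool:
    """Hypothetical switch: enable importance sampling only if the reduction is a
    large enough fraction of the uniform second moment (1/B) * sum(n_i^2).

    The second moment stands in for the full uniform variance, because
    ||mean gradient||^2 cannot be recovered from the norms alone.
    """
    second_moment = sum(n * n for n in norms) / len(norms)
    if second_moment == 0:
        return False
    return variance_reduction(norms, probs) / second_moment > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up gradient norms for a candidate batch of B = 8 examples.
    norms = [0.1, 0.1, 0.1, 2.0, 0.1, 0.1, 0.1, 3.0]
    probs = importance_probabilities(norms)
    idx, weights = sample_with_weights(probs, b=4, rng=rng)
    print("selected:", idx, "weights:", [round(w, 3) for w in weights])
    print("variance reduction:", variance_reduction(norms, probs))
    print("use importance sampling:", should_use_importance(norms, probs))
```

With these made-up norms, two of the eight candidates dominate, so the computed reduction is large (about 1.14 against a uniform second moment of about 1.63) and the switch turns importance sampling on. When all scores are equal, the reduction is zero and uniform sampling is kept, since scoring the larger candidate batch would then buy nothing.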

PUBLICATION RECORD

Publication year: 2018
Venue: International Conference on Machine Learning
Publication date: 2018-03-02
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1803.00942
Source metadata: Semantic Scholar

REFERENCES

Learning What Data to Learn
2017 · cited by this paper
Non-convex Finite-Sum Optimization Via SCSG Methods
2017 · influential reference
Sampling Matters in Deep Embedding Learning
2017 · cited by this paper
Layer Normalization
2016 · cited by this paper
Importance Sampling Tree for Large-scale Empirical Expectation
2016 · cited by this paper
Katyusha: the first direct acceleration of stochastic gradient methods
2016 · cited by this paper
Accelerating Deep Neural Network Training with Inconsistent Stochastic Gradient Descent
2016 · cited by this paper
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
2016 · cited by this paper
Wide Residual Networks
2016 · influential reference
Deep Residual Learning for Image Recognition
2015 · cited by this paper
Discriminative Learning of Deep Convolutional Feature Point Descriptors
2015 · cited by this paper
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
2015 · cited by this paper
A Simple Way to Initialize Recurrent Networks of Rectified Linear Units
2015 · cited by this paper
FaceNet: A unified embedding for face recognition and clustering
2015 · cited by this paper
Variance Reduction in SGD by Distributed Importance Sampling
2015 · influential reference
Prioritized Experience Replay
2015 · influential reference
Online Batch Selection for Faster Training of Neural Networks
2015 · influential reference
SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
2014 · cited by this paper
CNN Features Off-the-Shelf: An Astounding Baseline for Recognition
2014 · influential reference
Adam: A Method for Stochastic Optimization
2014 · cited by this paper
Stochastic Optimization with Importance Sampling for Regularized Loss Minimization
2014 · cited by this paper
On optimal probabilities in stochastic coordinate descent methods
2013 · cited by this paper
Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
2013 · influential reference
Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm
2013 · influential reference
Understanding the difficulty of training deep feedforward neural networks
2010 · cited by this paper
ImageNet: A large-scale hierarchical image database
2009 · cited by this paper
Recognizing indoor scenes
2009 · cited by this paper
Learning Multiple Layers of Features from Tiny Images
2009 · influential reference
Curriculum learning
2009 · cited by this paper
Fast Kernel Classifiers with Online and Active Learning
2005 · cited by this paper
Long Short-Term Memory
1997 · influential reference

CITED BY

Efficient Content-based Recommendation Model Training via Noise-aware Coreset Selection
2026 · cites this paper
Untargeted Poisoning Membership Inference With Sample Selection and Enhancement
2026 · cites this paper
Lightweight Adaptive Quantization Algorithms for Federated Learning With Heterogeneous Clients
2026 · cites this paper
How Hyper-Datafication Impacts the Sustainability Costs in Frontier AI
2026 · cites this paper
Training Memory in Deep Neural Networks: Mechanisms, Evidence, and Measurement Gaps
2026 · cites this paper
Depth as Prior Knowledge for Object Detection
2026 · cites this paper
Nonparametric Teaching of Attention Learners
2026 · cites this paper
Point-ITR: Task-Oriented Importance Sampling for Large-Scale 3D Point Clouds in Manufacturing
2026 · cites this paper
Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization
2026 · cites this paper
Stop Preaching and Start Practising Data Frugality for Responsible Development of AI
2026 · cites this paper
GeoFL: A Framework for Efficient Geo-Distributed Cross-Device Federated Learning
2026 · cites this paper
Learning from Complexity: Exploring Dynamic Sample Pruning of Spatio-Temporal Training
2026 · cites this paper
Privacy-preserving transfer learning via one-time encrypted data filtering
2026 · cites this paper
United We Defend: Collaborative Membership Inference Defenses in Federated Learning
2026 · cites this paper
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
2026 · cites this paper
Efficient Hyperparameter Search for Non-Stationary Model Training
2025 · cites this paper
Adaptive Sample Weighting with Regime-Aware Meta-Learning Framework for Financial Forecasting
2025 · cites this paper
Towards Active Synthetic Data Generation for Finetuning Language Models
2025 · cites this paper
Fair and Efficient Federated Learning Client Selection via Dynamic Contribution Evaluation
2025 · cites this paper
TinyUSFM: Towards Compact and Efficient Ultrasound Foundation Models
2025 · cites this paper
LLM on a Budget: Active Knowledge Distillation for Efficient Classification of Large Text Corpora
2025 · cites this paper
Towards Efficient Straggler Management in Distributed Deep Learning Training
2025 · cites this paper
Learning from Failures: Understanding LLM Alignment through Failure-Aware Inverse RL
2025 · cites this paper
Strong Antithetic Variance Reduction Inequalities
2025 · cites this paper
Fair Bayesian Data Selection via Generalized Discrepancy Measures
2025 · cites this paper
Enhanced Neural Architecture Search with Multi-Dimensional Curriculum Learning for Defect Detection
2025 · cites this paper
Synthetic Text Generation for Training Large Language Models via Gradient Matching
2025 · cites this paper
Preparation Meets Opportunity: Enhancing Data Preprocessing for ML Training With Seneca
2025 · cites this paper
Bandit Guided Submodular Curriculum for Adaptive Subset Selection
2025 · cites this paper
SuRe: Surprise-Driven Prioritised Replay for Continual LLM Learning
2025 · cites this paper
Data-Quality-based Aggregation Methods in Federated Learning: a Comprehensive Study
2025 · cites this paper
Split-LEO: efficient AI model training over LEO satellite networks
2025 · cites this paper
InWaveSR: Topography-aware super-resolution network for internal solitary waves
2025 · cites this paper
Tricks and Plug-ins for Gradient Boosting with Transformers
2025 · cites this paper
Iterative Misclassification Error Training (IMET): An Optimized Neural Network Training Technique for Image Classification
2025 · cites this paper
Vector-Valued Monte Carlo Integration Using Ratio Control Variates
2025 · cites this paper
Dataset Distillation for Super-Resolution Without Class Labels and Pre-Trained Models
2025 · cites this paper
Noise-free Loss Gradients: A Surprisingly Effective Baseline for Coreset Selection
2025 · cites this paper
An Efficient On-Device Federated Learning System Through the Interplay of Client Selection and Batch Size With Watermarked Data
2025 · cites this paper
Partial Forward Blocking: A Novel Data Pruning Paradigm for Lossless Training Acceleration
2025 · cites this paper
Determining an Optimal Small Subset of Training Data for Deep Learning Models in Computer Vision
2025 · cites this paper
Tricks and Plug-Ins for Gradient Boosting in Image Classification
2025 · cites this paper
Proxy-Validated Importance-Aware Federated Sample Selection with Meta Learning
2025 · cites this paper
Clean-label backdoor attack via sample-customized feature alignment
2025 · cites this paper
Federated Fine-Tuning of Sparsely-Activated Large Language Models on Resource-Constrained Devices
2025 · cites this paper
Mitigating Class Imbalance and Enhancing Unlabeled Data Extraction in Semi-Supervised Deep Learning for Martian Terrain Segmentation
2025 · cites this paper
SubZeroCore: A Submodular Approach with Zero Training for Coreset Selection
2025 · cites this paper
Energy-Efficient and Data-Optimized Federated Learning for Distributed On-Device Intelligence
2025 · cites this paper
Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning
2025 · influential citation
Computational Budget Should Be Considered in Data Selection
2025 · cites this paper
Progressive Data Dropout: An Embarrassingly Simple Approach to Faster Training
2025 · cites this paper
AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs
2025 · influential citation
Sample selection using multi-task autoencoders in federated learning with non-IID data
2025 · cites this paper
An efficient model training framework for green AI
2025 · cites this paper
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation
2025 · cites this paper
FUNU: Boosting Machine Unlearning Efficiency by Filtering Unnecessary Unlearning
2025 · cites this paper
Deep learning driven silicon wafer defect segmentation and classification
2025 · cites this paper
CAFE+: Towards Compact, Adaptive, and Fast Embedding for Large-scale Online Recommendation Models
2025 · influential citation
Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting
2025 · cites this paper
Leveraging Game Theory and XAI for Data Quality-Driven Sample and Client Selection in Trustworthy Split Federated Learning
2025 · cites this paper
Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining
2025 · cites this paper
Learning to Reason at the Frontier of Learnability
2025 · cites this paper
Not All Clients Are Equal: Personalized Federated Learning on Heterogeneous Multi-Modal Clients
2025 · cites this paper
A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices
2025 · cites this paper
Importance Sampling for Nonlinear Models
2025 · cites this paper
A Coreset Selection of Coreset Selection Literature: Introduction and Recent Advances
2025 · cites this paper
Hierarchical Group-wise Ranking Framework for Recommendation Models
2025 · cites this paper
Noise-Tolerant Coreset-Based Class Incremental Continual Learning
2025 · cites this paper
Dealing with Noisy Data in Federated Learning: An Incentive Mechanism with Flexible Pricing
2025 · cites this paper
Highway autonomous vehicle decision-making method based on prior knowledge and improved experience replay reinforcement learning algorithm
2025 · cites this paper
ADAM Optimization with Adaptive Batch Selection
2025 · cites this paper
GradMix: Gradient-based Selective Mixup for Robust Data Augmentation in Class-Incremental Learning
2025 · cites this paper
Laplace Sample Information: Data Informativeness Through a Bayesian Lens
2025 · cites this paper
Self-Evolving Curriculum for LLM Reasoning
2025 · cites this paper
ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining
2025 · cites this paper
Instance Data Condensation for Image Super-Resolution
2025 · cites this paper
OASIS: Online Sample Selection for Continual Visual Instruction Tuning
2025 · influential citation
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay
2025 · cites this paper
Taming The Overhead of Hiding Samples in Deep Neural Network Training
2025 · cites this paper
GeoFL: A Framework for Efficient Geo-Distributed Cross-Device Federated Learning
2025 · cites this paper
A Learning-Based Sequence-to-Sequence WiFi Fingerprinting Framework for Accurate Pedestrian Indoor Localization Using Unconstrained RSSI
2025 · cites this paper
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
2025 · cites this paper
Efficient Message Passing Algorithm and Architecture Co-Design for Graph Neural Networks
2025 · cites this paper
Looking elsewhere: improving variational Monte Carlo gradients by importance sampling
2025 · cites this paper
Physics-Informed Neural Networks For Semiconductor Film Deposition: A Review
2025 · cites this paper
LoReUn: Data Itself Implicitly Provides Cues to Improve Machine Unlearning
2025 · cites this paper
Sensitivity of Stability: Theoretical & Empirical Analysis of Replicability for Adaptive Data Selection in Transfer Learning
2025 · cites this paper
A cyclical loss-based optimization algorithm for pretraining LLMs on noisy data
2025 · cites this paper
Biased Local SGD for Efficient Deep Learning on Heterogeneous Systems
2025 · cites this paper
Stochastic Gradient Descent with Strategic Querying
2025 · cites this paper
Randomized Pairwise Learning with Adaptive Sampling: A PAC-Bayes Analysis
2025 · cites this paper
Bayesian Coreset Optimization for Personalized Federated Learning
2025 · cites this paper
PAC–Bayes Guarantees for Data-Adaptive Pairwise Learning
2025 · cites this paper
Focal Sampling: SGD biased towards early important samples for efficient image classification with augmentation selection
2025 · influential citation
Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training
2025 · cites this paper
Physics-Informed Fine-Tuning for physics discovery from random and sparse data
2025 · cites this paper
HyperCore: Coreset Selection under Noise via Hypersphere Models
2025 · cites this paper
Data-Efficient Training by Evolved Sampling
2025 · cites this paper
Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning
2025 · cites this paper
Multivariate Temporal Regression at Scale: A Three-Pillar Framework Combining ML, XAI, and NLP
2025 · cites this paper