Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting
Sunny Sanyal, Hayden Prairie, Rudrajit Das, Ali Kavis, Sujay Sanghavi
Published 2025 in International Conference on Machine Learning

ABSTRACT

Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities, a phenomenon known as "catastrophic forgetting". This is especially an issue when one does not have access to the data and recipe used to develop the pre-trained model. Under this constraint, most existing methods for mitigating forgetting are inapplicable. To address this challenge, we propose a sample weighting scheme for the fine-tuning data based solely on the pre-trained model's losses. Specifically, we upweight the easy samples on which the pre-trained model's loss is low, and vice versa, to limit the drift from the pre-trained model. Our approach is orthogonal to and yet complementary with existing methods: while such methods mostly operate in parameter or gradient space, we concentrate on the sample space. We theoretically analyze the impact of fine-tuning with our method in a linear setting, showing that it stalls learning in a certain subspace, which inhibits overfitting to the target task. We empirically demonstrate the efficacy of our method on both language and vision tasks. As an example, when fine-tuning Gemma 2 2B on MetaMathQA, our method results in only a $0.8\%$ drop in accuracy on GSM8K (another math dataset) compared to standard fine-tuning, while preserving $5.4\%$ more accuracy on the pre-training datasets. Our code is publicly available at https://github.com/sanyalsunny111/FLOW_finetuning.
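The weighting scheme described in the abstract lends itself to a short sketch. Below is a minimal PyTorch illustration, assuming the per-sample weight decays with the frozen pre-trained model's loss via $w_i \propto \exp(-\ell_i^{\text{pre}}/\tau)$; the function name, the temperature `tau`, and the exact functional form are illustrative assumptions, not the paper's exact recipe (see the linked repository for the authors' implementation). Models are assumed to return logits of shape (batch, seq_len, vocab).

```python
import torch
import torch.nn.functional as F

def weighted_finetune_loss(model, pretrained_model, input_ids, labels, tau=1.0):
    """Per-sample weighted fine-tuning loss.

    Upweights samples on which the frozen pre-trained model already has
    low loss ("easy" samples), limiting drift from the pre-trained model.
    The exp(-loss / tau) weighting and `tau` are illustrative assumptions.
    """
    with torch.no_grad():
        # Per-sample loss of the frozen pre-trained model.
        pre_logits = pretrained_model(input_ids)          # (B, T, V)
        pre_loss = F.cross_entropy(
            pre_logits.transpose(1, 2), labels, reduction="none"
        ).mean(dim=1)                                     # (B,)
        # Easy samples (low pre-trained loss) get large weights.
        weights = torch.exp(-pre_loss / tau)
        weights = weights / weights.sum()                 # normalize in-batch

    # Per-sample loss of the model being fine-tuned.
    logits = model(input_ids)                             # (B, T, V)
    loss = F.cross_entropy(
        logits.transpose(1, 2), labels, reduction="none"
    ).mean(dim=1)                                         # (B,)

    # Weighted average replaces the usual uniform mean over the batch.
    return (weights * loss).sum()
```

Because the weights depend only on the frozen pre-trained model, they can be computed once per dataset and cached; the scheme changes only how batch losses are averaged, so it composes with parameter- or gradient-space forgetting mitigations, consistent with the orthogonality claim in the abstract.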