LoRA Learns Less and Forgets Less

D. Biderman, Jose Javier Gonzalez Ortiz, J. Portes, M. Paul, Philip Greengard, Connor Jennings, Daniel King, Sam Havens, Vitaliy Chiley, Jonathan Frankle, Cody Blakeney, John P. Cunningham

Published 2024 in Trans. Mach. Learn. Res.

ABSTRACT

Low-Rank Adaptation (LoRA) is a widely used parameter-efficient finetuning method for large language models. LoRA saves memory by training only low-rank perturbations to selected weight matrices. In this work, we compare the performance of LoRA and full finetuning on two target domains, programming and mathematics. We consider both the instruction finetuning (approximately 100K prompt-response pairs) and continued pretraining (20B unstructured tokens) data regimes. Our results show that, in the standard low-rank settings, LoRA substantially underperforms full finetuning. Nevertheless, LoRA better maintains the base model's performance on tasks outside the target domain. We show that LoRA mitigates forgetting more than common regularization techniques such as weight decay and dropout; it also helps maintain more diverse generations. Finally, we show that full finetuning learns perturbations with a rank that is 10-100X greater than typical LoRA configurations, possibly explaining some of the reported gaps. We conclude by proposing best practices for finetuning with LoRA.
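
The abstract's two structural claims (LoRA trains only a low-rank perturbation, and full finetuning learns a perturbation of much higher rank) can be illustrated with a small NumPy sketch. Everything in it is an assumption made for illustration: the matrices are random stand-ins rather than weights from the paper, the helper names lora_delta and effective_rank are hypothetical, and the 90% energy threshold is just one plausible way to count significant singular values.

```python
import numpy as np


def lora_delta(rank, d_out, d_in, rng):
    """Build a perturbation B @ A whose rank is at most `rank` (LoRA-style factorization)."""
    B = rng.standard_normal((d_out, rank))
    A = rng.standard_normal((rank, d_in))
    return B @ A


def effective_rank(delta, energy=0.9):
    """Count the singular values needed to capture `energy` of the squared spectrum."""
    s = np.linalg.svd(delta, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, energy) + 1)


rng = np.random.default_rng(0)
d_out, d_in, r = 256, 256, 16  # illustrative sizes, not the paper's settings

low_rank = lora_delta(r, d_out, d_in, rng)
# Dense random matrix used as a stand-in for an unconstrained (full finetuning) perturbation.
dense = rng.standard_normal((d_out, d_in))

# Trainable parameters: two thin factors versus the whole matrix.
lora_params = r * (d_out + d_in)
full_params = d_out * d_in

print("LoRA trainable parameters:", lora_params)
print("Full trainable parameters:", full_params)
print("Effective rank of LoRA-style perturbation:", effective_rank(low_rank))
print("Effective rank of dense perturbation:", effective_rank(dense))
```

In this toy setting the factorized perturbation can never exceed rank r, while the dense one spreads its energy over many more singular directions. This is the kind of gap the abstract describes, though the sketch says nothing about the magnitudes the paper actually measured.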

PUBLICATION RECORD

Publication year: 2024
Venue: Trans. Mach. Learn. Res.
Publication date: 2024-05-15
Fields of study: Mathematics, Computer Science
Identifiers: DOI 10.48550/arXiv.2405.09673; arXiv 2405.09673
Source metadata: Semantic Scholar

CITED BY

On the Evidentiary Limits of Membership Inference for Copyright Auditing (2026, cites this paper)
Diagnosing Generalization Failures in Fine-Tuned LLMs: A Cross-Architectural Study on Phishing Detection (2026, cites this paper)
MARS: Harmonizing Multimodal Convergence via Adaptive Rank Search (2026, cites this paper)
Slot-ID: Identity-Preserving Video Generation from Reference Videos via Slot-Based Temporal Identity Encoding (2026, cites this paper)
FlashOptim: Optimizers for Memory Efficient Training (2026, cites this paper)
The Forecast After the Forecast: A Post-Processing Shift in Time Series (2026, cites this paper)
Learning Rate Scaling across LoRA Ranks and Transfer to Full Finetuning (2026, cites this paper)
TimeWarp: Evaluating Web Agents by Revisiting the Past (2026, cites this paper)
Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models (2026, cites this paper)
Artificial Entanglement in the Fine-Tuning of Large Language Models (2026, cites this paper)
Diving into Kronecker Adapters: Component Design Matters (2026, cites this paper)
Quantization-Robust LLM Unlearning via Low-Rank Adaptation (2026, cites this paper)
Design of conditional control generation based on regional feature quantification: practical investigation of diffusion models in developed urban areas (2026, cites this paper)
Hide and Seek in Embedding Space: Geometry-based Steganography and Detection in Large Language Models (2026, cites this paper)
A Unified Study of LoRA Variants: Taxonomy, Review, Codebase, and Empirical Evaluation (2026, influential citation)
Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning (2026, influential citation)
Zeroth-Order Federated Fine-Tuning for Large AI Models in Resource-Constrained Wireless Networks (2026, cites this paper)
Spectral Surgery: Training-Free Refinement of LoRA via Gradient-Guided Singular Value Reweighting (2026, cites this paper)
The Appeal and Reality of Recycling LoRAs with Adaptive Merging (2026, cites this paper)
Robust Policy Optimization to Prevent Catastrophic Forgetting (2026, cites this paper)
THINKSAFE: Self-Generated Safety Alignment for Reasoning Models (2026, cites this paper)
Privacy Enhanced PEFT: Tensor Train Decomposition Improves Privacy Utility Tradeoffs under DP-SGD (2026, cites this paper)
Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data (2026, cites this paper)
Why LoRA Resists Label Noise: A Theoretical Framework for Noise-Robust Parameter-Efficient Fine-Tuning (2026, cites this paper)
Modular Multi-Task Learning for Chemical Reaction Prediction (2026, cites this paper)
Efficient Hyper-Parameter Search for LoRA via Language-aided Bayesian Optimization (2026, cites this paper)
Subspace Geometry Governs Catastrophic Forgetting in Low-Rank Adaptation (2026, influential citation)
VIGiA: Instructional Video Guidance via Dialogue Reasoning and Retrieval (2026, cites this paper)
Beware of the Batch Size: Hyperparameter Bias in Evaluating LoRA (2026, cites this paper)
Least but not Last: Fine-tuning Intermediate Principal Components for Better Performance-Forgetting Trade-Offs (2026, cites this paper)
Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting (2025, cites this paper)
Unveiling Over-Memorization in Finetuning LLMs for Reasoning Tasks (2025, cites this paper)
Resource-Limited Joint Multimodal Sentiment Reasoning and Classification via Chain-of-Thought Enhancement and Distillation (2025, cites this paper)
A Practical Investigation of Spatially-Controlled Image Generation with Transformers (2025, cites this paper)
PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation (2025, cites this paper)
Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes (2025, cites this paper)
ChaTCL: LLM-Based Multi-Agent RAG Framework for TCL Script Generation (2025, cites this paper)
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization (2025, cites this paper)
Continual Gradient Low-Rank Projection Fine-Tuning for LLMs (2025, cites this paper)
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful (2025, cites this paper)
ReCode: Updating Code API Knowledge with Reinforcement Learning (2025, cites this paper)
Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality (2025, cites this paper)
Pay Attention to Small Weights (2025, cites this paper)
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing (2025, cites this paper)
Comparing Knowledge Injection Methods for LLMs in a Low-Resource Regime (2025, cites this paper)
MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation (2025, cites this paper)
PET-MAD, a lightweight universal interatomic potential for advanced materials modeling (2025, cites this paper)
Improving LoRA with Variational Learning (2025, cites this paper)
Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps (2025, cites this paper)
PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty (2025, cites this paper)
Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning (2025, cites this paper)
RQT: Hierarchical Residual Quantization for Multi-Model Compression (2025, cites this paper)
The Primacy of Magnitude in Low-Rank Adaptation (2025, influential citation)
Beyond Low-Rank Tuning: Model Prior-Guided Rank Allocation for Effective Transfer in Low-Data and Large-Gap Regimes (2025, cites this paper)
T-LoRA: Single Image Diffusion Model Customization Without Overfitting (2025, cites this paper)
Can Large Language Models Automate the Refinement of Cellular Network Specifications? (2025, cites this paper)
Bayesian BiLO: Bilevel Local Operator Learning for Efficient Uncertainty Quantification of Bayesian PDE Inverse Problems with Low-Rank Adaptation (2025, cites this paper)
DMoLE: Dynamic Mixture of LoRA Experts for Spam Email Detection (2025, cites this paper)
PSC: Extending Context Window of Large Language Models via Phase Shift Calibration (2025, cites this paper)
Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance in Adaptation (2025, cites this paper)
Chitranuvad: Adapting Multi-lingual LLMs for Multimodal Translation (2025, cites this paper)
GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay (2025, cites this paper)
Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment (2025, cites this paper)
Continual Learning in Vision-Language Models via Aligned Model Merging (2025, cites this paper)
Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution (2025, cites this paper)
Weight Spectra Induced Efficient Model Adaptation (2025, cites this paper)
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning (2025, cites this paper)
Reinforced Refinement with Self-Aware Expansion for End-to-End Autonomous Driving (2025, cites this paper)
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning (2025, cites this paper)
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning (2025, cites this paper)
Ontology-conformal recognition of materials entities using language models (2025, influential citation)
Mixture of Low Rank Adaptation with Partial Parameter Sharing for Time Series Forecasting (2025, cites this paper)
Context-Free Synthetic Data Mitigates Forgetting (2025, influential citation)
AuroRA: Breaking Low-Rank Bottleneck of LoRA with Nonlinear Mapping (2025, cites this paper)
MemAscend: System Memory Optimization for SSD-Offloaded LLM Fine-Tuning (2025, cites this paper)
SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks (2025, influential citation)
Norm-Bounded Low-Rank Adaptation (2025, cites this paper)
Fine Tuning without Catastrophic Forgetting via Selective Low Rank Adaptation (2025, cites this paper)
Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting (2025, influential citation)
HRP: High-Rank Preheating for Superior LoRA Initialization (2025, cites this paper)
Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model (2025, cites this paper)
Multi-Attribute Steering of Language Models via Targeted Intervention (2025, cites this paper)
GoRA: Gradient-driven Adaptive Low Rank Adaptation (2025, cites this paper)
Demystifying Multilingual Chain-of-Thought in Process Reward Modeling (2025, cites this paper)
Parameter-Efficient Online Fine-Tuning of ML-Based Hybrid Beamforming With LoRA (2025, cites this paper)
Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author's Style (2025, cites this paper)
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment (2025, cites this paper)
Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing (2025, cites this paper)
Towards hyperparameter-free optimization with differential privacy (2025, cites this paper)
Privacy and Accuracy-Aware AI/ML Model Deduplication (2025, cites this paper)
Put the Space of LoRA Initialization to the Extreme to Preserve Pre-trained Knowledge (2025, cites this paper)
Improving Neutral Point-of-View Generation with Data- and Parameter-Efficient RL (2025, cites this paper)
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model (2025, cites this paper)
From Style to Facts: Mapping the Boundaries of Knowledge Injection with Finetuning (2025, cites this paper)
KSOD: Knowledge Supplement for LLMs On Demand (2025, cites this paper)
Enhancing Large Language Models on Domain-specific Tasks: A Novel Training Strategy via Domain Adaptation and Preference Alignment (2025, cites this paper)
Adapt and Feature Translation for Class-Incremental Learning with Pre-Trained Models (2025, cites this paper)
RaSA: Rank-Sharing Low-Rank Adaptation (2025, influential citation)
ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning (2025, cites this paper)
ExpertSteer: Intervening in LLMs through Expert Knowledge (2025, cites this paper)