AltLoRA: Towards Better Gradient Approximation in Low-Rank Adaptation with Alternating Projections

Xin Yu, Yujia Wang, Jinghui Chen, Lingzhou Xue

Published 2025 in arXiv.org

ABSTRACT

Low-Rank Adaptation (LoRA) has emerged as an effective technique for reducing memory overhead in fine-tuning large language models. However, it often underperforms full fine-tuning because its update is constrained to a low-rank space. Recent variants such as LoRA-Pro attempt to mitigate this by adjusting the gradients of the low-rank matrices to approximate the full gradient. Yet LoRA-Pro's solution is not unique, and different solutions can lead to significantly different performance in ablation studies. Moreover, to incorporate momentum or adaptive optimization, approaches like LoRA-Pro must first compute the equivalent gradient, incurring a memory cost close to that of full fine-tuning. A key challenge therefore remains: integrating momentum properly into the low-rank space at lower memory cost. In this work, we propose AltLoRA, an alternating projection method that avoids the difficulties in gradient approximation introduced by the joint update design, while integrating momentum without increasing memory complexity. Our theoretical analysis provides convergence guarantees and further shows that AltLoRA enables stable feature learning and is robust with respect to transformation invariance. Extensive experiments across multiple tasks demonstrate that AltLoRA outperforms LoRA and its variants, narrowing the gap toward full fine-tuning while preserving superior memory efficiency.
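The abstract does not spell out the update rule, so the following is only a minimal NumPy sketch under stated assumptions. It assumes the LoRA parameterization W = W0 + BA (scaling factor dropped) and reads "alternating projection" as: with one factor frozen, choose the step for the other factor whose induced change to W is the least-squares closest match to the full gradient G. The helper name `alternating_projected_steps`, the toy sizes, and the random stand-in for G are hypothetical illustrations, not the authors' code. The last check illustrates why a joint update (adjusting both factors at once, as in LoRA-Pro) has no unique solution: shifting the pair by any r × r matrix C leaves B g_A + g_B A unchanged.

```python
import numpy as np


def alternating_projected_steps(B, A, G):
    """Hypothetical helper (not the authors' code): per-factor least-squares steps.

    Assumes W = W0 + B @ A with the LoRA scaling factor omitted.
    - A frozen: g_B = argmin ||g_B @ A - G||_F = G A^T (A A^T)^{-1}
    - B frozen: g_A = argmin ||B @ g_A - G||_F = (B^T B)^{-1} B^T G
    """
    grad_B = G @ A.T  # plain LoRA gradient dL/dB via the chain rule
    grad_A = B.T @ G  # plain LoRA gradient dL/dA via the chain rule
    # A A^T is symmetric, so solving against grad_B^T and transposing
    # yields grad_B (A A^T)^{-1} without forming an explicit inverse.
    g_B = np.linalg.solve(A @ A.T, grad_B.T).T
    g_A = np.linalg.solve(B.T @ B, grad_A)
    return g_B, g_A


# Toy sizes (assumption): a d_out x d_in weight adapted with rank r.
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2
B = rng.standard_normal((d_out, r))
A = rng.standard_normal((r, d_in))
G = rng.standard_normal((d_out, d_in))  # stand-in for the full gradient dL/dW

g_B, g_A = alternating_projected_steps(B, A, G)

# Sanity check: each step matches a generic least-squares solver.
ref_B = np.linalg.lstsq(A.T, G.T, rcond=None)[0].T
ref_A = np.linalg.lstsq(B, G, rcond=None)[0]
print(np.allclose(g_B, ref_B), np.allclose(g_A, ref_A))  # True True

# Joint-update ambiguity: for any pair (g_A, g_B) and any r x r matrix C,
# (g_A + C A, g_B - B C) produces the same combined change to W.
C = rng.standard_normal((r, r))
joint = B @ g_A + g_B @ A
shifted = B @ (g_A + C @ A) + (g_B - B @ C) @ A
print(np.allclose(joint, shifted))  # True
```

Because each single-factor problem has a unique least-squares solution (given full-rank factors), the alternating view sidesteps the ambiguity of the joint problem. The sketch omits learning rates and momentum; per the abstract, AltLoRA keeps such optimizer state in the low-rank space, which this example does not model.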

PUBLICATION RECORD

Publication year: 2025
Venue: arXiv.org
Publication date: 2025-05-18
Fields of study: Computer Science
Identifiers: DOI 10.48550/arXiv.2505.12455; arXiv 2505.12455
Source metadata: Semantic Scholar

REFERENCES

Efficient Over-parameterized Matrix Sensing from Noisy Measurements via Alternating Preconditioned Gradient Descent
2025 · cited by this paper
LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently
2025 · cited by this paper
Preconditioned Gradient Descent for Over-Parameterized Nonconvex Matrix Factorization
2025 · cited by this paper
HiRA: Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models
2025 · cited by this paper
Improving Neutral Point-of-View Generation with Data- and Parameter-Efficient RL
2025 · cited by this paper
A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models
2025 · cited by this paper
Robust Federated Finetuning of LLMs via Alternating Optimization of LoRA
2025 · cited by this paper
Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures
2024 · cited by this paper
Parameter Efficient Reinforcement Learning from Human Feedback
2024 · cited by this paper
FLoRA: Federated Fine-Tuning Large Language Models with Heterogeneous Low-Rank Adaptations
2024 · cited by this paper
LoRA+: Efficient Low Rank Adaptation of Large Models
2024 · influential reference
The Llama 3 Herd of Models
2024 · cited by this paper
LoRA-Pro: Are Low-Rank Adapters Properly Optimized?
2024 · influential reference
LoRA-GA: Low-Rank Adaptation with Gradient Approximation
2024 · influential reference
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
2024 · cited by this paper
The Impact of Initialization on LoRA Finetuning Dynamics
2024 · influential reference
Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation
2024 · cited by this paper
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA based Mixture of Experts
2024 · cited by this paper
Mixture of LoRA Experts
2024 · influential reference
Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning
2024 · cited by this paper
Flora: Low-Rank Adapters Are Secretly Gradient Compressors
2024 · cited by this paper
Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models
2024 · influential reference
DoRA: Weight-Decomposed Low-Rank Adaptation
2024 · influential reference
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
2024 · cited by this paper
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models
2024 · cited by this paper
Improving LoRA in Privacy-preserving Federated Learning
2024 · cited by this paper
Accelerating Gradient Descent for Over-Parameterized Asymmetric Low-Rank Matrix Sensing via Preconditioning
2024 · cited by this paper
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
2024 · cited by this paper
Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources
2024 · cited by this paper
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
2024 · cited by this paper
GaLore+: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection
2024 · cited by this paper
DeepSeek-V3 Technical Report
2024 · cited by this paper
Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning
2024 · influential reference
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
2024 · influential reference
Subspace Optimization for Large Language Models with Convergence Guarantees
2024 · cited by this paper
LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
2023 · cited by this paper
The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing
2023 · cited by this paper
LLaMA: Open and Efficient Foundation Language Models
2023 · cited by this paper
Parameter-efficient fine-tuning of large-scale pre-trained language models
2023 · cited by this paper
GPT-4 Technical Report
2023 · cited by this paper
Segment Anything
2023 · cited by this paper
WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions
2023 · cited by this paper
Cuttlefish: Low-Rank Model Training without All the Tuning
2023 · cited by this paper
QLoRA: Efficient Finetuning of Quantized LLMs
2023 · cited by this paper
Fast and Accurate Estimation of Low-Rank Matrices from Noisy Measurements via Preconditioned Non-Convex Gradient Descent
2023 · cited by this paper
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
2023 · cited by this paper
The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
2023 · cited by this paper
Matrix Factorization Techniques in Machine Learning, Signal Processing, and Statistics
2023 · cited by this paper
Llama 2: Open Foundation and Fine-Tuned Chat Models
2023 · cited by this paper
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
2023 · cited by this paper
IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning
2023 · cited by this paper
Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices
2023 · cited by this paper
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
2023 · cited by this paper
VeRA: Vector-based Random Matrix Adaptation
2023 · cited by this paper
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
2023 · cited by this paper
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA
2023 · cited by this paper
LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
2023 · cited by this paper
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
2023 · influential reference
Preconditioning Matters: Fast Global Convergence of Non-convex Matrix Factorization via Scaled Gradient Descent
2023 · cited by this paper
DyLoRA: Parameter-Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
2022 · cited by this paper
Exploring Low Rank Training of Deep Neural Networks
2022 · cited by this paper
Training Verifiers to Solve Math Word Problems
2021 · cited by this paper
FedPara: Low-rank Hadamard Product for Communication-Efficient Federated Learning
2021 · cited by this paper
Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks
2021 · cited by this paper
Evaluating Large Language Models Trained on Code
2021 · cited by this paper
The Power of Scale for Parameter-Efficient Prompt Tuning
2021 · cited by this paper
Prefix-Tuning: Optimizing Continuous Prompts for Generation
2021 · cited by this paper
Learning Transferable Visual Models From Natural Language Supervision
2021 · cited by this paper
High-Resolution Image Synthesis with Latent Diffusion Models
2021 · cited by this paper
Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent
2020 · cited by this paper
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
2020 · cited by this paper
Feature Learning in Infinite-Width Neural Networks
2020 · cited by this paper
Low-Rank Matrix Recovery with Scaled Subgradient Methods: Fast and Robust Convergence Without the Condition Number
2020 · cited by this paper
Language Models are Few-Shot Learners
2020 · cited by this paper
AdapterFusion: Non-Destructive Task Composition for Transfer Learning
2020 · cited by this paper
Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-Layer Networks
2020 · cited by this paper
Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation
2019 · cited by this paper
Parameter-Efficient Transfer Learning for NLP
2019 · cited by this paper
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
2019 · cited by this paper
HuggingFace's Transformers: State-of-the-art Natural Language Processing
2019 · cited by this paper
On the Impact of the Activation Function on Deep Neural Networks Training
2019 · cited by this paper
Measuring the Intrinsic Dimension of Objective Landscapes
2018 · cited by this paper
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
2018 · cited by this paper
Deep Information Propagation
2016 · cited by this paper
Riemannian Preconditioning
2014 · cited by this paper
Low-rank matrix completion using alternating minimization
2012 · cited by this paper
A Riemannian geometry for low-rank matrix completion
2012 · cited by this paper
The Moore–Penrose Pseudoinverse: A Tutorial Review of the Theory
2011 · cited by this paper
Estimation of high-dimensional low-rank matrices
2009 · cited by this paper
Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization
2007 · cited by this paper

CITED BY

Stabilizing Decentralized Federated Fine-Tuning via Topology-Aware Alternating LoRA
2026 · cites this paper
ODELoRA: Training Low-Rank Adaptation by Solving Ordinary Differential Equations
2026 · cites this paper
PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning
2025 · cites this paper
ADF-LoRA: Alternating Low-Rank Aggregation for Decentralized Federated Fine-Tuning
2025 · cites this paper