PACE: marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization

Published 2024 in Neural Information Processing Systems

ABSTRACT

Parameter-Efficient Fine-Tuning (PEFT) effectively adapts pre-trained transformers to downstream tasks. However, optimizing task performance often comes at the cost of generalizability in fine-tuned models. To address this issue, we theoretically connect smaller weight gradient norms during training and larger datasets to improvements in model generalization. Motivated by this connection, we propose reducing gradient norms for enhanced generalization and aligning the fine-tuned model with its pre-trained counterpart to retain knowledge from large-scale pre-training data. Yet, naive alignment does not guarantee gradient reduction and can potentially cause gradient explosion, complicating efforts to manage gradients. To address this, we propose PACE, marrying generalization of PArameter-efficient fine-tuning with Consistency rEgularization. We perturb features learned from the adapter with multiplicative noise and ensure the fine-tuned model remains consistent for the same sample under different perturbations. Theoretical analysis shows that PACE not only implicitly regularizes gradients for enhanced generalization, but also implicitly aligns the fine-tuned and pre-trained models to retain knowledge. Experimental evidence supports our theories. PACE surpasses existing PEFT methods in visual adaptation tasks (VTAB-1k, FGVC, few-shot learning, domain adaptation), showcasing its potential for resource-efficient fine-tuning. It also improves LoRA in text classification (GLUE) and mathematical reasoning (GSM-8K). The code is available at https://github.com/MaxwellYaoNi/PACE
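
The consistency idea in the abstract (perturb the adapter features with multiplicative noise, then ask for matching outputs on the same sample) can be illustrated with a minimal sketch. This is not the authors' implementation, which lives in the repository linked above. The LoRA-style low-rank adapter, the mean-1 Gaussian noise, the squared-error penalty, and every name here (`noisy_adapted_layer`, `consistency_penalty`, `sigma`) are assumptions chosen for illustration.

```python
import random


def matvec(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def random_matrix(rows, cols, rng):
    """Matrix of standard-normal entries, used only to build toy weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def noisy_adapted_layer(x, w0, a, b, rng, sigma=0.5):
    """Frozen linear map plus a low-rank adapter whose output is scaled
    elementwise by Gaussian noise with mean 1 (assumed noise model)."""
    base = matvec(x, w0)               # frozen pre-trained path
    delta = matvec(matvec(x, a), b)    # adapter path (hypothetical LoRA-style)
    return [h + rng.gauss(1.0, sigma) * d for h, d in zip(base, delta)]


def consistency_penalty(x, w0, a, b, rng, sigma=0.5):
    """Squared distance between two outputs for the same input under
    independent noise draws."""
    out1 = noisy_adapted_layer(x, w0, a, b, rng, sigma)
    out2 = noisy_adapted_layer(x, w0, a, b, rng, sigma)
    return sum((u - v) ** 2 for u, v in zip(out1, out2))


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(8)]
w0 = random_matrix(8, 8, rng)
a = random_matrix(8, 2, rng)
b = random_matrix(2, 8, rng)
zero_b = [[0.0] * 8 for _ in range(2)]  # adapter that contributes nothing

penalty = consistency_penalty(x, w0, a, b, rng)
penalty_without_adapter = consistency_penalty(x, w0, a, zero_b, rng)
```

In this toy setting the penalty is positive whenever the adapter output is nonzero and vanishes when the adapter contributes nothing, so minimizing it alongside a task loss pulls the adapted layer toward the frozen one. That is one informal way to read the abstract's claim that consistency regularization implicitly aligns the fine-tuned and pre-trained models; the weighting against the task loss is left out of the sketch.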

PUBLICATION RECORD

Publication year
2024
Venue
Neural Information Processing Systems
Publication date
2024-09-25
Fields of study
Computer Science
Identifiers
DOI 10.48550/arXiv.2409.17137 arXiv 2409.17137
Source metadata
Semantic Scholar

REFERENCES

Stabilizing Modality Gap & Lowering Gradient Norms Improve Zero-Shot Adversarial Robustness of VLMs
2025cited by this paper
CHAIN: Enhancing Generalization in Data-Efficient GANs via LipsCHitz Continuity ConstrAIned Normalization
2024cited by this paper
Semantic Transfer from Head to Tail: Enlarging Tail Margin for Long-Tailed Visual Recognition
2024cited by this paper
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
2024cited by this paper
Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach
2024influential reference
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning
2023cited by this paper
AdapterGNN: Parameter-Efficient Fine-Tuning Improves Generalization in GNNs
2023cited by this paper
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
2023cited by this paper
Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
2023cited by this paper
Fast Trainable Projection for Robust Fine-Tuning
2023cited by this paper
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
2023influential reference
Controlling Text-to-Image Diffusion by Orthogonal Finetuning
2023cited by this paper
Consistency-guided Prompt Learning for Vision-Language Models
2023cited by this paper
Universality and Limitations of Prompt Tuning
2023cited by this paper
QLoRA: Efficient Finetuning of Quantized LLMs
2023cited by this paper
Segment Anything
2023cited by this paper
BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning
2023cited by this paper
LLaMA: Open and Efficient Foundation Language Models
2023cited by this paper
Towards Efficient Visual Adaption via Structural Re-parameterization
2023cited by this paper
NICE: NoIse-modulated Consistency rEgularization for Data-Efficient GANs
2023cited by this paper
Batched Low-Rank Adaptation of Foundation Models
2023cited by this paper
VioLET: Vision-Language Efficient Tuning with Collaborative Multi-modal Gradients
2023cited by this paper
VeRA: Vector-based Random Matrix Adaptation
2023cited by this paper
Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing
2023influential reference
DropKey for Vision Transformer
2023influential reference
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
2023cited by this paper
Neural Prompt Search
2022influential reference
LAION-5B: An open large-scale dataset for training next generation image-text models
2022cited by this paper
Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning
2022influential reference
On the Effectiveness of Parameter-Efficient Fine-Tuning
2022cited by this paper
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
2022influential reference
Sparse Structure Search for Delta Tuning
2022cited by this paper
Visual Prompt Tuning
2022influential reference
Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning
2022cited by this paper
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
2022cited by this paper
Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
2022cited by this paper
FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer
2022cited by this paper
High-Resolution Image Synthesis with Latent Diffusion Models
2021cited by this paper
Learning Transferable Visual Models From Natural Language Supervision
2021cited by this paper
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
2021cited by this paper
Emerging Properties in Self-Supervised Vision Transformers
2021cited by this paper
SWAD: Domain Generalization by Seeking Flat Minima
2021cited by this paper
LoRA: Low-Rank Adaptation of Large Language Models
2021influential reference
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
2021cited by this paper
R-Drop: Regularized Dropout for Neural Networks
2021cited by this paper
Training Verifiers to Solve Math Word Problems
2021cited by this paper
Improved Regularization and Robustness for Fine-tuning in Neural Networks
2021cited by this paper
Masked Autoencoders Are Scalable Vision Learners
2021cited by this paper
Manifold Learning Benefits GANs
2021cited by this paper
On Transferability of Prompt Tuning for Natural Language Processing
2021cited by this paper
Adversarial Weight Perturbation Helps Robust Generalization
2020cited by this paper
How Does Mixup Help With Robustness and Generalization?
2020cited by this paper
Sharpness-Aware Minimization for Efficiently Improving Generalization
2020influential reference
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
2020cited by this paper
Defending and Harnessing the Bit-Flip Based Adversarial Weight Attack
2020cited by this paper
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2020influential reference
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
2020cited by this paper
Do ImageNet Classifiers Generalize to ImageNet?
2019cited by this paper
DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks
2019cited by this paper
Parameter-Efficient Transfer Learning for NLP
2019cited by this paper
A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark
2019cited by this paper
Momentum Contrast for Unsupervised Visual Representation Learning
2019cited by this paper
Consistency Regularization for Generative Adversarial Networks
2019cited by this paper
RoBERTa: A Robustly Optimized BERT Pretraining Approach
2019cited by this paper
Natural Adversarial Examples
2019cited by this paper
Learning Robust Global Representations by Penalizing Local Predictive Power
2019cited by this paper
CAGAN: Consistent Adversarial Training Enhanced GANs
2018cited by this paper
Explicit Inductive Bias for Transfer Learning with Convolutional Networks
2018cited by this paper
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
2018cited by this paper
Gradient Regularization Improves Accuracy of Discriminative Models
2017cited by this paper
Attention is All you Need
2017cited by this paper
Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection
2015cited by this paper
Food-101 - Mining Discriminative Components with Random Forests
2014cited by this paper
Neural Machine Translation by Jointly Learning to Align and Translate
2014cited by this paper
3D Object Representations for Fine-Grained Categorization
2013cited by this paper
Fine-Grained Visual Classification of Aircraft
2013cited by this paper
What regularized auto-encoders learn from the data-generating distribution
2012cited by this paper
ImageNet: A large-scale hierarchical image database
2009influential reference
A Visual Vocabulary for Flower Classification
2006cited by this paper
Care of aged doctors
1969cited by this paper

CITED BY

Task Knowledge Injection via Interpolations and Reinstatement for Large Language Model Generalization
2025influential citation
Prompt-Enabled Large AI Models for CSI Feedback
2025cites this paper
Stabilizing Modality Gap & Lowering Gradient Norms Improve Zero-Shot Adversarial Robustness of VLMs
2025cites this paper
Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts
2025cites this paper
ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models
2025cites this paper
Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning
2025influential citation
SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment
2025cites this paper
Noise Consistency Regularization for Improved Subject-Driven Image Synthesis
2025influential citation
LARGO: Low-Rank Regulated Gradient Projection for Robust Parameter Efficient Fine-Tuning
2025cites this paper
QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation
2025cites this paper
BiLoRA: Almost-orthogonal Parameter Spaces for Continual Learning
2025cites this paper
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization
2024cites this paper
CrossSpectra: Exploiting Cross-Layer Smoothness for Parameter-Efficient Fine-Tuning
year unknowncites this paper