Reinforced Curriculum Pre-Alignment for Domain-Adaptive VLMs

Yuming Yan, Shuo Yang, Kai Tang, Sihong Chen, Yang Zhang, Ke Xu, D. Hu, Qun Yu, Pengfei Hu, Edith C. H. Ngai

Published 2026 in Unknown venue

ABSTRACT

Vision-Language Models (VLMs) demonstrate remarkable general-purpose capabilities but often fall short in specialized domains such as medical imaging or geometric problem-solving. Supervised Fine-Tuning (SFT) can enhance performance within a target domain, but it typically causes catastrophic forgetting, which limits the model's ability to generalize beyond that domain. The central challenge, therefore, is to adapt VLMs to new domains while preserving their general-purpose capabilities. Continual pretraining is effective for expanding knowledge in Large Language Models (LLMs), but it is less feasible for VLMs because of prohibitive computational costs and the unavailability of pretraining data for most open-source models. This necessitates efficient post-training adaptation methods. Reinforcement learning (RL)-based approaches such as Group Relative Policy Optimization (GRPO) have shown promise in preserving general abilities, yet they often fail in domain adaptation scenarios where the model initially lacks sufficient domain knowledge, leading to optimization collapse. To bridge this gap, we propose Reinforced Curriculum Pre-Alignment (RCPA), a novel post-training paradigm that introduces a curriculum-aware progressive modulation mechanism. In the early phase, RCPA applies partial output constraints to safely expose the model to new domain concepts. As the model's domain familiarity increases, training gradually transitions to full generation optimization, refining responses and aligning them with domain-specific preferences. This staged adaptation balances domain knowledge acquisition with the preservation of general multimodal capabilities. Extensive experiments across specialized domains and general benchmarks validate the effectiveness of RCPA, establishing a practical pathway toward building high-performing, domain-adaptive VLMs.
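The abstract's two technical ideas can be illustrated with a small sketch. The first function computes group-relative advantages in the style of GRPO (each reward standardized against the other samples for the same prompt) and shows one plausible way the learning signal can vanish: if every sampled response earns the same reward, as may happen when the model lacks domain knowledge, all advantages are zero. The second and third functions are purely hypothetical. The abstract does not specify how RCPA's "partial output constraints" or its curriculum are implemented, so the linear schedule, the prefix-based constraint, and the names `constrained_fraction`, `split_target`, and `warmup_frac` are assumptions made only for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def constrained_fraction(step, total_steps, warmup_frac=0.5):
    """Hypothetical schedule (an assumption, not the paper's method).

    Returns the share of the reference answer that is supplied to the model
    as a fixed prefix: 1.0 at step 0, decaying linearly to 0.0 by the end of
    the warmup phase, after which the model generates the full response.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    return max(0.0, 1.0 - step / warmup_steps)


def split_target(reference_tokens, fraction):
    """Split a reference answer into (given prefix, part left to generate)."""
    cut = int(len(reference_tokens) * fraction)
    return reference_tokens[:cut], reference_tokens[cut:]


# A group where every response fails gives no learning signal at all.
print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))

# A group with one success yields a positive advantage for that sample.
print(group_relative_advantages([1.0, 0.0, 0.0, 0.0]))

# The assumed curriculum hands over more of the answer to the model over time.
reference = ["The", "triangle", "is", "isosceles"]
for step in (0, 25, 50, 80):
    fraction = constrained_fraction(step, total_steps=100)
    print(step, fraction, split_target(reference, fraction))
```

Under these assumptions, early steps leave the model only a short continuation to produce, which makes a nonzero reward more reachable, and later steps recover ordinary full-response optimization. The paper's actual mechanism may differ in both the form of the constraint and the schedule.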

PUBLICATION RECORD

Publication year
2026
Venue
Unknown venue
Publication date
2026-02-11
Fields of study
Computer Science
Identifiers
arXiv 2602.10740
Source metadata
Semantic Scholar
