Active Learning for Vision-Language Models
Published 2024 in IEEE Workshop/Winter Conference on Applications of Computer Vision
ABSTRACT
Pre-trained vision-language models (VLMs) such as CLIP have demonstrated impressive zero-shot performance on a wide range of downstream computer vision tasks. However, a considerable performance gap remains between these models and supervised models trained on the downstream dataset. To bridge this gap, we propose a novel active learning (AL) framework that enhances the zero-shot classification performance of VLMs by selecting only a few informative samples from the unlabeled data for annotation during training. To achieve this, our approach first calibrates the predicted entropy of VLMs and then combines self-uncertainty with neighbor-aware uncertainty to compute a reliable uncertainty measure for active sample selection. Our extensive experiments show that the proposed approach outperforms existing AL approaches on several image classification datasets and significantly enhances the zero-shot performance of VLMs.
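The abstract only sketches the selection criterion, so the following is a minimal illustrative sketch of how a calibrated-entropy score combined with a neighbor-aware term might drive active sample selection. This is not the authors' implementation: the function names (`select_for_annotation`), the temperature `tau`, the mixing weight `alpha`, and the choice of k-nearest-neighbor mean entropy are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits, tau=1.0):
    # Temperature-scaled softmax; tau > 1 softens over-confident zero-shot
    # predictions (a simple stand-in for entropy calibration; tau is assumed).
    z = (logits / tau) - (logits / tau).max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy(probs, eps=1e-12):
    # Shannon entropy of each row of class probabilities.
    return -(probs * np.log(probs + eps)).sum(axis=1)

def select_for_annotation(logits, features, budget, k=10, alpha=0.5, tau=2.0):
    """Pick `budget` unlabeled samples with the highest combined uncertainty.

    logits:   (N, C) zero-shot image-text similarity scores from the VLM
    features: (N, D) L2-normalized image embeddings
    alpha:    weight mixing self- and neighbor-aware uncertainty (assumed)
    """
    probs = softmax(logits, tau=tau)           # calibrated predictions
    self_unc = entropy(probs)                  # self-uncertainty per sample

    # Neighbor-aware uncertainty: mean entropy over each sample's k nearest
    # neighbors in embedding space (cosine similarity on unit vectors).
    sims = features @ features.T
    np.fill_diagonal(sims, -np.inf)            # exclude each sample itself
    nn_idx = np.argsort(-sims, axis=1)[:, :k]
    neighbor_unc = self_unc[nn_idx].mean(axis=1)

    score = alpha * self_unc + (1.0 - alpha) * neighbor_unc
    return np.argsort(-score)[:budget]         # indices to send for labeling
```

The neighbor term down-weights isolated high-entropy outliers and favors samples whose whole neighborhood is uncertain, which is one plausible reading of "neighbor-aware uncertainty"; the paper's actual formulation may differ.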
PUBLICATION RECORD
- Publication year: 2024
- Venue: IEEE Workshop/Winter Conference on Applications of Computer Vision
- Publication date: 2024-10-29
- Fields of study: Computer Science
- Source metadata: Semantic Scholar