Efficient and Effective In-context Demonstration Selection with Coreset

Zihua Wang,Jiarui Wang,Haiyang Xu,Ming Yan,Fei Huang,Xu Yang,Xiu-Shen Wei,Siya Mi,Yu Zhang

Published 2025 in arXiv.org

ABSTRACT

In-context learning (ICL) has emerged as a powerful paradigm for Large Visual Language Models (LVLMs), enabling them to leverage a few examples directly from input contexts. However, the effectiveness of this approach is heavily reliant on the selection of demonstrations, a process that is NP-hard. Traditional strategies, including random, similarity-based sampling and infoscore-based sampling, often lead to inefficiencies or suboptimal performance, struggling to balance both efficiency and effectiveness in demonstration selection. In this paper, we propose a novel demonstration selection framework named Coreset-based Dual Retrieval (CoDR). We show that samples within a diverse subset achieve a higher expected mutual information. To implement this, we introduce a cluster-pruning method to construct a diverse coreset that aligns more effectively with the query while maintaining diversity. Additionally, we develop a dual retrieval mechanism that enhances the selection process by achieving global demonstration selection while preserving efficiency. Experimental results demonstrate that our method significantly improves the ICL performance compared to the existing strategies, providing a robust solution for effective and efficient demonstration selection.

PUBLICATION RECORD

Publication year
2025
Venue
arXiv.org
Publication date
2025-11-12
Fields of study
Computer Science
Identifiers
DOI 10.48550/arXiv.2511.08977 arXiv 2511.08977
External record
Open on Semantic Scholar
Source metadata
Semantic Scholar

CITATION MAP

EXTRACTION MAP

CLAIMS

No claims are published for this paper.

CONCEPTS

No concepts are published for this paper.

REFERENCES

No references are available for this paper.

CITED BY

A Coreset Selection of Coreset Selection Literature: Introduction and Recent Advances
2025cites this paper