Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval
Shiyin Dong, Mingrui Zhu, N. Wang, Heng Yang, Xinbo Gao
Published 2023 on arXiv.org
ABSTRACT
Zero-shot sketch-based image retrieval (ZS-SBIR) is challenging due to the cross-domain nature of sketches and photos, as well as the semantic gap between seen and unseen image distributions. Previous methods fine-tune pre-trained models with various side information and learning strategies to learn a compact feature space that is shared between the sketch and photo domains and bridges seen and unseen classes. However, these efforts remain inadequate at adapting across domains and at transferring knowledge from seen to unseen classes. In this paper, we present an effective "Adapt and Align" approach to address these key challenges. Specifically, we insert simple and lightweight domain adapters to learn new abstract concepts of the sketch domain and improve cross-domain representation capabilities. Inspired by recent advances of image-text foundation models (e.g., CLIP) in zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes. Extensive experiments on three benchmark datasets and two popular backbones demonstrate the superiority of our method in terms of retrieval accuracy and flexibility.
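The abstract names two mechanisms: lightweight adapters inserted into a pre-trained backbone ("Adapt"), and explicit alignment of image embeddings with text embeddings ("Align"). Below is a minimal sketch of both, assuming a PyTorch setup with a frozen CLIP-like backbone; the Adapter class, bottleneck size, and temperature tau are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the two components named in the abstract (assumptions:
# PyTorch, frozen CLIP-like backbone; names and hyperparameters illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    """Lightweight bottleneck adapter inserted into a frozen backbone block
    to learn sketch-domain concepts ("Adapt")."""

    def __init__(self, dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form: pre-trained features pass through unchanged;
        # only the small adapter weights are trained on the new domain.
        return x + self.up(F.relu(self.down(x)))


def align_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, tau: float = 0.07):
    """Contrastive alignment of image embeddings with class-text embeddings
    ("Align"), enabling transfer to unseen classes via the text space."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / tau                             # pairwise similarities
    targets = torch.arange(img.size(0), device=img.device)  # diagonal = matched pairs
    return F.cross_entropy(logits, targets)
```

In such a setup, retrieval at test time would score a sketch embedding against photo embeddings in the shared, text-aligned space, which is what allows unseen classes to be matched without retraining.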
PUBLICATION RECORD
- Publication year
2023
- Venue
arXiv.org
- Publication date
2023-05-09
- Fields of study
Computer Science
- Source metadata
Semantic Scholar