Zero-Shot Learning Through Cross-Modal Transfer

R. Socher, M. Ganjoo, Christopher D. Manning, A. Ng

Published 2013 in Neural Information Processing Systems

ABSTRACT

This work introduces a model that can recognize objects in images even if no training data is available for the object class. The only necessary knowledge about unseen visual categories comes from unsupervised text corpora. Unlike previous zero-shot learning models, which can only differentiate between unseen classes, our model can operate on a mixture of seen and unseen classes, simultaneously obtaining state-of-the-art performance on classes with thousands of training images and reasonable performance on unseen classes. This is achieved by seeing the distributions of words in texts as a semantic space for understanding what objects look like. Our deep learning model does not require any manually defined semantic or visual features for either words or images. Images are mapped to be close to semantic word vectors corresponding to their classes, and the resulting image embeddings can be used to distinguish whether an image is of a seen or unseen class. We then use novelty detection methods to differentiate unseen classes from seen classes. We demonstrate two novelty detection strategies; the first gives high accuracy on unseen classes, while the second is conservative in its prediction of novelty and keeps the seen classes' accuracy high.
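
The pipeline the abstract describes (map images into the word-vector space, decide whether an image is novel, then pick the nearest class vector) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a ridge-regression linear map in place of the paper's deep model, a plain distance threshold in place of its two novelty detection strategies, and made-up 2-D "word vectors" and 3-D "image features". All names (`fit_linear_map`, `nearest`, `predict`, `threshold`) are hypothetical.

```python
import numpy as np

def fit_linear_map(X, Y, reg=1.0):
    """Ridge regression: find W so that X @ W approximates Y.

    X holds image features (one row per image) and Y holds the word
    vector of each image's class. A linear map is an assumption made
    here for brevity, not the model used in the paper.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y)

def nearest(z, class_vecs):
    """Return (label, distance) of the class vector closest to z."""
    label = min(class_vecs, key=lambda c: np.linalg.norm(z - class_vecs[c]))
    return label, float(np.linalg.norm(z - class_vecs[label]))

def predict(x, W, seen_vecs, unseen_vecs, threshold=0.5):
    """Classify x among seen classes unless it looks novel.

    Novelty rule (an assumption): the mapped image is treated as unseen
    when it is farther than `threshold` from every seen class vector.
    """
    z = x @ W
    label, dist = nearest(z, seen_vecs)
    if dist <= threshold:
        return label
    return nearest(z, unseen_vecs)[0]

# Toy semantic space: two seen classes and two unseen classes.
seen_vecs = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0])}
unseen_vecs = {"truck": np.array([-1.0, -1.0]), "ship": np.array([1.0, 1.0])}

# Synthetic image features: a fixed linear function of the class word
# vector plus small noise. Only seen classes appear in the training set.
A = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
rng = np.random.default_rng(0)
Y = np.vstack([np.tile(seen_vecs["cat"], (50, 1)),
               np.tile(seen_vecs["dog"], (50, 1))])
X = Y @ A + 0.02 * rng.standard_normal((100, 3))
W = fit_linear_map(X, Y)

cat_image = seen_vecs["cat"] @ A
truck_image = unseen_vecs["truck"] @ A
ship_image = unseen_vecs["ship"] @ A
print(predict(cat_image, W, seen_vecs, unseen_vecs))
print(predict(truck_image, W, seen_vecs, unseen_vecs))
print(predict(ship_image, W, seen_vecs, unseen_vecs))
```

In this toy setup the threshold plays the role of the novelty detector: lowering it flags more images as unseen, while raising it protects accuracy on seen classes, which mirrors the trade-off between the two strategies mentioned in the abstract.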

PUBLICATION RECORD

Publication year
2013
Venue
Neural Information Processing Systems
Publication date
2013-01-16
Fields of study
Computer Science
Identifiers
arXiv 1301.3666
Source metadata
Semantic Scholar

REFERENCES

Distributional Semantics in Technicolor
2012, cited by this paper
Beyond spatial pyramids: Receptive field learning for pooled image features
2012, cited by this paper
Improving Word Representations via Global Context and Multiple Word Prototypes
2012, cited by this paper
Multimodal learning with deep Boltzmann machines
2012, cited by this paper
Multimodal Deep Learning
2011, cited by this paper
Learning to Learn with Compound HD Models
2011, cited by this paper
One shot learning of simple visual concepts
2011, cited by this paper
The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization
2011, influential reference
Going Beyond Text: A Hybrid Image-Text Approach for Measuring Word Relatedness
2011, cited by this paper
Towards cross-category knowledge propagation for learning visual concepts
2011, cited by this paper
Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach
2011, cited by this paper
Visual Information in Semantic Representation
2010, cited by this paper
Distributional Memory: A General Framework for Corpus-Based Semantics
2010, cited by this paper
Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
2010, cited by this paper
From Frequency to Meaning: Vector Space Models of Semantics
2010, cited by this paper
Learning Multiple Layers of Features from Tiny Images
2009, cited by this paper
Zero-shot Learning with Semantic Output Codes
2009, cited by this paper
Learning to detect unseen object classes by between-class attribute transfer
2009, influential reference
LoOP: local outlier probabilities
2009, cited by this paper
Describing objects by their attributes
2009, cited by this paper
A unified architecture for natural language processing: deep neural networks with multitask learning
2008, cited by this paper
Zero-data Learning of New Tasks
2008, cited by this paper
Visualizing Data using t-SNE
2008, cited by this paper
A Structured Vector Space Model for Word Meaning in Context
2008, cited by this paper
Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification
2007, cited by this paper
Dependency-Based Construction of Semantic Space Models
2007, cited by this paper
One-shot learning of object categories
2006, cited by this paper
Cross-generalization: learning novel classes from a single example by feature replacement
2005, cited by this paper
Geometric context from a single image
2005, cited by this paper
From distributional to semantic similarity
2004, cited by this paper
Object Classification from a Single Example Utilizing Class Relevance Metrics
2004, cited by this paper
Object Classification from a Single Example Utilizing Class Relevance Pseudo-Metrics
2004, cited by this paper
A Neural Probabilistic Language Model
2003, cited by this paper
Automatic Word Sense Discrimination
1998, cited by this paper
Automatic Retrieval and Clustering of Similar Words
1998, cited by this paper
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.
1997, cited by this paper

CITED BY

Talking with Verifiers: Automatic Specification Generation for Neural Network Verification
2026, cites this paper
A Multi-Modal Knowledge-Driven Approach for Generalized Zero-shot Video Classification
2026, cites this paper
Open World Knowledge Aided Single-Cell Foundation Model with Robust Cross-Modal Cell-Language Pre-training
2026, cites this paper
A Vision for Multisensory Intelligence: Sensing, Science, and Synergy
2026, cites this paper
The First Mass Protest on Threads: Multimodal Mobilization and AI-Generated Visuals in Taiwan's Bluebird Movement
2026, cites this paper
Generative Model-Based Mixed-Semantic Enhancement for Transductive Zero-Shot Learning
2026, cites this paper
AugGen: a generative framework for continual generalized zero-shot learning
2026, cites this paper
DDPTA: Zero-Shot Learning for Skeleton-Based Action Recognition
2026, cites this paper
Sparse MoE as a New Treatment: Addressing Forgetting, Fitting, Learning Issues in Multi-Modal Multi-Task Learning
2025, cites this paper
Multi-Granularity Mutual Refinement Network for Zero-Shot Learning
2025, cites this paper
Hebrew Diacritics Restoration using Visual Representation
2025, cites this paper
Efficient information extraction from resumes using small language models for SMEs based on Zero-Shot learning approach
2025, cites this paper
LADB: Latent Aligned Diffusion Bridges for Semi-Supervised Domain Translation
2025, cites this paper
mmZeAR: Zero-Effort Cross-Category Action Recognition With mmWave Radar
2025, cites this paper
Attribute Prompt Alignment Network for Zero-Shot Learning
2025, cites this paper
Zero-shot pipeline fault detection using percussion method and multi-attribute learning model
2025, cites this paper
Underwater Sonar Image Classification with Image Disentanglement Reconstruction and Zero-Shot Learning
2025, cites this paper
Alignclip: navigating the misalignments for robust vision-language generalization
2025, cites this paper
Human activity recognition: A review of deep learning-based methods
2025, cites this paper
Dual Prototype Contrastive Network for Generalized Zero-Shot Learning
2025, cites this paper
Teleology-Driven Affective Computing: A Causal Framework for Sustained Well-Being
2025, cites this paper
Learn Concepts from Multi-Scale Visual Information for Compositional Zero-Shot Learning
2025, cites this paper
Multihop Reconstruction for Generalized Zero-Shot Node Classification
2025, cites this paper
ValuePilot: A Two-Phase Framework for Value-Driven Decision-Making
2025, cites this paper
A Review: One-Shot Object Detection Methods for Conditional Detection of Retail and Warehouse Products
2025, cites this paper
Comparison of Depression Detection Between LLMs and Zero-Shot Learning Using DAD Dataset
2025, influential citation
Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning
2025, cites this paper
Rethinking Generalized Zero-Shot Learning: A Synthesized Per-Instance Attribute Perspective
2025, cites this paper
Zero-Shot Component Inference of Unknown Gas Mixtures
2025, cites this paper
SimZSL: Zero-Shot Learning Beyond a Pre-defined Semantic Embedding Space
2025, cites this paper
DiVE-k: Differential Visual Reasoning for Fine-grained Image Recognition
2025, cites this paper
Zero-Shot Image Classification by modeling Implicit Semantic Correlation Transferability
2025, cites this paper
RSplitzero: generalized zero-shot learning in remote sensing across attribute splits with single and multi-modal representations
2025, cites this paper
A Zero-Shot High-Dimensional Feature Fusion With STF-GAN for Cross-Domain Image Reconstruction
2025, cites this paper
A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models
2025, cites this paper
Causal Automated Machine Learning for Zero-Shot Decision-Making in Low-Resource Environments: A New Paradigm in Machine Learning Automation and Transferability
2025, cites this paper
Deep ensemble learning method for zero-shot object detection
2025, cites this paper
AmPLe: Supporting Vision-Language Models via Adaptive-Debiased Ensemble Multi-Prompt Learning
2025, cites this paper
Zero-Shot Automatic Modulation Recognition Using a Large Vision-Language Model
2025, cites this paper
On Calibration of Prompt Learning Using Temperature Scaling
2025, cites this paper
Adaptive Fusion Learning for Compositional Zero-Shot Recognition
2025, cites this paper
Radar Jamming Recognition: Models, Methods, and Prospects
2025, cites this paper
Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques
2025, cites this paper
Zero-Shot Parameter Learning of Robot Dynamics Using Bayesian Statistics and Prior Knowledge
2025, cites this paper
Attribute-Informed and Similarity-Enhanced Zero-Shot Radar Target Recognition
2025, cites this paper
A Conditional Probability Framework for Compositional Zero-shot Learning
2025, cites this paper
Stress Detection using Multimodal Representation Learning, Fusion Techniques, and Applications
2025, cites this paper
Bidirectional Semantic Consistency Guided Contrastive Embedding for Generative Zero-Shot Learning
2025, cites this paper
Enhancing generalization in camera trap image recognition: Fine-tuning visual language models
2025, cites this paper
FFusion: Feature Fusion Transformer for Zero-Shot Learning
2025, cites this paper
MADS: Multi-Attribute Document Supervision for Zero-Shot Image Classification
2025, cites this paper
Dynamic Relation Inference via Verb Embeddings
2025, cites this paper
Learning to Identify Seen, Unseen and Unknown in the Open World: A Practical Setting for Zero-Shot Learning
2025, cites this paper
Attention head purification: A new perspective to harness CLIP for domain generalization
2025, cites this paper
Memory-Modular Classification: Learning to Generalize with Memory Replacement
2025, cites this paper
Zero-Shot Performance Prediction for Probabilistic Scaling Laws
2025, cites this paper
MetaZero: A Novel Meta-Learning Method Suitable for Generalized Zero-Shot Learning
2025, cites this paper
Application of Contrastive Learning on ECG Data: Evaluating Performance in Japanese and Classification with Around 100 Labels
2025, influential citation
Parameter-Efficient Continual Fine-Tuning: A Survey
2025, cites this paper
ReKon3D: Relation-knowledge aware multi-modal embedding and contrastive GAN for zero-shot 3D recognition
2025, cites this paper
ESE-GAN: Zero-Shot Food Image Classification Based on Low Dimensional Embedding of Visual Features
2025, cites this paper
Pairwise Prompt-Based Tuning with Parameter Efficient Fast Adaptation for Generalized Zero-Shot Intent Detection
2025, cites this paper
Federated hierarchical MARL for zero-shot cyber defense
2025, cites this paper
Towards Context-sensitive Emotion Recognition
2025, cites this paper
How to Talk to Your Classifier: Conditional Text Generation with Radar–Visual Latent Space
2025, cites this paper
GVSE++: improved gated visual-semantic embedding for few-shot image and sentence matching
2025, cites this paper
Enhancing Multimodal Tweet Analysis Accuracy through Integration of CLIP Model and Multi-layer Attention Mechanism
2024, cites this paper
Joint Feature Generation and Open-set Prototype Learning for generalized zero-shot open-set classification
2024, cites this paper
Zero-Shot Classification Using Hyperdimensional Computing
2024, cites this paper
Llama-Time: Fine-Tuning a Large Language Model for Advanced Time Series Prediction
2024, cites this paper
Low-rank Multi-modal Features Fusion Semantic Autoencoder For Zero-shot Learning
2024, influential citation
CLIPCEIL: Domain Generalization through CLIP via Channel rEfinement and Image-text aLignment
2024, cites this paper
Domain-Aware Prototype Network for Generalized Zero-Shot Learning
2024, cites this paper
Synthesizing Knowledge-Enhanced Features for Real-World Zero-Shot Food Detection
2024, cites this paper
An Active Transfer Learning framework for image classification based on Maximum Differentiation Classifier
2024, cites this paper
VP-SFDA: Visual Prompt Source-Free Domain Adaptation for Cross-Modal Medical Image
2024, cites this paper
Attention-driven frequency-based Zero-Shot Learning with phase augmentation
2024, cites this paper
FLEX-CLIP: Feature-Level GEneration Network Enhanced CLIP for X-shot Cross-modal Retrieval
2024, cites this paper
Text-guided Zero-Shot Object Localization
2024, cites this paper
A weakly supervised method for 3D object detection with partially annotated samples
2024, cites this paper
Do They Share the Same Tail? Learning Individual Compositional Attribute Prototype for Generalized Zero-Shot Learning
2024, cites this paper
Syn2Real Domain Generalization for Underwater Mine-Like Object Detection Using Side-Scan Sonar
2024, cites this paper
Integrating shape- and CNN-based features for zero-shot and low-shot learning
2024, cites this paper
A Lightweight Framework With Knowledge Distillation for Zero-Shot Mars Scene Classification
2024, cites this paper
A Multi-Scale Feature Fusion Based Lightweight Vehicle Target Detection Network on Aerial Optical Images
2024, cites this paper
Adaptive Masking Enhances Visual Grounding
2024, cites this paper
Explicit High-Level Semantic Network for Domain Generalization in Hyperspectral Image Classification
2024, cites this paper
Visual–Semantic Graph Matching Net for Zero-Shot Learning
2024, cites this paper
CLIPREC: Graph-Based Domain Adaptive Network for Zero-Shot Referring Expression Comprehension
2024, cites this paper
Automating Chapter-Level Classification for Electronic Theses and Dissertations
2024, cites this paper
Illumination-Aware Hallucination-Based Domain Adaptation for Thermal Pedestrian Detection
2024, cites this paper
Attention Head Purification: A New Perspective to Harness CLIP for Domain Generalization
2024, cites this paper
An Individual Identity-Driven Framework for Animal Re-Identification
2024, cites this paper
RevCD - Reversed Conditional Diffusion for Generalized Zero-Shot Learning
2024, cites this paper
Leveraging Large Language Models for Identifying Interpretable Linguistic Markers and Enhancing Alzheimer's Disease Diagnostics
2024, cites this paper
Hierarchical novel class discovery for single-cell transcriptomic profiles
2024, cites this paper
An autonomous drone swarm for detecting and tracking anomalies among dense vegetation
2024, cites this paper
Continually Learn to Map Visual Concepts to Large Language Models in Resource-constrained Environments
2024, cites this paper
Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning
2024, cites this paper
Improving Generalized Zero-Shot Learning by Exploring the Diverse Semantics from External Class Names
2024, cites this paper