Fine-Grained Visual-Textual Representation Learning
Published 2017 in IEEE Transactions on Circuits and Systems for Video Technology (Print)
ABSTRACT
Fine-grained visual categorization aims to recognize hundreds of subcategories belonging to the same basic-level category, a highly challenging task due to the subtle, localized visual distinctions among similar subcategories. Most existing methods learn part detectors to discover discriminative regions for better categorization performance. However, not all parts are beneficial or indispensable for categorization, and choosing the number of part detectors relies heavily on prior knowledge and experimental validation. When we describe the object in an image with text, we focus mainly on its pivotal characteristics and rarely mention common characteristics or background areas. This involuntary transfer from human visual attention to textual attention means that textual attention indicates how many and which parts are discriminative and significant for categorization; textual attention can therefore help discover visual attention in the image. Inspired by this, we propose a fine-grained visual-textual representation learning (VTRL) approach, whose main contributions are: 1) fine-grained visual-textual pattern mining, which discovers discriminative visual-textual pairwise information by jointly modeling vision and text with generative adversarial networks, automatically and adaptively discovering discriminative parts; and 2) VTRL jointly combines visual and textual information, preserving intra-modality and inter-modality information to generate a complementary fine-grained representation and further improve categorization performance. Comprehensive experiments on the widely used CUB-200-2011 and Oxford Flowers-102 datasets demonstrate the effectiveness of our VTRL approach, which achieves the best categorization accuracy compared with state-of-the-art methods.
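The abstract describes combining visual and textual features into one complementary representation. The paper's actual model is not shown here; as a minimal illustrative sketch only, the idea of fusing two modalities into a joint representation can be expressed as late fusion of per-modality feature vectors (the function and array names below are hypothetical, not from the paper):

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row to unit length so both modalities contribute comparably."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def fuse_visual_textual(visual_feats, textual_feats):
    """Late fusion sketch: normalize each modality, then concatenate.

    visual_feats:  (n_samples, d_v) image features, e.g. from a CNN
    textual_feats: (n_samples, d_t) features of the textual descriptions
    Returns a (n_samples, d_v + d_t) joint representation that preserves
    each modality's own structure while placing them in one vector.
    """
    return np.concatenate(
        [l2_normalize(visual_feats), l2_normalize(textual_feats)], axis=1
    )
```

In such a scheme, a downstream classifier trained on the fused vectors can exploit whichever modality is more discriminative for a given subcategory; the paper's VTRL approach goes further by jointly modeling the two modalities rather than simply concatenating them.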
PUBLICATION RECORD
- Publication year
2017
- Venue
IEEE Transactions on Circuits and Systems for Video Technology (Print)
- Publication date
2017-08-31
- Fields of study
Computer Science
- Source metadata
Semantic Scholar
REFERENCES
80 references
CITED BY
69 citing papers