Unpaired Image Captioning via Scene Graph Alignments

Jiuxiang Gu, Shafiq R. Joty, Jianfei Cai, Handong Zhao, Xu Yang, G. Wang

Published 2019 in IEEE International Conference on Computer Vision

ABSTRACT

Most current image captioning models rely heavily on paired image-caption datasets. However, obtaining large-scale paired image-caption data is labor-intensive and time-consuming. In this paper, we present a scene graph-based approach for unpaired image captioning. Our framework comprises an image scene graph generator, a sentence scene graph generator, a scene graph encoder, and a sentence decoder. Specifically, we first train the scene graph encoder and the sentence decoder on the text modality. To align the scene graphs between images and sentences, we propose an unsupervised feature alignment method that maps the scene graph features from the image modality to the sentence modality. Experimental results show that our proposed model generates promising results without using any image-caption training pairs, outperforming existing methods by a wide margin.
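
The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name, the field names, the triple-based scene graph representation, and the toy stand-in components are all assumptions introduced here, as is the choice to reuse one encoder for both image and sentence scene graphs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed representation: a scene graph is a list of
# (subject, relation, object) triples; features are a flat float vector.
Triple = Tuple[str, str, str]
SceneGraph = List[Triple]
Features = List[float]


@dataclass
class UnpairedCaptioner:
    """Hypothetical container for the four components plus a feature mapper."""
    image_graph_generator: Callable[[object], SceneGraph]
    sentence_graph_generator: Callable[[str], SceneGraph]
    graph_encoder: Callable[[SceneGraph], Features]
    feature_mapper: Callable[[Features], Features]  # image -> sentence modality
    sentence_decoder: Callable[[Features], str]

    def reconstruct_sentence(self, sentence: str) -> str:
        """Text-only path: sentence -> scene graph -> features -> sentence.

        This is the path on which the encoder and decoder would be trained,
        since it needs no images.
        """
        graph = self.sentence_graph_generator(sentence)
        return self.sentence_decoder(self.graph_encoder(graph))

    def caption_image(self, image: object) -> str:
        """Image path: the mapper moves image-graph features into the
        sentence feature space before decoding."""
        graph = self.image_graph_generator(image)
        image_features = self.graph_encoder(graph)
        aligned_features = self.feature_mapper(image_features)
        return self.sentence_decoder(aligned_features)


def build_toy_captioner() -> UnpairedCaptioner:
    """Wire the pipeline with trivial stand-ins so the data flow can be run."""
    return UnpairedCaptioner(
        image_graph_generator=lambda image: [("dog", "on", "grass")],
        sentence_graph_generator=lambda sentence: [("dog", "on", "grass")],
        graph_encoder=lambda graph: [float(len(graph))],
        feature_mapper=lambda features: list(features),  # identity placeholder
        sentence_decoder=lambda features: f"decoded from {len(features)} feature(s)",
    )


if __name__ == "__main__":
    captioner = build_toy_captioner()
    print(captioner.reconstruct_sentence("a dog on grass"))
    print(captioner.caption_image(object()))
```

The point of the sketch is only the separation of the two paths: no step ever consumes an image and its caption together, and the mapper is the single place where the two modalities meet.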

PUBLICATION RECORD

Publication year: 2019
Venue: IEEE International Conference on Computer Vision
Publication date: 2019-03-26
Fields of study: Computer Science
Identifiers: DOI 10.1109/ICCV.2019.01042; arXiv 1903.10658
Source metadata: Semantic Scholar

REFERENCES

Generative Adversarial Networks (2021, cited by this paper)
Material for “Auto-Encoding Scene Graphs for Image Captioning” (2019, influential reference)
Scene Graph Generation With External Knowledge and Image Reconstruction (2019, cited by this paper)
Boundary-Aware Feature Propagation for Scene Segmentation (2019, cited by this paper)
Semantic Correlation Promoted Shape-Variant Context for Segmentation (2019, cited by this paper)
Learning to Collocate Neural Modules for Image Captioning (2019, cited by this paper)
Semantically Guided Visual Question Answering (2018, cited by this paper)
Unpaired Image Captioning by Language Pivoting (2018, influential reference)
Unsupervised Image Captioning (2018, influential reference)
Engaging Image Captioning via Personality (2018, cited by this paper)
Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation (2018, cited by this paper)
Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features (2018, cited by this paper)
Unsupervised Machine Translation Using Monolingual Corpora Only (2017, cited by this paper)
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models (2017, cited by this paper)
Unsupervised Neural Machine Translation (2017, cited by this paper)
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering (2017, cited by this paper)
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning (2017, influential reference)
Neural Motifs: Scene Graph Parsing with Global Context (2017, cited by this paper)
Multi-View Clustering via Deep Matrix Factorization (2017, cited by this paper)
Improved Training of Wasserstein GANs (2017, cited by this paper)
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations (2016, cited by this paper)
Multimodal Pivots for Image Caption Translation (2016, cited by this paper)
Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge (2016, cited by this paper)
Image Captioning with Semantic Attention (2016, cited by this paper)
SPICE: Semantic Propositional Image Caption Evaluation (2016, cited by this paper)
An Empirical Study of Language CNN for Image Captioning (2016, cited by this paper)
Graph-Structured Representations for Visual Question Answering (2016, cited by this paper)
Self-Critical Sequence Training for Image Captioning (2016, cited by this paper)
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015, cited by this paper)
Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval (2015, cited by this paper)
Empirical Evaluation of Rectified Activations in Convolutional Network (2015, influential reference)
Recent advances in convolutional neural networks (2015, influential reference)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (2015, cited by this paper)
From captions to visual concepts and back (2014, cited by this paper)
Microsoft COCO: Common Objects in Context (2014, cited by this paper)
CIDEr: Consensus-based image description evaluation (2014, cited by this paper)
Adam: A Method for Stochastic Optimization (2014, cited by this paper)
Rectified Linear Units Improve Restricted Boltzmann Machines (2010, influential reference)
Visualizing Data using t-SNE (2008, cited by this paper)
METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments (2005, cited by this paper)
Accurate Unlexicalized Parsing (2003, cited by this paper)
Bleu: a Method for Automatic Evaluation of Machine Translation (2002, cited by this paper)
In the English-speaking world (1999, cited by this paper)

CITED BY

VIZOR: Viewpoint-Invariant Zero-Shot Scene Graph Generation for 3D Scene Reasoning (2026, cites this paper)
Multi-scale scene graph generation for remote sensing imagery (2026, cites this paper)
OpenSGen: Fine-Grained Relation-Aware Prompt for Open-Vocabulary Scene Graph Generation (2025, cites this paper)
Hierarchical Scene Graph Generation and Vectorization of Aerial Images (2025, cites this paper)
Universal Scene Graph Generation via Semantic Feature Alignment (2025, cites this paper)
From Data to Modeling: Fully Open-vocabulary Scene Graph Generation (2025, cites this paper)
Human-Inspired Scene Understanding: A Grounded Cognition Method for Unbiased Scene Graph Generation (2025, cites this paper)
Joint attention GAN for medical report generation with clinical style preservation (2025, cites this paper)
A Reverse Causal Framework to Mitigate Spurious Correlations for Debiasing Scene Graph Generation (2025, cites this paper)
Data Transformation Strategies to Remove Heterogeneity (2025, cites this paper)
Synthesize then align: Modality alignment augmentation for zero-shot image captioning with synthetic data (2025, cites this paper)
Hi-MetaCap: Configuring Object Relational Transformer in Meta-Learning Environment for Image Captioning in Hindi (2025, cites this paper)
Front-door causal attention for unbiased panoptic scene graph generation (2025, cites this paper)
AGIC: Attention-Guided Image Captioning to Improve Caption Relevance (2025, cites this paper)
Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data (2025, cites this paper)
UTStyleCap4K: Generating Image Captions with Sentimental Styles (2025, cites this paper)
Dual-Aspect Noise-Based Regularization for Multi-Modal Relation Extraction in Media Posts (2025, cites this paper)
A Holistic Review of Image-to-Text Conversion: Techniques, Evaluation Metrics, Multilingual Captioning, Storytelling and Integration (2025, cites this paper)
From coarse to fine: a two-stage common semantic space construction for unpaired cross modal retrieval (2025, cites this paper)
A Causal Adjustment Module for Debiasing Scene Graph Generation (2025, cites this paper)
Learning with semantic ambiguity for unbiased scene graph generation (2025, cites this paper)
Unbinding tensor product representations for image captioning with semantic alignment and complementation (2024, cites this paper)
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching (2024, cites this paper)
MedCycle: Unpaired Medical Report Generation via Cycle-Consistency (2024, cites this paper)
Adaptive Feature Learning for Unbiased Scene Graph Generation (2024, cites this paper)
Deep Learning Approaches for Image Captioning: Opportunities, Challenges and Future Potential (2024, cites this paper)
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends, and Metrics Analysis (2024, cites this paper)
G-DPPD: Gated Data-dependent Prior Probability Distribution for Unsupervised Image Captioning (2024, cites this paper)
CVLP-NaVD: Contrastive Visual-language Pre-training Models for Non-annotated Visual Description (2024, influential citation)
Hierarchical Prompt Learning for Scene Graph Generation (2024, cites this paper)
Semi-supervised Chinese poem-to-painting generation via cycle-consistent adversarial networks (2024, cites this paper)
A Survey on Automatic Image Captioning Approaches: Contemporary Trends and Future Perspectives (2024, cites this paper)
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning (2024, cites this paper)
Dynamic Scene Graph Generation with Unified Temporal Modeling (2024, cites this paper)
Causal Intervention for Panoptic Scene Graph Generation (2024, cites this paper)
Addressing Predicate Overlap in Scene Graph Generation with Semantics-prototype Learning (2024, cites this paper)
Pixels to Prose: Understanding the art of Image Captioning (2024, cites this paper)
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy and Novel Ensemble Method (2024, cites this paper)
Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction (2024, cites this paper)
Unpaired Image-Text Matching via Multimodal Aligned Conceptual Knowledge (2024, cites this paper)
Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation (2024, cites this paper)
MedRAT: Unpaired Medical Report Generation via Auxiliary Tasks (2024, cites this paper)
Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation (2024, cites this paper)
A Parallel Transformer Framework for Video Moment Retrieval (2024, cites this paper)
Pseudo Content Hallucination for Unpaired Image Captioning (2024, influential citation)
Multi-Label Action Anticipation for Real-World Videos With Scene Understanding (2024, cites this paper)
Adversarial Attacks on Scene Graph Generation (2024, cites this paper)
Towards Lifelong Scene Graph Generation with Knowledge-ware In-context Prompt Learning (2024, cites this paper)
A New Training Data Organization Form and Training Mode for Unbiased Scene Graph Generation (2024, cites this paper)
Towards Bridged Vision and Language: Learning Cross-Modal Knowledge Representation for Relation Extraction (2024, cites this paper)
Deep image captioning: A review of methods, trends and future challenges (2023, cites this paper)
Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling (2023, cites this paper)
Alignment efficient image-sentence retrieval considering transferable cross-modal representation learning (2023, cites this paper)
Visual-linguistic-stylistic Triple Reward for Cross-lingual Image Captioning (2023, cites this paper)
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention (2023, cites this paper)
Iterative Learning with Extra and Inner Knowledge for Long-tail Dynamic Scene Graph Generation (2023, cites this paper)
Predicate Classification Using Optimal Transport Loss in Scene Graph Generation (2023, cites this paper)
CgT-GAN: CLIP-guided Text GAN for Image Captioning (2023, influential citation)
Self-Supervised Multimodal Learning: A Survey (2023, influential citation)
Boosting Scene Graph Generation with Contextual Information (2023, cites this paper)
Environment-Invariant Curriculum Relation Learning for Fine-Grained Scene Graph Generation (2023, cites this paper)
Location-Free Scene Graph Generation (2023, cites this paper)
Improving Scene Graph Generation with Superpixel-Based Interaction Learning (2023, cites this paper)
Augmented Spatial Context Fusion Network for Scene Graph Generation (2023, cites this paper)
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation (2023, cites this paper)
Image Alone Are Not Enough: A General Semantic-Augmented Transformer-Based Framework for Image Captioning (2023, cites this paper)
A Novel End-to-End Transformer for Scene Graph Generation (2023, cites this paper)
Text-based Person Search without Parallel Image-Text Data (2023, cites this paper)
Importance First: Generating Scene Graph of Human Interest (2023, cites this paper)
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning (2023, cites this paper)
Unbiased Scene Graph Generation via Two-Stage Causal Modeling (2023, cites this paper)
LANDMARK: language-guided representation enhancement framework for scene graph generation (2023, cites this paper)
VirtualHome Action Genome: A Simulated Spatio-Temporal Scene Graph Dataset with Consistent Relationship Labels (2023, cites this paper)
Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data (2023, influential citation)
Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion (2023, cites this paper)
Image to Text: Comprehensive Review on Deep Learning Based Unsupervised Image Captioning (2023, cites this paper)
Incremental 3D Semantic Scene Graph Prediction from RGB Sequences (2023, cites this paper)
Zero-shot Scene Graph Generation via Triplet Calibration and Reduction (2023, cites this paper)
Context-aware Mixture-of-Experts for Unbiased Scene Graph Generation (2022, cites this paper)
Multimodal research in vision and language: A review of current and emerging trends (2022, cites this paper)
Atom correlation based graph propagation for scene graph generation (2022, cites this paper)
Semantically Similarity-Wise Dual-Branch Network for Scene Graph Generation (2022, cites this paper)
Cognitive Explainers of Graph Neural Networks Based on Medical Concepts (2022, cites this paper)
RelTR: Relation Transformer for Scene Graph Generation (2022, cites this paper)
Deep Learning Approaches on Image Captioning: A Review (2022, influential citation)
Unpaired Image Captioning by Image-Level Weakly-Supervised Visual Concept Recognition (2022, influential citation)
Spatial Commonsense Graph for Object Localisation in Partial Scenes (2022, cites this paper)
Biasing Like Human: A Cognitive Bias Framework for Scene Graph Generation (2022, cites this paper)
Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation (2022, cites this paper)
Fine-Grained Scene Graph Generation with Data Transfer (2022, cites this paper)
Learning cross-modality features for image caption generation (2022, cites this paper)
Adaptive Fine-Grained Predicates Learning for Scene Graph Generation (2022, cites this paper)
RU-Net: Regularized Unrolling Network for Scene Graph Generation (2022, cites this paper)
Prompt-Based Learning for Unpaired Image Captioning (2022, cites this paper)
Memorial GAN With Joint Semantic Optimization for Unpaired Image Captioning (2022, cites this paper)
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix (2022, cites this paper)
Dynamic Scene Graph Generation via Anticipatory Pre-training (2022, cites this paper)
PPDL: Predicate Probability Distribution based Loss for Unbiased Scene Graph Generation (2022, cites this paper)
The Topology and Language of Relationships in the Visual Genome Dataset (2022, cites this paper)
Iterative Scene Graph Generation (2022, cites this paper)