A Survey on Learning Objects' Relationship for Image Captioning

Published 2023 in Computational Intelligence and Neuroscience

ABSTRACT

Image captioning is a challenging modality-transformation task spanning computer vision and natural language processing: it aims to understand the content of an image and describe it in natural language. Recently, relationship information between the objects in an image has been found to be important for generating more vivid and readable sentences. Much research has addressed mining and learning these relationships and incorporating them into captioning models. This paper summarizes the methods of relational representation and relational encoding used in image captioning. We also discuss the advantages and disadvantages of these methods and list commonly used datasets for the relational captioning task. Finally, we highlight the current problems and challenges in this task.

PUBLICATION RECORD

Publication year: 2023
Venue: Computational Intelligence and Neuroscience
Publication date: 2023-05-29
Fields of study: Medicine, Computer Science
Identifiers: DOI 10.1155/2023/8600853; PMID 37284051; PMCID 10241575
Source metadata: Semantic Scholar, PubMed

CITED BY

Image-to-Text Description Approach based on Deep Learning Models (2024)
TransEffiVisNet – an image captioning architecture for auditory assistance for the visually impaired (2024)
Spatial guided image captioning: Guiding attention with object's spatial interaction (2024)