Data-driven image captioning via salient region discovery
Mert Kilickaya, Burak Kerim Akkus, Ruken Cakici, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis
Published 2017 in IET Computer Vision
ABSTRACT
In the past few years, automatically generating descriptions for images has attracted considerable attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have proven highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image representation into a deep-features-based retrieval framework to select the relevant images. Moreover, they present a novel phrase selection paradigm and a sentence generation model that depends on a joint analysis of salient regions in the input and retrieved images within a clustering framework. The authors demonstrate the effectiveness of their proposed approach on the Flickr8K and Flickr30K benchmark datasets and show that their model gives highly competitive results against state-of-the-art models.
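The retrieval step described in the abstract — comparing the query image against training images to collect candidate captions — can be sketched as a nearest-neighbour search over image feature vectors. The toy vectors, captions, and function names below are illustrative assumptions, not the authors' implementation (which uses deep CNN features and an object-based semantic representation):

```python
# Minimal sketch of data-driven caption retrieval, assuming precomputed
# feature vectors per image. Toy 3-D vectors stand in for deep features.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_captions(query_feat, training_set, k=2):
    """Rank training images by similarity to the query image and
    return the captions of the top-k most similar images."""
    ranked = sorted(training_set,
                    key=lambda item: cosine(query_feat, item["feat"]),
                    reverse=True)
    return [item["caption"] for item in ranked[:k]]

# Hypothetical training set: (feature vector, caption) pairs.
training = [
    {"feat": [1.0, 0.0, 0.0], "caption": "a dog runs on grass"},
    {"feat": [0.9, 0.1, 0.0], "caption": "a brown dog plays outside"},
    {"feat": [0.0, 0.0, 1.0], "caption": "a boat sails at sunset"},
]

# A query resembling the dog images retrieves the two dog captions.
print(retrieve_captions([1.0, 0.05, 0.0], training, k=2))
```

In the paper's full pipeline, the retrieved captions would then feed the phrase selection and clustering-based sentence generation stages rather than being returned directly.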
PUBLICATION RECORD
- Publication date: 2017-03-03
- Venue: IET Computer Vision
- Fields of study: Computer Science
- Source metadata: Semantic Scholar
REFERENCES
- 49 references listed on record.
CITED BY
- 12 citing papers listed on record.