Data-driven image captioning via salient region discovery
Mert Kilickaya, Burak Kerim Akkus, Ruken Cakici, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis
Published 2017 in IET Computer Vision
ABSTRACT
In the past few years, automatically generating descriptions for images has attracted considerable attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have proven highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image representation into a deep-features-based retrieval framework to select the relevant images. Moreover, they present a novel phrase selection paradigm and a sentence generation model that depends on a joint analysis of salient regions in the input and retrieved images within a clustering framework. The authors demonstrate the effectiveness of their proposed approach on the Flickr8K and Flickr30K benchmark datasets and show that their model gives highly competitive results against state-of-the-art models.
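The retrieval step described in the abstract — comparing the query image against training images to collect candidate captions — can be sketched as a nearest-neighbour search over image feature vectors. The toy vectors, captions, and function names below are illustrative assumptions, not the authors' implementation (which uses deep CNN features and an object-based semantic representation):

```python
# Minimal sketch of data-driven caption retrieval, assuming precomputed
# feature vectors per image. Toy 3-D vectors stand in for deep features.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_captions(query_feat, training_set, k=2):
    """Rank training images by similarity to the query image and
    return the captions of the top-k most similar images."""
    ranked = sorted(training_set,
                    key=lambda item: cosine(query_feat, item["feat"]),
                    reverse=True)
    return [item["caption"] for item in ranked[:k]]

# Hypothetical training set: (feature vector, caption) pairs.
training = [
    {"feat": [1.0, 0.0, 0.0], "caption": "a dog runs on grass"},
    {"feat": [0.9, 0.1, 0.0], "caption": "a brown dog plays outside"},
    {"feat": [0.0, 0.0, 1.0], "caption": "a boat sails at sunset"},
]

# A query resembling the dog images retrieves the two dog captions.
print(retrieve_captions([1.0, 0.05, 0.0], training, k=2))
```

In the paper's full pipeline, the retrieved captions would then feed the phrase selection and clustering-based sentence generation stages rather than being returned directly.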
PUBLICATION RECORD
- Publication date: 2017-03-03
- Venue: IET Computer Vision
- Fields of study: Computer Science
- Source metadata: Semantic Scholar
REFERENCES
- 49 references listed on record.
CITED BY
- 12 citing papers listed on record.