Generating captions without looking beyond objects
Hendrik Heuer, Christof Monz, A. Smeulders
Published 2016 in arXiv.org
ABSTRACT
This paper explores new evaluation perspectives for image captioning and introduces a noun translation task that achieves comparable image caption generation performance by translating from a set of nouns to captions. This suggests that in image captioning, all word categories other than nouns can be evoked by a powerful language model without sacrificing performance on n-gram precision. The paper also investigates lower and upper bounds on how much individual word categories in the captions contribute to the final BLEU score, finding substantial room for improvement for nouns, verbs, and prepositions.
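The idea of measuring how much one word category contributes to an n-gram score can be illustrated with a minimal sketch: mask out all words of a category in the candidate caption and recompute clipped unigram precision (the 1-gram component of BLEU). This is not the paper's implementation; the hand-labeled POS dictionary and the `<unk>` placeholder are assumptions for the toy example, and a real analysis would use a POS tagger and full BLEU.

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Clipped unigram precision: the 1-gram component of BLEU."""
    ref_counts = Counter(reference)
    matched = sum(min(count, ref_counts[word])
                  for word, count in Counter(candidate).items())
    return matched / len(candidate)

def mask_category(tokens, pos_tags, category):
    """Replace every token of the given word category with a placeholder."""
    return [w if pos_tags.get(w) != category else "<unk>" for w in tokens]

# Toy example with hand-labeled POS tags (an assumption for illustration).
pos_tags = {"dog": "NOUN", "ball": "NOUN", "plays": "VERB",
            "a": "DET", "with": "PREP"}
reference = "a dog plays with a ball".split()
candidate = "a dog plays ball".split()

print(unigram_precision(candidate, reference))                       # 1.0
print(unigram_precision(mask_category(candidate, pos_tags, "NOUN"),
                        reference))                                  # 0.5
```

The drop from 1.0 to 0.5 when nouns are masked is the kind of category-level score contribution the abstract refers to; comparing a masked lower bound against an oracle upper bound per category localizes where captioning systems lose precision.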
PUBLICATION RECORD
- Publication year: 2016
- Venue: arXiv.org
- Publication date: 2016-10-01
- Fields of study: Computer Science
- Source: Semantic Scholar