Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness
Valentin Barriere, Felipe del Rio, Andres Carvallo De Ferari, Carlos Aspillaga, Eugenio Herrera-Berg, Cristian Buc Calderon
Published 2023 in IEEE Games Entertainment Media Conference
ABSTRACT
Artificial neural networks typically struggle to generalize to out-of-context examples. One reason for this limitation is that training datasets capture only part of the correlational structure of the world. In this work, we propose TIDA (Targeted Image-editing Data Augmentation), a targeted data augmentation method that improves models' human-like abilities (e.g., gender recognition) by filling this correlational gap with a text-to-image generative model. More specifically, TIDA identifies specific skills in the captions describing images (e.g., the presence of a specific gender in the image), changes the caption (e.g., "woman" to "man"), and then uses a text-to-image model to edit the image so that it matches the new caption (e.g., changing only the woman into a man while keeping the rest of the scene identical). On the Flickr30K benchmark, we show that a TIDA-enhanced dataset targeting gender, color, and counting abilities yields better performance on several image captioning metrics than the original dataset. Furthermore, beyond the classical BLEU metric, we conduct a fine-grained analysis of our models' improvements over the baseline. Finally, we compare different text-to-image generative models and find that they induce different behaviors in the image captioning models' visual encoding and textual decoding.
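The augmentation loop described in the abstract can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: it assumes the Hugging Face diffusers InstructPix2Pix pipeline as the text-guided image editor (the paper compares several text-to-image models), and `GENDER_SWAPS`, `find_swap`, `tida_augment`, and the edit prompt are illustrative names and choices covering only the gender skill (color and counting would use analogous swap tables).

```python
# Hypothetical sketch of one TIDA-style augmentation step (not the authors' code).
# Assumes the Hugging Face `diffusers` InstructPix2Pix pipeline as the image editor.
import re

import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# Illustrative word swaps for the gender skill only.
GENDER_SWAPS = {"woman": "man", "man": "woman", "girl": "boy", "boy": "girl"}

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")


def find_swap(caption: str, swaps: dict):
    """Return a (source, target) word pair if the caption mentions a targeted skill word."""
    for src, dst in swaps.items():
        if re.search(rf"\b{src}\b", caption, flags=re.IGNORECASE):
            return src, dst
    return None


def tida_augment(image: Image.Image, caption: str):
    """Produce an (edited image, edited caption) pair, or None if the skill is absent."""
    hit = find_swap(caption, GENDER_SWAPS)
    if hit is None:
        return None
    src, dst = hit
    # 1) Edit the caption: swap the skill word while leaving the rest unchanged.
    new_caption = re.sub(rf"\b{src}\b", dst, caption, count=1, flags=re.IGNORECASE)
    # 2) Edit the image so it matches the new caption while keeping the context identical.
    edited = pipe(
        prompt=f"turn the {src} into a {dst}",
        image=image,
        num_inference_steps=20,
        image_guidance_scale=1.5,
    ).images[0]
    return edited, new_caption
```

The augmented (image, caption) pairs produced this way would then be added to the original training set before fine-tuning the captioning model.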
PUBLICATION RECORD
- Publication year: 2023
- Venue: IEEE Games Entertainment Media Conference
- Publication date: 2023-09-27
- Fields of study: Computer Science
- Source metadata: Semantic Scholar