Generative Adversarial Text to Image Synthesis

Scott E. Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, B. Schiele, Honglak Lee

Published 2016 in International Conference on Machine Learning

ABSTRACT

Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal. However, in recent years generic and powerful recurrent neural network architectures have been developed to learn discriminative text feature representations. Meanwhile, deep convolutional generative adversarial networks (GANs) have begun to generate highly compelling images of specific categories, such as faces, album covers, and room interiors. In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image modeling, translating visual concepts from characters to pixels. We demonstrate the capability of our model to generate plausible images of birds and flowers from detailed text descriptions.
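The abstract mentions a "GAN formulation" without stating it. For orientation, the block below writes out the standard GAN minimax objective, followed by a generic text-conditioned variant. The symbols $\varphi(t)$ (an embedding of the text description $t$), $p_{\text{data}}$, and $p_z$ are notation assumed here for illustration; the abstract does not define them, and the conditioned form is a sketch of how such an objective is commonly written, not necessarily the exact loss used in the paper.

```latex
% Standard (unconditional) GAN objective
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]

% Generic text-conditioned variant (assumed notation):
% both networks also receive the text embedding \varphi(t)
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{(x, t) \sim p_{\text{data}}}\bigl[\log D(x, \varphi(t))\bigr]
  + \mathbb{E}_{z \sim p_z,\; t \sim p_{\text{data}}}
    \bigl[\log\bigl(1 - D(G(z, \varphi(t)), \varphi(t))\bigr)\bigr]
```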

PUBLICATION RECORD

Publication year
2016
Venue
International Conference on Machine Learning
Publication date
2016-05-17
Fields of study
Computer Science
Identifiers
arXiv 1605.05396
Source metadata
Semantic Scholar

CLAIMS

The model generates plausible images of birds and flowers from detailed text descriptions.
Confidence 0.96

A deep architecture and GAN formulation are introduced to bridge recurrent neural network architectures, discriminative text feature representations, and deep convolutional generative adversarial networks for text-to-image synthesis.
Confidence 0.97

CONCEPTS

birds and flowers
categories

Two natural image categories used as the demonstration targets.

deep architecture and GAN formulation
methods

The combined model design that links text encodings to adversarial image generation.

Aliases: novel deep architecture and GAN formulation

deep convolutional generative adversarial networks
methods

Convolutional adversarial image generators used as the visual synthesis branch.

Aliases: GANs

detailed text descriptions
inputs

Text inputs with enough detail to specify the desired visual content.

Aliases: text descriptions

discriminative text feature representations
representations

Text embeddings that capture descriptive content for conditioning image generation.

Aliases: text feature representations

recurrent neural network architectures
methods

Sequence models used to learn discriminative text representations from descriptions.

text-to-image synthesis
tasks

The task of generating images from text descriptions.

Aliases: text to image synthesis
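The concept entries above describe text embeddings that condition an image generator. A minimal sketch of that idea follows, under loud assumptions: `embed_text` is a hypothetical bag-of-characters stand-in for a learned recurrent text encoder, the weights are random rather than trained, and the dimensions are arbitrary. It only illustrates joining a noise vector with a text embedding before mapping to pixels; it is not the paper's architecture.

```python
import math
import random

EMBED_DIM = 8      # hypothetical size of the text embedding
NOISE_DIM = 4      # hypothetical size of the noise vector
IMAGE_PIXELS = 16  # a flattened 4x4 "image", purely for illustration


def embed_text(description, dim=EMBED_DIM):
    """Toy stand-in for a learned text encoder: character counts, L2-normalised."""
    vec = [0.0] * dim
    for ch in description.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for trained generator parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def generate(description, weights, rng):
    """Concatenate noise with the text embedding, then map to pixel values via tanh."""
    z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    conditioned = z + embed_text(description)  # list concatenation
    return [math.tanh(sum(w * x for w, x in zip(row, conditioned))) for row in weights]


rng = random.Random(0)
weights = make_linear(NOISE_DIM + EMBED_DIM, IMAGE_PIXELS, rng)
image = generate("a small yellow bird with black wings", weights, rng)
print(len(image))  # 16
```

Because the noise vector is resampled on every call, the same description yields different outputs, which is the mechanism by which a conditional generator can produce varied images for one caption.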

REFERENCES

Generative Adversarial Nets
2018, cited by this paper
Learning Deep Representations of Fine-Grained Visual Descriptions
2016, influential reference
Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
2015, influential reference
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
2015, cited by this paper
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
2015, cited by this paper
Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books
2015, cited by this paper
Deep Visual Analogy-Making
2015, cited by this paper
DRAW: A Recurrent Neural Network For Image Generation
2015, influential reference
Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis
2015, cited by this paper
Generating Images from Captions with Attention
2015, cited by this paper
Exploring Models and Data for Image Question Answering
2015, influential reference
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
2015, cited by this paper
Explicit Knowledge-based Reasoning for Visual Question Answering
2015, cited by this paper
Attribute2Image: Conditional Image Generation from Visual Attributes
2015, cited by this paper
Conditional generative adversarial nets for convolutional face generation
2015, cited by this paper
Going deeper with convolutions
2014, influential reference
Learning to Disentangle Factors of Variation with Manifold Interaction
2014, cited by this paper
Show and tell: A neural image caption generator
2014, cited by this paper
Transductive Multi-view Embedding for Zero-Shot Recognition and Annotation
2014, cited by this paper
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
2014, cited by this paper
Long-term recurrent convolutional networks for visual recognition and description
2014, cited by this paper
Adam: A Method for Stochastic Optimization
2014, cited by this paper
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
2014, cited by this paper
Deep visual-semantic alignments for generating image descriptions
2014, cited by this paper
Improved Multimodal Deep Learning with Variation of Information
2014, cited by this paper
Evaluation of output embeddings for fine-grained image classification
2014, influential reference
Conditional Generative Adversarial Nets
2014, cited by this paper
Microsoft COCO: Common Objects in Context
2014, influential reference
Learning to generate chairs with convolutional neural networks
2014, cited by this paper
Rectifier Nonlinearities Improve Neural Network Acoustic Models
2013, cited by this paper
Better Mixing via Deep Representations
2012, cited by this paper
Multimodal learning with deep Boltzmann machines
2012, cited by this paper
Multimodal Deep Learning
2011, cited by this paper
The Caltech-UCSD Birds-200-2011 Dataset
2011, cited by this paper
Relative attributes
2011, influential reference
Caltech-UCSD Birds 200
2010, cited by this paper
Attribute and simile classifiers for face verification
2009, cited by this paper
Describing objects by their attributes
2009, cited by this paper
The MIR flickr retrieval evaluation
2008, cited by this paper
Long Short-Term Memory
1997, cited by this paper
Attribute-Based Classification for Zero-Shot Visual Object Categorization
year unknown, cited by this paper

CITED BY

MATdiff: Mask-aware transformer with diffusion model for large-mask image inpainting
2026, cites this paper
Generalization bounds for a generator-regularized InfoGAN-inspired adversarial objective
2026, cites this paper
An analytical review of GANs: technical evolution, architectures, applications, datasets, and challenges
2026, cites this paper
Text-to-image generation with enhanced GANs: Bridging semantic gaps using RNN and CNN
2026, cites this paper
Controllable image-Guided generation via dynamic gaussian spectral modulation
2026, cites this paper
HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images
2026, cites this paper
Generate Corresponding Image from Text Description Using Modified GAN-CLS Algorithm
2026, cites this paper
Deep Generative Models for Node Embedding and Neighborhood Prediction in Dynamic Graphs of Recommendation Systems
2026, cites this paper
Voice2Visage: Deciphering Faces From Voices
2026, cites this paper
Deepfake detection using a distinctive eye signature and the entropy heat map of the image texture
2026, cites this paper
Dual-encoder semantics and hierarchical identity refinement for personalized image generation
2026, cites this paper
DeepInv: A Novel Self-supervised Learning Approach for Fast and Accurate Diffusion Inversion
2026, cites this paper
Key-Value Mapping-Based Text-to-Image Diffusion Model Backdoor Attacks
2026, cites this paper
Object-level semantic alignment for enhancing fidelity in text-to-image generation with diffusion models
2026, cites this paper
PosterVerse: A Full-Workflow Framework for Commercial-Grade Poster Generation with HTML-Based Scalable Typography
2026, cites this paper
Leveraging GANs and Vision Transformers for Text-to-Image Synthesis
2026, cites this paper
SesaHand: Enhancing 3D Hand Reconstruction via Controllable Generation with Semantic and Structural Alignment
2026, cites this paper
TFIGF: Fire data augmentation model based on text-to-image synthesis
2026, cites this paper
Neural-Enhanced Modulation for Spatial Selective Transmission on Low-End IoT Devices
2026, cites this paper
A review of instruction-guided image editing
2026, cites this paper
SAST: Semantic-Aware stylized Text-to-Image generation
2026, cites this paper
Forget Less by Learning Together through Concept Consolidation
2026, cites this paper
Image Verse AI- Crafting Image Worlds from Text
2025, cites this paper
A deep learning approach for music visualization: From audio features to descriptive video generation
2025, cites this paper
Draw What You Hear: High-Fidelity Image Generation and Manipulation via SoundAdapter
2025, cites this paper
User-Driven Customization in 3D Generation: Improving Stable Fast 3D with Inpainting Methods
2025, cites this paper
ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image
2025, cites this paper
Efficiency without Compromise: CLIP-aided Text-to-Image GANs with Increased Diversity
2025, cites this paper
Private Training & Data Generation by Clustering Embeddings
2025, cites this paper
TexStFusion: a controllable diffusion model using textural, structural, and textual feature fusion
2025, cites this paper
CROC: Evaluating and Training T2I Metrics with Pseudo- and Human-Labeled Contrastive Robustness Checks
2025, cites this paper
RogueGPT: transforming ChatGPT-4 into a rogue AI with dis-ethical tuning
2025, cites this paper
Text2Image: Generating Visuals from Words using Deep Learning
2025, cites this paper
Stacked deep fusion GAN for enhanced text-to-image generation
2025, cites this paper
Quantum generative adversarial network for image generation
2025, cites this paper
Dynamic local affine transformation for enhanced text-to-image generation with GANs
2025, cites this paper
Contrastive learning based remote sensing text-to-image generation for few-shot remote sensing image captioning
2025, cites this paper
Integrating Speech-to-Text for Image Generation Using Generative Adversarial Networks
2025, cites this paper
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
2025, cites this paper
Efficient Text-Guided 3D-Aware Generation With Score Distillation on 3D Distribution
2025, cites this paper
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
2025, cites this paper
ShapeShift: Towards Text-to-Shape Arrangement Synthesis with Content-Aware Geometric Constraints
2025, cites this paper
RaT2IGen: Relation-aware Text-to-image Generation via Learnable Prompt
2025, cites this paper
A novel flood forecasting model based on TimeGAN for data-sparse basins
2025, cites this paper
Tumor Synthesis Conditioned on Radiomics
2025, cites this paper
Examining the role of compression in influencing AI-generated image authenticity
2025, cites this paper
GIFDL: Generated Image Fluctuation Distortion Learning for Enhancing Steganographic Security
2025, cites this paper
Probability Density Function Distance-Based Augmented CycleGAN for Image Domain Translation with Asymmetric Sample Size
2025, cites this paper
CountDiffusion: Text-to-Image Synthesis with Training-Free Counting-Guidance Diffusion
2025, cites this paper
Facial Expression Generation from Text with FaceCLIP
2025, cites this paper
Cross-Age Face Verification Using Generative Adversarial Networks (GAN) with Landmark Feature
2025, cites this paper
Exploring denoising diffusion models for compressible fluid field prediction
2025, cites this paper
Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation
2025, cites this paper
Comparison of anatomy image generation capability in AI image generation models
2025, cites this paper
Optimizing low-resource language encoders for text-to-image generation: a case study on Thai
2025, cites this paper
HF-VTON: High-Fidelity Virtual Try-On via Consistent Geometric and Semantic Alignment
2025, cites this paper
EO2IR ControlNet: synthetic infrared image generation for automatic target recognition: experimental results in MIST
2025, cites this paper
Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences
2025, cites this paper
Textual Prompt Interpretation for Image Synthesis Using Generative AI Techniques
2025, cites this paper
When Model Knowledge meets Diffusion Model: Diffusion-assisted Data-free Image Synthesis with Alignment of Domain and Class
2025, cites this paper
Text-conditioned image generation using diffusion models
2025, cites this paper
SFCM-AEG: Source-Free Cross-Modal Adversarial Example Generation
2025, cites this paper
Explainable AI and deep learning models for recommender systems: State of the art and challenges
2025, cites this paper
DiffDesign: A diffusion model using garment Knowledge-Enhanced for Fashion Design Synthesis
2025, cites this paper
Attention-Based Synthetic Data Generation for Calibration-Enhanced Survival Analysis: A Case Study for Chronic Kidney Disease Using Electronic Health Records
2025, cites this paper
Semantic-aware Mapping for Text-to-Image Synthesis
2025, cites this paper
CoSimGen: Controllable Diffusion Model for Simultaneous Image and Mask Generation
2025, cites this paper
TrapNet: Model Inversion Defense via Trapdoor
2025, cites this paper
EigenActor: Variant Body-Object Interaction Generation Evolved from Invariant Action Basis Reasoning
2025, cites this paper
KKA: Improving Vision Anomaly Detection through Anomaly-related Knowledge from Large Language Models
2025, cites this paper
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
2025, cites this paper
The Human Labour of Data Work: Capturing Cultural Diversity through World Wide Dishes
2025, cites this paper
End-to-end Training for Text-to-Image Synthesis using Dual-Text Embeddings
2025, cites this paper
AI-assisted Real-Time Spatial Delphi: integrating artificial intelligence models for advancing future scenarios analysis
2025, cites this paper
Quantitative Analysis of Blood Cell Components and Detection of Malarial Parasite (P.Vivax) using Faster R-CNN
2025, cites this paper
Topic Videolization: A Rumor Detection Method Inspired by Video Forgery Detection Technology
2025, cites this paper
Exploring text-to-image generation models: Applications and cloud resource utilization
2025, cites this paper
Semantic structure preservation for accurate multi-modal glioma diagnosis
2025, cites this paper
Efficient Input-level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation
2025, cites this paper
LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending
2025, cites this paper
SARA: Structural and Adversarial Representation Alignment for Training-efficient Diffusion Models
2025, cites this paper
MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment
2025, cites this paper
STAY Diffusion: Styled Layout Diffusion Model for Diverse Layout-to-Image Generation
2025, cites this paper
PBR-Inspired Controllable Diffusion for Image Generation
2025, cites this paper
Brain-Supervised Conditional Generative Modeling
2025, cites this paper
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation
2025, cites this paper
ASMR: Augmenting Life Scenario using Large Generative Models for Robotic Action Reflection
2025, cites this paper
Deep Reinforcement Learning-based Automatic Augmentation for Gastrointestinal Disease Classification
2025, cites this paper
Research on adversarial identification methods for AI-generated image software Craiyon V3
2025, cites this paper
FreeInv: Free Lunch for Improving DDIM Inversion
2025, cites this paper
ICE: Interactive 3D Game Character Facial Editing via Dialogue
2025, cites this paper
Diffusion Models and Generative Artificial Intelligence: Frameworks, Applications and Challenges
2025, cites this paper
PartStickers: Generating Parts of Objects for Rapid Prototyping
2025, cites this paper
MDVL-Edit: Mask-assisted highly disentangled text-driven face image editing based on vision-language alignment
2025, cites this paper
Leveraging human-centred AI in interactive English education for generating learning materials
2025, cites this paper
Conditional Feature Generative Adversarial Network for Fault Diagnosis of Axial Piston Pump
2025, cites this paper
PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling
2025, cites this paper
CTD-inpainting: Towards the Coherence of Text-driven Inpainting with Blended Diffusion
2025, cites this paper
CookGALIP: Recipe Controllable Generative Adversarial CLIPs With Sequential Ingredient Prompts for Food Image Generation
2025, cites this paper
Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models?
2025, cites this paper