Spatial Broadcast Decoder: A Simple Architecture for Learning Disentangled Representations in VAEs

Nicholas Watters, L. Matthey, Christopher P. Burgess, Alexander Lerchner

Published 2019 in arXiv.org

ABSTRACT

We present a simple neural rendering architecture that helps variational autoencoders (VAEs) learn disentangled representations. Instead of the deconvolutional network typically used in the decoder of VAEs, we tile (broadcast) the latent vector across space, concatenate fixed X- and Y-"coordinate" channels, and apply a fully convolutional network with 1x1 stride. This provides an architectural prior for dissociating positional from non-positional features in the latent distribution of VAEs, yet without providing any explicit supervision to this effect. We show that this architecture, which we term the Spatial Broadcast decoder, improves disentangling, reconstruction accuracy, and generalization to held-out regions in data space. It provides a particularly dramatic benefit when applied to datasets with small objects. We also emphasize a method for visualizing learned latent spaces that helped us diagnose our models and may prove useful for others aiming to assess data representations. Finally, we show that the Spatial Broadcast decoder is complementary to state-of-the-art (SOTA) disentangling techniques and, when incorporated, improves their performance.
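
The decoding steps named in the abstract (tile the latent vector across space, concatenate fixed X and Y coordinate channels, apply a stride-1 convolutional network) can be sketched roughly as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the coordinate range [-1, 1], the 3x3 kernels, the layer widths, the ReLU activations, the random weights, and all function names (`spatial_broadcast`, `conv2d_same`, `init_params`, `decode`) are hypothetical choices made for the example.

```python
import numpy as np


def spatial_broadcast(z, height, width):
    """Tile latent vector z of shape (k,) across space and append X/Y channels.

    Returns an array of shape (height, width, k + 2).
    """
    z = np.asarray(z, dtype=np.float32)
    tiled = np.broadcast_to(z, (height, width, z.shape[0]))
    # Fixed coordinate channels; the [-1, 1] range is an assumption.
    ys = np.linspace(-1.0, 1.0, height, dtype=np.float32)
    xs = np.linspace(-1.0, 1.0, width, dtype=np.float32)
    y_grid, x_grid = np.meshgrid(ys, xs, indexing="ij")
    return np.concatenate([tiled, x_grid[..., None], y_grid[..., None]], axis=-1)


def conv2d_same(x, kernel, bias):
    """Stride-1, zero-padded convolution (cross-correlation convention).

    x: (H, W, C_in); kernel: (kh, kw, C_in, C_out) with odd kh, kw; bias: (C_out,).
    """
    kh, kw, _, c_out = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    h, w = x.shape[:2]
    out = np.zeros((h, w, c_out), dtype=np.float32) + bias
    # Sum the contribution of each kernel offset; spatial size is unchanged.
    for i in range(kh):
        for j in range(kw):
            out += padded[i:i + h, j:j + w, :] @ kernel[i, j]
    return out


def init_params(rng, latent_dim, hidden=16, out_channels=3, ksize=3):
    """Random weights for a small three-layer stack (illustrative sizes only)."""
    sizes = [latent_dim + 2, hidden, hidden, out_channels]
    return [
        (rng.normal(0.0, 0.1, (ksize, ksize, c_in, c_out)).astype(np.float32),
         np.zeros(c_out, dtype=np.float32))
        for c_in, c_out in zip(sizes[:-1], sizes[1:])
    ]


def decode(z, params, height=64, width=64):
    """Broadcast z, then apply stride-1 convolutions with ReLU between layers."""
    x = spatial_broadcast(z, height, width)
    for kernel, bias in params[:-1]:
        x = np.maximum(conv2d_same(x, kernel, bias), 0.0)
    kernel, bias = params[-1]
    return conv2d_same(x, kernel, bias)  # per-pixel outputs, no upsampling


rng = np.random.default_rng(0)
params = init_params(rng, latent_dim=10)
image = decode(rng.normal(size=10), params, height=32, width=32)
print(image.shape)
```

Because every layer uses stride 1 and no upsampling, the only positional information available to the network comes from the two coordinate channels, while the tiled latent carries the same values at every pixel. That separation is what the abstract describes as an architectural prior.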

PUBLICATION RECORD

Publication year
2019
Venue
arXiv.org
Publication date
2019-01-21
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1901.07017
Source metadata
Semantic Scholar

CITED BY

Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
2026 · cites this paper
Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation
2025 · cites this paper
Multiple Invertible and Partial-Equivariant Function for Latent Vector Transformation to Enhance Disentanglement in VAEs
2025 · cites this paper
Slot-BERT: Self-supervised Object Discovery in Surgical Video
2025 · cites this paper
Independent Density Estimation
2025 · cites this paper
Unsupervised Structural Scene Decomposition via Foreground-Aware Slot Attention with Pseudo-Mask Guidance
2025 · cites this paper
Learning Visually Interpretable Oscillator Networks for Soft Continuum Robots from Video
2025 · cites this paper
Object-Centric World Models for Causality-Aware Reinforcement Learning
2025 · cites this paper
Disentangled Representation Learning via Modular Compositional Bias
2025 · cites this paper
Push, See, Predict: Emergent Perception Through Intrinsically Motivated Play
2025 · cites this paper
Efficient Object-Centric Representation Learning using Masked Generative Modeling
2025 · cites this paper
Farm-Level, In-Season Crop Identification for India
2025 · cites this paper
Future Slot Prediction for Unsupervised Object Discovery in Surgical Video
2025 · cites this paper
CoLa: Chinese Character Decomposition with Compositional Latent Components
2025 · cites this paper
MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning
2025 · cites this paper
An Interpretable Representation Learning Approach for Diffusion Tensor Imaging
2025 · cites this paper
Learning to Adapt to Position Bias in Vision Transformer Classifiers
2025 · cites this paper
Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning
2025 · cites this paper
Object-Centric World Model for Language-Guided Manipulation
2025 · cites this paper
Vector-Quantized Vision Foundation Models for Object-Centric Learning
2025 · cites this paper
TextOCVP: Object-Centric Video Prediction with Language Guidance
2025 · influential citation
Unsupervised Object Discovery: A Comprehensive Survey and Unified Taxonomy
2024 · cites this paper
FACTS: A Factored State-Space Framework For World Modelling
2024 · cites this paper
Disentangling genotype and environment specific latent features for improved trait prediction using a compositional autoencoder
2024 · cites this paper
Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases
2024 · cites this paper
An Attentive Approach for Building Partial Reasoning Agents from Pixels
2024 · cites this paper
Simplified priors for Object-Centric Learning
2024 · cites this paper
Transferring disentangled representations: bridging the gap between synthetic and real images
2024 · cites this paper
Multi-Modality Conditioned Variational U-Net for Field-of-View Extension in Brain Diffusion MRI
2024 · cites this paper
PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery
2024 · cites this paper
Rethinking Disentanglement under Dependent Factors of Variation
2024 · cites this paper
Learning Object-Centric Representation via Reverse Hierarchy Guidance
2024 · cites this paper
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
2024 · cites this paper
ViPro: Enabling and Controlling Video Prediction for Complex Dynamical Scenarios using Procedural Knowledge
2024 · cites this paper
Temporally Consistent Object-Centric Learning by Contrasting Slots
2024 · cites this paper
Attention Normalization Impacts Cardinality Generalization in Slot Attention
2024 · cites this paper
Learning Disentangled Representation in Object-Centric Models for Visual Dynamics Prediction via Transformers
2024 · influential citation
Slot State Space Models
2024 · cites this paper
Identifiable Object-Centric Representation Learning via Probabilistic Slot Attention
2024 · cites this paper
FMRI Data Analysis Preserving Map Variability Via Unsupervised Object-Centric Learning
2024 · cites this paper
Teeth Alignment Prediction Using U-Net
2024 · cites this paper
Masked Multi-Query Slot Attention for Unsupervised Object Discovery
2024 · cites this paper
Reasoning-Enhanced Object-Centric Learning for Videos
2024 · cites this paper
Toward Improving the Generation Quality of Autoregressive Slot VAEs
2024 · cites this paper
Slot Abstractors: Toward Scalable Abstract Visual Reasoning
2024 · cites this paper
Advantages of Modeling Photoplethysmography (PPG) Signals using Variational Autoencoders
2024 · cites this paper
Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach
2024 · cites this paper
CoFiNet: Unveiling Camouflaged Objects with Multi-Scale Finesse
2024 · cites this paper
Neural Language of Thought Models
2024 · cites this paper
Explicitly Disentangled Representations in Object-Centric Learning
2024 · cites this paper
Slot-Based Object-Centric Reinforcement Learning Algorithm
2024 · cites this paper
Deep-Learning-Based Morphological Feature Segmentation for Facial Skin Image Analysis
2023 · cites this paper
Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior
2023 · cites this paper
Systematic Visual Reasoning through Object-Centric Relational Abstraction
2023 · cites this paper
Cycle Consistency Driven Object Discovery
2023 · cites this paper
Slot-VAE: Object-Centric Scene Generation with Slot Attention
2023 · cites this paper
Triggering dark showers with conditional dual auto-encoders
2023 · cites this paper
DORSal: Diffusion for Object-centric Representations of Scenes et al.
2023 · cites this paper
Interaction-Based Disentanglement of Entities for Object-Centric World Models
2023 · influential citation
Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning
2023 · cites this paper
Linking vision and motion for self-supervised object-centric perception
2023 · cites this paper
Towards Interpretable Controllability in Object-Centric Learning
2023 · cites this paper
Object-Centric Semantic Vector Quantization
2023 · cites this paper
Learning Latent Structural Relations with Message Passing Prior
2023 · cites this paper
Deep variational Luenberger-type observer with dynamic objects channel-attention for stochastic video prediction
2023 · cites this paper
Guiding Video Prediction with Explicit Procedural Knowledge
2023 · cites this paper
Benchmarking and Analysis of Unsupervised Object Segmentation from Real-World Single Images
2023 · cites this paper
SODA: Bottleneck Diffusion Models for Representation Learning
2023 · cites this paper
Unsupervised Musical Object Discovery from Audio
2023 · cites this paper
Object-Centric Learning with Slot Mixture Module
2023 · cites this paper
Neurosymbolic Grounding for Compositional World Models
2023 · cites this paper
Leveraging Image Augmentation for Object Manipulation: Towards Interpretable Controllability in Object-Centric Learning
2023 · cites this paper
Self-supervised Object-Centric Learning for Videos
2023 · cites this paper
X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events
2023 · cites this paper
NeuPPS: Neural Piecewise Parametric Surfaces
2023 · cites this paper
Divided Attention: Unsupervised Multi-Object Discovery with Contextually Separated Slots
2023 · cites this paper
Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames
2023 · cites this paper
Learning Disentangled Discrete Representations
2023 · influential citation
Learning to reason over visual objects
2023 · cites this paper
Learning global spatial information for multi-view object-centric models
2023 · cites this paper
Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning
2023 · cites this paper
TC-VAE: Uncovering Out-of-Distribution Data Generative Factors
2023 · cites this paper
Audioslots: A Slot-Centric Generative Model For Audio Separation
2023 · cites this paper
SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models
2023 · cites this paper
Contrastive Training of Complex-Valued Autoencoders for Object Discovery
2023 · cites this paper
Sensitivity of Slot-Based Object-Centric Models to their Number of Slots
2023 · cites this paper
Improving Object-centric Learning with Query Optimization
2022 · cites this paper
Disentangling Domain and Content
2022 · cites this paper
Compositional Scene Representation Learning via Reconstruction: A Survey
2022 · cites this paper
Sparse capsule networks for informative representation learning in digital pathology
2022 · cites this paper
Unsupervised Learning of Temporal Abstractions With Slot-Based Transformers
2022 · cites this paper
Lost in Latent Space: Disentangled Models and the Challenge of Combinatorial Generalisation
2022 · influential citation
Object-Centric Compositional Imagination for Visual Abstract Reasoning
2022 · cites this paper
Slot Order Matters for Compositional Scene Understanding
2022 · cites this paper
Object Scene Representation Transformer
2022 · cites this paper
Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation
2022 · cites this paper
Intuitive physics learning in a deep-learning model inspired by developmental psychology
2022 · cites this paper
Sparse Relational Reasoning with Object-Centric Representations
2022 · cites this paper
Low Level Feature Extraction for Cilia Segmentation
2022 · cites this paper