Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis

Jimei Yang, Scott E. Reed, Ming-Hsuan Yang, Honglak Lee

Published 2015 in Neural Information Processing Systems

ABSTRACT

An important problem for both graphics and vision is to synthesize novel views of a 3D object from a single image. This is particularly challenging due to the partial observability inherent in projecting a 3D object onto the image space, and the ill-posedness of inferring object shape and pose. However, we can train a neural network to address the problem if we restrict our attention to specific object categories (in our case faces and chairs) for which we can gather ample training data. In this paper, we propose a novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image. The recurrent structure allows our model to capture long-term dependencies along a sequence of transformations. We demonstrate the quality of its predictions for human faces on the Multi-PIE dataset and for a dataset of 3D chair models, and also show its ability to disentangle latent factors of variation (e.g., identity and pose) without using full supervision.
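
The encode, transform, decode loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it swaps the convolutional encoder and decoder for single linear layers, and every name, size, and weight below (SIDE, ID_DIM, POSE_DIM, N_ACTIONS, the random matrices) is a hypothetical choice made only for illustration. It shows one plausible reading of the idea: a single input image is split into identity units and pose units, a recurrent step updates only the pose units for each rotation action, and the identity units are reused unchanged for every rendered frame.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
SIDE = 16                      # toy image is SIDE x SIDE
ID_DIM, POSE_DIM, N_ACTIONS = 32, 8, 3

# Random weights stand in for trained parameters (assumption: linear layers
# replace the convolutional encoder/decoder of the real model).
W_id = rng.normal(0.0, 0.1, (ID_DIM, SIDE * SIDE))
W_pose = rng.normal(0.0, 0.1, (POSE_DIM, SIDE * SIDE))
W_act = rng.normal(0.0, 0.5, (N_ACTIONS, POSE_DIM, POSE_DIM))
W_dec = rng.normal(0.0, 0.1, (SIDE * SIDE, ID_DIM + POSE_DIM))


def encode(image):
    """Map an image to separate identity and pose units."""
    x = image.reshape(-1)
    return np.tanh(W_id @ x), np.tanh(W_pose @ x)


def transform(pose, action):
    """One recurrent step: update the pose units given a rotation action index."""
    return np.tanh(W_act[action] @ pose)


def decode(identity, pose):
    """Render an image from the concatenated identity and pose units."""
    logits = W_dec @ np.concatenate([identity, pose])
    return (1.0 / (1.0 + np.exp(-logits))).reshape(SIDE, SIDE)


def render_sequence(image, actions):
    """Encode once, then apply each action in turn and decode every step."""
    identity, pose = encode(image)
    frames = []
    for action in actions:
        pose = transform(pose, action)          # only the pose units change
        frames.append(decode(identity, pose))   # identity units are reused
    return frames


def sequence_loss(frames, targets):
    """Mean squared error averaged over all steps of the sequence."""
    return float(np.mean([np.mean((f - t) ** 2) for f, t in zip(frames, targets)]))


image = rng.random((SIDE, SIDE))
frames = render_sequence(image, actions=[0, 0, 1])
print(len(frames), frames[0].shape)
```

Summing the reconstruction error over every step of the sequence, as in the hypothetical sequence_loss, is one way a model of this shape could be trained end-to-end on rotated views. The identity labels never enter the loss, which is the sense in which supervision of the latent factors is only partial.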

PUBLICATION RECORD

Publication year: 2015
Venue: Neural Information Processing Systems
Publication date: 2015-12-07
Fields of study: Computer Science
Identifiers: arXiv 1601.00706
Source metadata: Semantic Scholar

REFERENCES

Fast R-CNN (2015, cited by this paper)
Deep Convolutional Inverse Graphics Network (2015, influential reference)
High-fidelity Pose and Expression Normalization for face recognition in the wild (2015, cited by this paper)
Deep Stereo: Learning to Predict New Views from the World's Imagery (2015, cited by this paper)
Understanding Deep Features with Computer-Generated Imagery (2015, cited by this paper)
Action-Conditional Video Prediction using Deep Networks in Atari Games (2015, cited by this paper)
Modeling Deep Temporal Dependencies with Recurrent "Grammar Cells" (2014, cited by this paper)
Adam: A Method for Stochastic Optimization (2014, cited by this paper)
Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations (2014, cited by this paper)
Learning to Execute (2014, cited by this paper)
3D object manipulation in a single photograph using stock 3D models (2014, cited by this paper)
Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models (2014, cited by this paper)
Very Deep Convolutional Networks for Large-Scale Image Recognition (2014, cited by this paper)
Learning to Disentangle Factors of Variation with Manifold Interaction (2014, cited by this paper)
Optimizing Neural Networks that Generate Images (2014, cited by this paper)
Learning to generate chairs with convolutional neural networks (2014, influential reference)
"Mental Rotation" by Optimizing Transforming Distance (2014, cited by this paper)
Show and tell: A neural image caption generator (2014, cited by this paper)
Fully convolutional networks for semantic segmentation (2014, cited by this paper)
Caffe: Convolutional Architecture for Fast Feature Embedding (2014, cited by this paper)
Discovering Hidden Factors of Variation in Deep Networks (2014, cited by this paper)
Auto-Encoding Variational Bayes (2013, cited by this paper)
Playing Atari with Deep Reinforcement Learning (2013, cited by this paper)
ImageNet classification with deep convolutional neural networks (2012, cited by this paper)
3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model (2012, cited by this paper)
Transforming Auto-Encoders (2011, cited by this paper)
Curriculum learning (2009, cited by this paper)
Multi-PIE (2008, influential reference)
Separating Style and Content with Bilinear Models (2000, cited by this paper)
A Morphable Model For The Synthesis Of 3D Faces (1999, cited by this paper)
Mental Rotation of Three-Dimensional Objects (1971, cited by this paper)

CITED BY

Unsupervised Multivariate Time Series Anomaly Detection by Feature Decoupling in Federated Learning Scenarios (2025, cites this paper)
Spatial Mask-Based Adaptive Robust Training for Video Object Segmentation With Noisy Labels (2025, cites this paper)
A pure MLP-Mixer-based GAN framework for guided image translation (2025, cites this paper)
RegCGAN: Resampling with Regularized CGAN for Imbalanced Big Data Problem (2025, cites this paper)
Multi-Stage Statistical Texture-Guided GAN for Tilted Face Frontalization (2025, cites this paper)
FTASD - A Fine Tuning Approach for Stable Diffusion Models (2024, cites this paper)
Towards Controllable Time Series Generation (2024, cites this paper)
Identity-Aware Variational Autoencoder for Face Swapping (2024, cites this paper)
Level of agreement between emotions generated by Artificial Intelligence and human evaluation: a methodological proposal (2024, cites this paper)
A Virtual View Acquisition Technique for Complex Scenes of Monocular Images Based on Layered Depth Images (2024, cites this paper)
Large Pose Face Recognition via Facial Representation Learning (2024, cites this paper)
Geometry-biased Transformers for Novel View Synthesis (2023, cites this paper)
STATE: Learning structure and texture representations for novel view synthesis (2023, cites this paper)
SpVOS: Efficient Video Object Segmentation With Triple Sparse Convolution (2023, cites this paper)
Adaptive Positional Encoding for Bundle-Adjusting Neural Radiance Fields (2023, cites this paper)
Learning invariant and uniformly distributed feature space for multi-view generation (2023, cites this paper)
Learning a High Fidelity Identity Representation for Face Frontalization (2023, cites this paper)
GINA-3D: Learning to Generate Implicit Neural Assets in the Wild (2023, cites this paper)
A study on Generative Adversarial Text to Image Synthesis (2023, cites this paper)
Non-linear integration of loss terms for improved new view synthesis (2023, cites this paper)
A review of disentangled representation learning for visual data processing and analysis (2023, cites this paper)
Learning Latent Image Representations with Prior Knowledge (2022, cites this paper)
Pose Attention-Guided Profile-to-Frontal Face Recognition (2022, cites this paper)
Embodied Affordance Grounding using Semantic Simulations and Neural-Symbolic Reasoning: An Overview of the PlayGround Project (2022, cites this paper)
Learning Flow-Based Disentanglement (2022, cites this paper)
CIGMO: Categorical invariant representations in a deep generative framework (2022, cites this paper)
β-CapsNet: learning disentangled representation for CapsNet by information bottleneck (2022, cites this paper)
Fully Deformable Network for Multiview Face Image Synthesis (2022, cites this paper)
Decoupling Local and Global Representations of Time Series (2022, cites this paper)
Cross-View Panorama Image Synthesis (2022, cites this paper)
2D facial landmark localization method for multi-view face synthesis image using a two-pathway generative adversarial network approach (2022, cites this paper)
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (2022, cites this paper)
Symmetry-Based Representations for Artificial and Biological General Intelligence (2022, cites this paper)
SAGA-net (2022, cites this paper)
Transferring human motion and appearance in monocular videos (2022, cites this paper)
SAGA-Net: efficient pointcloud completion with shape-assisted graph attention neural network (2022, cites this paper)
PI-Trans: Parallel-Convmlp and Implicit-Transformation Based Gan for Cross-View Image Translation (2022, cites this paper)
Frontal Face Generation Based Multi-angle Face Identification System (2021, cites this paper)
Unsupervised Novel View Synthesis from a Single Image (2021, cites this paper)
Multi-view face generation via unpaired images (2021, cites this paper)
Detailed Feature Guided Generative Adversarial Pose Reconstruction Network (2021, cites this paper)
Semi-Supervised Face Frontalization in the Wild (2021, cites this paper)
Towards Out-Of-Distribution Generalization: A Survey (2021, cites this paper)
Profile to Frontal Face Recognition in the Wild Using Coupled Conditional GAN (2021, cites this paper)
A Shape-Aware Retargeting Approach to Transfer Human Motion and Appearance in Monocular Videos (2021, cites this paper)
Fine-Grained Semantic Image Synthesis with Object-Attention Generative Adversarial Network (2021, cites this paper)
Quantised Transforming Auto-Encoders: Achieving Equivariance to Arbitrary Transformations in Deep Networks (2021, cites this paper)
BGDisp-ResNet: A Robust Disparity Estimation and View Synthesis Pipeline Integrating with Bilateral 3D Grid Features in Deep Residual Networks for Light Field Cameras (2021, cites this paper)
Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation (2021, cites this paper)
Shape transformer nets: Generating viewpoint-invariant 3D shapes from a single image (2021, cites this paper)
Robust 2D/3D Vehicle Parsing in Arbitrary Camera Views for CVIS (2021, cites this paper)
FACE RECOGNITION BASED ON FRONTALIZATION OF MULTIPLE POSES USING G-GAN AND DWT (2021, cites this paper)
Learning joint latent representations based on information maximization (2021, cites this paper)
Infrared Target Recognition Using Realistic Training Images Generated by Modifying Latent Features of an Encoder–Decoder Network (2021, cites this paper)
DisUnknown: Distilling Unknown Factors for Disentanglement Learning (2021, cites this paper)
On Development and Evaluation of Retargeting Human Motion and Appearance in Monocular Videos (2021, cites this paper)
Geometry-Free View Synthesis: Transformers and no 3D Priors (2021, cites this paper)
Sparse Pose Trajectory Completion (2021, cites this paper)
FA-GAN: Face Augmentation GAN for Deformation-Invariant Face Recognition (2021, cites this paper)
Novel View Synthesis via Depth-guided Skip Connections (2021, cites this paper)
Multi-view 3D shape style transformation (2021, cites this paper)
There and back again: Cycle consistency across sets for isolating factors of variation (2021, cites this paper)
Robust 2D/3D Vehicle Parsing in CVIS (2021, cites this paper)
Disentangled Representation Learning and Its Application to Face Analytics (2021, cites this paper)
Facial UV map completion for pose-invariant face recognition: a novel adversarial approach based on coupled attention residual UNets (2020, cites this paper)
Rotationally-Temporally Consistent Novel View Synthesis of Human Performance Video (2020, cites this paper)
A Novel Generative Model to Synthesize Face Images for Pose-invariant Face Recognition (2020, cites this paper)
Continuous Object Representation Networks: Novel View Synthesis without Target View Supervision (2020, cites this paper)
Deinterleaving of Pulse Streams With Denoising Autoencoders (2020, cites this paper)
A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation (2020, cites this paper)
Face recognition: Past, present and future (a review) (2020, cites this paper)
Three-view generation based on a single front view image for car (2020, cites this paper)
A Generative Modelling Technique for 3D Reconstruction from a Single 2D Image (2020, cites this paper)
DynamicVAE: Decoupling Reconstruction Error and Disentangled Representation Learning (2020, cites this paper)
Synthesizing light field from a single image with variable MPI and two network fusion (2020, cites this paper)
Challenging β-VAE with β<1 for Disentanglement Via Dynamic Learning (2020, cites this paper)
The Elements of End-to-end Deep Face Recognition: A Survey of Recent Advances (2020, cites this paper)
Metrics for Exposing the Biases of Content-Style Disentanglement (2020, cites this paper)
Rotationally-Consistent Novel View Synthesis for Humans (2020, cites this paper)
Novel View Synthesis With Skip Connections (2020, cites this paper)
Unsupervised Continuous Object Representation Networks for Novel View Synthesis (2020, cites this paper)
Space-Time Memory Networks for Video Object Segmentation With User Guidance (2020, cites this paper)
Cross-View Image Synthesis with Deformable Convolution and Attention Mechanism (2020, cites this paper)
A Survey on Deep Learning Techniques for Stereo-Based Depth Estimation (2020, cites this paper)
Future Urban Scenes Generation Through Vehicles Synthesis (2020, cites this paper)
Feature-Improving Generative Adversarial Network for Face Frontalization (2020, cites this paper)
A Neural-Symbolic Framework for Mental Simulation (2020, cites this paper)
LFGAN (2020, cites this paper)
End-to-End Text-to-Image Synthesis with Spatial Constrains (2020, cites this paper)
Pose-Based View Synthesis for Vehicles: A Perspective Aware Method (2020, cites this paper)
Novel Object Viewpoint Estimation Through Reconstruction Alignment (2020, cites this paper)
Capture, Reconstruction, and Representation of the Visual Real World for Virtual Reality (2020, cites this paper)
Attribute-based regularization of latent spaces for variational auto-encoders (2020, influential citation)
Denoising of Radar Pulse Streams With Autoencoders (2020, cites this paper)
Attribute-based Regularization of VAE Latent Spaces (2020, cites this paper)
Ordinal-Content VAE: Isolating Ordinal-Valued Content Factors in Deep Latent Variable Models (2020, cites this paper)
Self-Supervised Scene Representation Learning (doctoral dissertation, Department of Electrical Engineering, Stanford University) (2020, cites this paper)
Weakly-Supervised Disentanglement Without Compromises (2020, cites this paper)
Recurrent Deconvolutional Generative Adversarial Networks with Application to Text Guided Video Generation (2020, cites this paper)