Latent Translation: Crossing Modalities by Bridging Generative Models

Published 2019 in arXiv.org

ABSTRACT

End-to-end optimization has achieved state-of-the-art performance on many specific problems, but there is no straightforward way to combine pretrained models for new problems. Here, we explore improving modularity by learning a post-hoc interface between two existing models to solve a new task. Specifically, we take inspiration from neural machine translation and cast the challenging problem of cross-modal domain transfer as unsupervised translation between the latent spaces of pretrained deep generative models. By abstracting away the data representation, we demonstrate that it is possible to transfer across different modalities (e.g., image-to-audio) and even across different types of generative models (e.g., VAE-to-GAN). We compare against state-of-the-art techniques and find that a straightforward variational autoencoder best bridges the two generative models by learning a shared latent space. We can further impose supervised alignment of attributes in both domains with a classifier in the shared latent space. Through qualitative and quantitative evaluations, we demonstrate that locality and semantic alignment are preserved through the transfer process, as indicated by high transfer accuracies and smooth interpolations within a class. Finally, we show that this modular structure speeds up training of new interface models by several orders of magnitude, because it decouples that training from expensive retraining of the base generative models.
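The bridging idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the class name `BridgeVAE`, the latent sizes, the use of one linear encoder and decoder per domain, the squared-error reconstruction term, and the random untrained weights are all assumptions made for illustration. It only shows how latent codes from two frozen generative models could be mapped into one shared space, with a classifier on that space supplying the supervised alignment term.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Random affine layer parameters (a stand-in for trained weights)."""
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)

class BridgeVAE:
    """Hypothetical toy VAE over the latent codes of two pretrained models."""

    def __init__(self, dim_a, dim_b, shared_dim, n_classes):
        # Assumption: one linear encoder/decoder per domain, since latent sizes differ.
        self.enc = {"a": linear(dim_a, 2 * shared_dim), "b": linear(dim_b, 2 * shared_dim)}
        self.dec = {"a": linear(shared_dim, dim_a), "b": linear(shared_dim, dim_b)}
        self.clf = linear(shared_dim, n_classes)  # classifier on the shared space
        self.shared_dim = shared_dim

    def encode(self, z, domain):
        w, b = self.enc[domain]
        h = z @ w + b
        return h[:, :self.shared_dim], h[:, self.shared_dim:]  # mean, log-variance

    def decode(self, s, domain):
        w, b = self.dec[domain]
        return s @ w + b

    def loss(self, z, labels, domain):
        """Reconstruction + KL + classification loss for one domain's batch."""
        mu, logvar = self.encode(z, domain)
        s = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
        recon = np.mean(np.sum((self.decode(s, domain) - z) ** 2, axis=1))
        kl = np.mean(-0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=1))
        logits = mu @ self.clf[0] + self.clf[1]
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        ce = -np.mean(logp[np.arange(len(labels)), labels])
        return recon + kl + ce

    def transfer(self, z, src, dst):
        """Encode a source-domain latent, then decode it as a target-domain latent."""
        mu, _ = self.encode(z, src)
        return self.decode(mu, dst)

# Usage: pretend model A has 100-d latents and model B has 64-d latents.
bridge = BridgeVAE(dim_a=100, dim_b=64, shared_dim=8, n_classes=10)
z_a = rng.standard_normal((4, 100))
z_b_hat = bridge.transfer(z_a, "a", "b")
print(z_b_hat.shape)
```

In this sketch only the small bridge would be trained; the base generative models stay frozen and are needed only to produce and consume latent codes, which is the decoupling the abstract credits for the training speed-up.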

PUBLICATION RECORD

Publication year: 2019
Venue: arXiv.org
Publication date: 2019-02-21
Fields of study: Mathematics, Computer Science
Identifiers: arXiv:1902.08261
Source metadata: Semantic Scholar

