Spherical Latent Spaces for Stable Variational Autoencoders
Published 2018 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT
A hallmark of variational autoencoders (VAEs) for text processing is their combination of powerful encoder-decoder models, such as LSTMs, with simple latent distributions, typically multivariate Gaussians. These models pose a difficult optimization problem: there is an especially bad local optimum where the variational posterior always equals the prior and the model does not use the latent variable at all, a kind of “collapse” which is encouraged by the KL divergence term of the objective. In this work, we experiment with another choice of latent distribution, namely the von Mises-Fisher (vMF) distribution, which places mass on the surface of the unit hypersphere. With this choice of prior and posterior, the KL divergence term now only depends on the variance of the vMF distribution, giving us the ability to treat it as a fixed hyperparameter. We show that doing so not only averts the KL collapse, but consistently gives better likelihoods than Gaussians across a range of modeling conditions, including recurrent language modeling and bag-of-words document modeling. An analysis of the properties of our vMF representations shows that they learn richer and more nuanced structures in their latent representations than their Gaussian counterparts.
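The abstract's central observation, that with a vMF posterior and a uniform prior on the unit hypersphere the KL term depends only on the concentration parameter, can be checked with a short numerical sketch. The snippet below is illustrative only and is not the authors' released code; it assumes a uniform hyperspherical prior and uses the standard closed form for the vMF-to-uniform KL divergence, with all function and variable names our own.

```python
import numpy as np
from scipy.special import gammaln, ive


def vmf_kl_to_uniform(kappa: float, m: int) -> float:
    """KL( vMF(mu, kappa) || Uniform(S^{m-1}) ) for a latent space in R^m.

    The closed form contains no mu term: with a uniform prior on the unit
    hypersphere, the KL penalty depends only on the concentration kappa,
    so fixing kappa fixes the KL term as a constant in the ELBO.
    """
    nu = m / 2.0 - 1.0  # Bessel order determined by the dimensionality
    # E_q[mu^T z] = I_{m/2}(kappa) / I_{m/2-1}(kappa); exponentially scaled
    # Bessel functions (ive) keep the ratio stable for large kappa.
    bessel_ratio = ive(nu + 1.0, kappa) / ive(nu, kappa)
    log_bessel = np.log(ive(nu, kappa)) + kappa  # log I_{m/2-1}(kappa)
    return float(
        kappa * bessel_ratio
        + nu * np.log(kappa / 2.0)
        - log_bessel
        - gammaln(m / 2.0)
    )


if __name__ == "__main__":
    # The KL value is a function of kappa and the dimension alone, never of
    # the predicted mean direction, so it cannot be driven to zero by the
    # encoder: fixing kappa removes the collapse incentive described above.
    for kappa in (5.0, 25.0, 100.0):
        print(f"dim=50, kappa={kappa:5.1f} -> KL = {vmf_kl_to_uniform(kappa, 50):.3f}")
```

Because the returned value never involves the posterior mean direction, treating kappa as a fixed hyperparameter turns the KL term into a constant offset in the objective, which is why the collapse described in the abstract cannot occur under this sketch's assumptions.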
PUBLICATION RECORD
- Publication year: 2018
- Venue: Conference on Empirical Methods in Natural Language Processing
- Publication date: 2018-08-31
- Fields of study: Mathematics, Computer Science
- Source metadata: Semantic Scholar