Human Motion Generation via Conditioned GMVAE with TUNet

Yongqi Liu, Jiashuang Zhou, Xiaoqin Du

Published 2024 in IEEE International Conference on Acoustics, Speech, and Signal Processing

ABSTRACT

In recent years, Variational Autoencoders (VAEs) have been proposed for motion synthesis to model action-label-conditioned human motion. However, these approaches assume only a single Gaussian prior; this hard constraint may be too restrictive for the latent space and hurt model performance. To address these issues, we model the latent space as a Gaussian mixture distribution and derive a new evidence lower bound (ELBO). Furthermore, to enhance the expressiveness of the model, we introduce a Fisher discriminant as a regularizer. We develop an attention mechanism that enables the Transformer-based U-Net to generate semantically consistent motions using only action labels. The proposed CGMVAE-TU model has been evaluated on multiple datasets and surpasses the state of the art on almost all metrics; the generated human motions are realistic and natural.
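To see why a Gaussian mixture prior is less restrictive than a single Gaussian, the sketch below (not the authors' code; a minimal NumPy illustration with hypothetical parameter values) estimates the KL term of a VAE-style ELBO by Monte Carlo for the same approximate posterior under both priors. A posterior concentrated near one mixture mode incurs a smaller KL penalty than it would under a single standard-normal prior, which is the flexibility the abstract refers to:

```python
import numpy as np

def gmm_log_pdf(z, weights, means, sigmas):
    """Log-density of a diagonal-Gaussian mixture prior
    p(z) = sum_k w_k N(z; mu_k, sigma_k^2 I)."""
    z = np.atleast_2d(z)                       # shape (n, d)
    d = z.shape[1]
    comps = []
    for w, mu, s in zip(weights, means, sigmas):
        log_norm = -0.5 * d * np.log(2 * np.pi * s ** 2)
        sq = -0.5 * np.sum((z - mu) ** 2, axis=1) / s ** 2
        comps.append(np.log(w) + log_norm + sq)
    return np.logaddexp.reduce(comps, axis=0)  # log-sum-exp over components

rng = np.random.default_rng(0)

# Hypothetical approximate posterior q(z|x) = N((2, 0), 0.5^2 I)
mu_q, sigma_q = np.array([2.0, 0.0]), 0.5
z = rng.normal(mu_q, sigma_q, size=(10_000, 2))
log_q = (-np.log(2 * np.pi * sigma_q ** 2)
         - 0.5 * np.sum(((z - mu_q) / sigma_q) ** 2, axis=1))

# Monte Carlo KL(q || p) = E_q[log q(z) - log p(z)]
weights = [0.5, 0.5]
means = [np.array([2.0, 0.0]), np.array([-2.0, 0.0])]
kl_mixture = np.mean(log_q - gmm_log_pdf(z, weights, means, [1.0, 1.0]))
kl_single = np.mean(log_q - gmm_log_pdf(z, [1.0], [np.zeros(2)], [1.0]))

# The posterior sits on a mixture mode, so its KL penalty is smaller there.
print(kl_mixture < kl_single)
```

The same mechanism operates inside the GMVAE ELBO: each latent code only needs to be close to its nearest mixture component, so class-conditioned codes can occupy separate modes instead of being squeezed toward a single origin-centered Gaussian.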
