Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in Voice Conversion

Published 2023 in Interspeech

ABSTRACT

Vector quantized variational autoencoders (VQ-VAE) are well-known deep generative models, which map input data to a latent space that is used for data generation. Such latent spaces are unstructured and can thus be difficult to interpret. Some earlier approaches have introduced a structure to the latent space through supervised learning by defining data labels as latent variables. In contrast, we propose an unsupervised technique incorporating space-filling curves into vector quantization (VQ), which yields an arranged form of latent vectors such that adjacent elements in the VQ codebook refer to similar content. We applied this technique to the latent codebook vectors of a VQ-VAE, which encode the phonetic information of a speech signal in a voice conversion task. Our experiments show a clear arrangement of the latent vectors representing speech phones, which clarifies which phone each latent vector corresponds to and facilitates other detailed interpretations of latent vectors.
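To make the contrast between plain VQ and a curve-arranged codebook concrete, here is a minimal sketch. It assumes (this is an illustrative assumption, not the authors' training procedure) that the space-filling curve is modeled as a piecewise-linear path through the codebook vectors taken in index order, so that codebook[i] and codebook[i + 1] are joined by a line segment. Plain VQ picks the single nearest codebook vector; the curve variant instead maps an input to its nearest point on that path, which is why neighbouring indices end up covering neighbouring regions of the latent space. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sub(a: Vector, b: Vector) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _sqdist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_nearest(x: Vector, codebook: Sequence[Vector]) -> int:
    """Plain VQ: index of the closest codebook vector (squared Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: _sqdist(x, codebook[i]))


def project_onto_curve(x: Vector, codebook: Sequence[Vector]) -> Tuple[int, float, List[float]]:
    """Hypothetical helper: map x to its nearest point on the piecewise-linear
    curve codebook[0] -> codebook[1] -> ... -> codebook[K-1].

    Returns (segment index i, weight t in [0, 1], projected point), where the
    point equals (1 - t) * codebook[i] + t * codebook[i + 1].
    """
    if len(codebook) < 2:
        raise ValueError("a curve needs at least two codebook vectors")
    best = None
    for i in range(len(codebook) - 1):
        a, b = codebook[i], codebook[i + 1]
        d = _sub(b, a)
        denom = _dot(d, d)
        # Clamp the projection parameter so the point stays on the segment.
        t = 0.0 if denom == 0 else max(0.0, min(1.0, _dot(_sub(x, a), d) / denom))
        p = [ai + t * di for ai, di in zip(a, d)]
        dist = _sqdist(x, p)
        if best is None or dist < best[0]:
            best = (dist, i, t, p)
    _, i, t, p = best
    return i, t, p


def curve_index(i: int, t: float) -> int:
    """Round a curve position to the nearer end of its segment (a hard codebook index)."""
    return i + 1 if t >= 0.5 else i


# Toy 2-D codebook ordered along a U-shaped path (first-order Hilbert-like curve).
codebook = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
i, t, p = project_onto_curve([0.9, 0.4], codebook)
print(i, t, p, curve_index(i, t), vq_nearest([0.9, 0.4], codebook))
```

In this toy case the input lands on the segment between codebook entries 1 and 2, and both the rounded curve index and plain VQ return index 1. The point of the ordering is that entry 1's neighbours (0 and 2) are also its geometric neighbours, which is the property the abstract relies on when reading phones off adjacent codebook indices.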

PUBLICATION RECORD

Publication year: 2023
Venue: Interspeech
Publication date: 2023-08-20
Fields of study: Linguistics, Computer Science
Identifiers: DOI 10.21437/interspeech.2023-1549
Source metadata: Semantic Scholar

REFERENCES

NSVQ: Noise Substitution in Vector Quantization for Machine Learning
2022, cited by this paper
EXoN: EXplainable encoder Network
2021, cited by this paper
Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders
2021, cited by this paper
Discovering Interpretable Latent Space Directions of GANs Beyond Binary Attributes
2021, cited by this paper
Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge
2020, influential reference
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis
2020, cited by this paper
Jukebox: A Generative Model for Music
2020, cited by this paper
GANSpace: Discovering Interpretable GAN Controls
2020, cited by this paper
Unsupervised Discovery of Interpretable Directions in the GAN Latent Space
2020, cited by this paper
High-Fidelity Synthesis with Disentangled Representation
2020, cited by this paper
Generating Diverse High-Fidelity Images with VQ-VAE-2
2019, cited by this paper
Low Bit-rate Speech Coding with VQ-VAE and a WaveNet Decoder
2019, cited by this paper
Zero-Shot Voice Style Transfer with Only Autoencoder Loss
2019, cited by this paper
VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019
2019, cited by this paper
Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion
2019, cited by this paper
Supervised Vector Quantized Variational Autoencoder for Learning Interpretable Global Representations
2019, cited by this paper
Hyperspectral Image Compression Using Vector Quantization, PCA and JPEG2000
2018, cited by this paper
Learning Latent Subspaces in Variational Autoencoders
2018, cited by this paper
Neural Discrete Representation Learning
2017, cited by this paper
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
2016, cited by this paper
beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
2016, cited by this paper
Phoneme Recognition on the TIMIT Database
2011, cited by this paper
Using Broad Phonetic Group Experts for Improved Speech Recognition
2007, cited by this paper
Vector quantization and signal compression
1991, cited by this paper
Divergence measures based on the Shannon entropy
1991, cited by this paper

CITED BY

Privacy Disclosure of Similarity Rank in Speech and Language Processing
2025, influential citation
DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick
2025, cites this paper
Lipschitz-Driven Noise Robustness in VQ-AE for High-Frequency Texture Repair in ID-Specific Talking Heads
2024, cites this paper
Unsupervised Disentanglement of Content and Style via Variance-Invariance Constraints
2024, cites this paper
LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details
2024, cites this paper
Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization
2024, influential citation
Privacy PORCUPINE: Anonymization of Speaker Attributes Using Occurrence Normalization for Space-Filling Vector Quantization
2024, influential citation