Evaluation of Categorical Generative Models - Bridging the Gap Between Real and Synthetic Data

Published 2022 in IEEE International Conference on Acoustics, Speech, and Signal Processing

ABSTRACT

The machine learning community has mainly relied on real data to benchmark algorithms, as it provides compelling evidence of model applicability. Evaluation on synthetic datasets can be a powerful tool for understanding a model’s strengths, weaknesses, and overall capabilities. These insights are particularly important for generative modeling, where the target quantity is completely unknown. Multiple issues with the evaluation of generative models have been reported in the literature. We argue that these problems can be avoided by an evaluation based on ground truth. A common criticism of synthetic experiments is that they are too simplified and not representative of practical scenarios. We therefore tailor our experimental setting to a realistic generative task. We focus on categorical data and introduce an appropriately scalable evaluation method. Our method tasks a generative model with learning a distribution in a high-dimensional setting. We then successively bin the large space to obtain smaller probability spaces where meaningful statistical tests can be applied. We consider increasingly large probability spaces, which correspond to increasingly difficult modeling tasks, and compare generative models by the highest task difficulty they reach before being detected as too far from the ground truth. We validate our evaluation procedure with synthetic experiments on both synthetic generative models and current state-of-the-art categorical generative models.
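The binning procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed random outcome-to-bin map, the power-of-two bin counts, and the use of Pearson's chi-square test with a Wilson-Hilferty normal approximation are all assumptions made here for illustration. The abstract does not specify which statistical test is used.

```python
import math
import random


def coarsen(probs, assignment, k):
    """Sum fine-grained probabilities into k bins (outcome i -> bin assignment[i] % k)."""
    binned = [0.0] * k
    for i, p in enumerate(probs):
        binned[assignment[i] % k] += p
    return binned


def chi2_pvalue(counts, bin_probs):
    """Approximate p-value of Pearson's chi-square goodness-of-fit test.

    Assumption: the chi-square tail is approximated with the Wilson-Hilferty
    normal transform, which keeps the sketch dependency-free but is rough
    when there are few bins. Every bin must have positive probability.
    """
    n = sum(counts)
    stat = sum((c - n * q) ** 2 / (n * q) for c, q in zip(counts, bin_probs))
    df = len(counts) - 1
    v = 2.0 / (9.0 * df)
    z = ((stat / df) ** (1.0 / 3.0) - (1.0 - v)) / math.sqrt(v)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def highest_level_passed(probs, sample_fn, levels, n_samples, alpha=0.01, seed=0):
    """Return the largest bin count reached before the first rejection (0 if none)."""
    assignment = list(range(len(probs)))
    random.Random(seed).shuffle(assignment)  # hypothetical fixed outcome-to-bin map
    samples = sample_fn(n_samples)           # draws from the model under evaluation
    passed = 0
    for k in levels:  # increasing k means finer bins, i.e. a harder task
        counts = [0] * k
        for s in samples:
            counts[assignment[s] % k] += 1
        if chi2_pvalue(counts, coarsen(probs, assignment, k)) < alpha:
            break  # model detected as too far from the ground truth
        passed = k
    return passed


# Toy ground truth: uniform over 1024 categories (an assumption for the demo).
N = 1024
truth = [1.0 / N] * N
collapsed_model = lambda n: [0] * n  # always emits the same category
print(highest_level_passed(truth, collapsed_model, [2, 4, 8, 16], 4096))  # prints 0
```

With power-of-two levels, each bin at a coarse level is a union of bins at the next finer level, so the sequence of tests forms a nested hierarchy of increasing difficulty. A model that collapses to a single category is already rejected at the coarsest level, while a better model would be expected to survive to a larger bin count.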

PUBLICATION RECORD

Publication year
2022
Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing
Publication date
2022-10-28
Fields of study
Mathematics, Computer Science
Identifiers
DOI: 10.1109/ICASSP49357.2023.10097150; arXiv: 2210.16405
Source metadata
Semantic Scholar

REFERENCES

On Evaluation Metrics for Graph Generative Models (2022)
Argmax Flows: Learning Categorical Distributions with Normalizing Flows (2021)
Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions (2021)
Efficient generative modeling of protein sequences using simple autoregressive models (2021)
Categorical Normalizing Flows via Continuous Transformations (2020)
Evaluation of Text Generation: A Survey (2020)
Understanding the Failure Modes of Out-of-Distribution Generalization (2020)
A Survey on Distribution Testing: Your Data is Big. But is it Blue? (2020)
Optimal testing of discrete distributions with high probability (2020)
Judge the Judges: A Large-Scale Evaluation Study of Neural Language Models for Online Review Generation (2019)
HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models (2019)
Pros and Cons of GAN Evaluation Measures (2018)
Language GANs Falling Short (2018)
Do Deep Generative Models Know What They Don't Know? (2018)
On the Quantitative Analysis of Decoder-Based Generative Models (2016)
A note on the evaluation of generative models (2015)
Factoring Variations in Natural Images with Deep Gaussian Mixture Models (2014)
Testing Closeness of Discrete Distributions (2010)
Testing random variables for independence and identity (2001)

CITED BY

Categorical Generative Model Evaluation via Synthetic Distribution Coarsening (2024)