A Bayesian sparse finite mixture model for clustering data from a heterogeneous population

Published 2020 in Brazilian Journal of Probability and Statistics

ABSTRACT

In this paper we introduce a Bayesian approach for clustering data using a sparse finite mixture model (SFMM). The SFMM is a finite mixture model with a large, prespecified number of components k, many of which may be empty. In this model, k can be interpreted as the maximum number of distinct mixture components. We then explore a prior distribution for the mixture weights that takes into account the possibility that the number of clusters kc (i.e., nonempty components) is random and smaller than the number of components k of the finite mixture model. To determine the clusters, we develop an MCMC algorithm called the Split-Merge allocation sampler. In this algorithm the split-merge strategy is data-driven and is included to improve the mixing of the Markov chain with respect to the number of clusters. The performance of the method is assessed using simulated datasets and three real datasets. The first real dataset is the benchmark galaxy data; the second and third are the publicly available Enzyme and Acidity datasets, respectively.
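
The distinction between the number of components k and the number of clusters kc can be illustrated with a small simulation. The sketch below is not the paper's model or sampler: the abstract does not specify the prior on the weights, so a symmetric Dirichlet prior with concentration `alpha` is assumed here purely for illustration, and the function names are hypothetical. It draws weights from that prior, allocates n observations to the k components, and counts how many components are nonempty.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw weights from a symmetric Dirichlet(alpha, ..., alpha) prior
    (an assumed prior, not taken from the paper) via normalized gamma draws."""
    total = 0.0
    while total == 0.0:  # guard against numerical underflow for very small alpha
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(draws)
    return [d / total for d in draws]


def count_nonempty(n, k, alpha, rng):
    """Allocate n observations to k components and return kc,
    the number of components that received at least one observation."""
    weights = sample_dirichlet(alpha, k, rng)
    labels = rng.choices(range(k), weights=weights, k=n)
    return len(set(labels))


def mean_clusters(n, k, alpha, reps=200, seed=0):
    """Average kc over repeated prior draws for a given concentration alpha."""
    rng = random.Random(seed)
    return sum(count_nonempty(n, k, alpha, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    # A small alpha concentrates mass on few components, leaving many empty;
    # a large alpha spreads mass so that nearly all k components are occupied.
    for alpha in (0.1, 4.0):
        print(alpha, mean_clusters(n=100, k=10, alpha=alpha))
```

Under this assumed prior, a small `alpha` leaves many of the k components empty, so kc is typically well below k, while a large `alpha` tends to fill almost all components.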

PUBLICATION RECORD

Publication year: 2020
Venue: Brazilian Journal of Probability and Statistics
Publication date: 2020-05-01
Fields of study: Mathematics, Computer Science
DOI: 10.1214/18-bjps425
Source metadata: Semantic Scholar

REFERENCES

From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering
2018, influential reference
Partitioning gene expression data by data-driven Markov chain Monte Carlo
2015, cited by this paper
Mixture models with an unknown number of components via a new posterior split-merge MCMC algorithm
2014, cited by this paper
Model-based clustering based on sparse finite Gaussian mixtures
2014, cited by this paper
Model-based clustering of high-dimensional data: A review
2014, cited by this paper
A Framework for Feature Selection in Clustering
2010, cited by this paper
Bayesian finite mixtures with an unknown number of components: The allocation sampler
2007, cited by this paper
Model-Based Clustering With Dissimilarities: A Bayesian Approach
2007, cited by this paper
Markov Chain Monte Carlo Methods and the Label Switching Problem in Bayesian Mixture Modeling
2005, cited by this paper
Mixture models, latent variables and partitioned importance sampling
2004, influential reference
Bayesian measures of model complexity and fit
2002, cited by this paper
Model-Based Clustering, Discriminant Analysis, and Density Estimation
2002, cited by this paper
Finite Mixture Models
2000, cited by this paper
Dealing with label switching in mixture models
2000, influential reference
Computational and Inferential Difficulties with Mixture Posterior Distributions
2000, cited by this paper
Practical Bayesian Density Estimation Using Mixtures of Normals
1997, influential reference
Inference in model-based cluster analysis
1997, cited by this paper
Bayesian Density Estimation and Inference Using Mixtures
1995, influential reference
Understanding the Metropolis-Hastings Algorithm
1995, cited by this paper
Summary of the
1994, influential reference
Model-based Gaussian and non-Gaussian clustering
1993, cited by this paper
Mixture Models: Inference and Applications to Clustering
1989, cited by this paper
Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions
1987, cited by this paper
Estimating the Dimension of a Model
1978, cited by this paper
Bayesian cluster analysis
1978, cited by this paper
A new look at the statistical model identification
1974, influential reference
Some methods for classification and analysis of multivariate observations
1967, cited by this paper
Hierarchical Grouping to Optimize an Objective Function
1963, cited by this paper
A statistical method for evaluating systematic relationships
1958, cited by this paper
The application of computers to taxonomy
1957, cited by this paper

CITED BY

A Bayesian semiparametric mixture model for clustering zero-inflated microbiome data
2025, cites this paper
Bayesian Repulsive Mixture Modeling with Matérn Point Processes
2022, cites this paper
An Integrated Approach for Making Inference on the Number of Clusters in a Mixture Model
2019, cites this paper