A Bayesian sparse finite mixture model for clustering data from a heterogeneous population

E. Saraiva, A. K. Suzuki, L. Milan

Published 2020 in Brazilian Journal of Probability and Statistics

ABSTRACT

In this paper we introduce a Bayesian approach for clustering data using a sparse finite mixture model (SFMM). The SFMM is a finite mixture model with a large, previously fixed number of components k, many of which can be empty. In this model, k can be interpreted as the maximum number of distinct mixture components. We then explore the use of a prior distribution for the mixture weights that takes into account the possibility that the number of clusters kc (i.e., non-empty components) is random and smaller than the number of components k of the finite mixture model. To determine the clusters, we develop an MCMC algorithm called the Split-Merge allocation sampler. In this algorithm the split-merge strategy is data-driven and is included in order to improve the mixing of the Markov chain with respect to the number of clusters. The performance of the method is verified using simulated datasets and three real datasets. The first real dataset is the benchmark galaxy data, while the second and third are the publicly available Enzyme and Acidity datasets, respectively.
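
The distinction between the fixed number of components k and the random number of non-empty clusters kc can be illustrated with a small simulation. The sketch below is not the authors' method: the abstract does not state which prior is placed on the weights, so a symmetric Dirichlet prior with a small concentration parameter is assumed here purely for illustration, and the names `count_nonempty_clusters`, `simulate_kc`, and `alpha` are hypothetical. It draws weights, allocates n observations to components, and counts how many components end up non-empty.

```python
import numpy as np


def count_nonempty_clusters(n, k, alpha, rng):
    """Return k_c, the number of non-empty components for one prior draw.

    Assumption (not from the paper): weights ~ symmetric Dirichlet(alpha).
    """
    weights = rng.dirichlet(np.full(k, alpha))
    # Allocate n observations to the k components given the weights.
    counts = rng.multinomial(n, weights)
    return int(np.count_nonzero(counts))


def simulate_kc(n=100, k=20, alpha=0.1, draws=500, seed=0):
    """Simulate the prior distribution of k_c over repeated draws."""
    rng = np.random.default_rng(seed)
    return np.array(
        [count_nonempty_clusters(n, k, alpha, rng) for _ in range(draws)]
    )


if __name__ == "__main__":
    sparse = simulate_kc(alpha=0.1)   # small alpha: many empty components
    dense = simulate_kc(alpha=10.0)   # large alpha: almost all components used
    print("mean k_c, alpha=0.1 :", sparse.mean())
    print("mean k_c, alpha=10.0:", dense.mean())
```

Under this assumed prior, a small `alpha` concentrates mass on a few components, so kc is typically well below k, whereas a large `alpha` fills nearly all k components. This is the sense in which k acts only as an upper bound on the number of clusters.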

PUBLICATION RECORD

  • Publication year

    2020

  • Venue

    Brazilian Journal of Probability and Statistics

  • Publication date

    2020-05-01

  • Fields of study

    Mathematics, Computer Science

  • Source metadata

    Semantic Scholar
