Selecting Categorical Features in Model-based Clustering

Cláudia M. V. Silvestre, Margarida M. G. S. Cardoso, Mário A. T. Figueiredo

Published 2009 in International Conference on Knowledge Discovery and Information Retrieval

ABSTRACT

There has been relatively little research on feature/variable selection in unsupervised clustering. Feature selection for clustering is a challenging task because there are no class labels to guide the search for relevant features. The methods proposed for this problem focus mostly on numerical data. In this work, we propose an approach to selecting categorical features in clustering. We assume that the data come from a finite mixture of multinomial distributions and implement a new expectation-maximization (EM) algorithm that estimates the parameters of the model and selects the relevant variables. The results obtained on synthetic data clearly illustrate the capability of the proposed approach to select the relevant features.
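To make the modelling assumption in the abstract concrete, the following is a minimal sketch of plain EM for a finite mixture of categorical (single-trial multinomial) features. It is not the authors' algorithm: it does not perform feature selection, and it assumes features are conditionally independent given the component. The function name `em_categorical_mixture`, its parameters, and the smoothing constant `alpha` are hypothetical choices made for illustration only.

```python
import math
import random


def em_categorical_mixture(data, n_components, n_categories,
                           n_iter=50, seed=0, alpha=1e-3):
    """Plain EM for a mixture of independent categorical features.

    data: list of rows, each a list of ints in range(n_categories).
    Returns (weights, theta, resp), where
    theta[k][j][c] approximates P(feature j = c | component k).
    Illustrative sketch only: no feature selection is performed.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    weights = [1.0 / n_components] * n_components

    # Random initial category probabilities, normalized per (component, feature).
    theta = []
    for _ in range(n_components):
        comp = []
        for _ in range(d):
            raw = [rng.random() + 0.1 for _ in range(n_categories)]
            total = sum(raw)
            comp.append([v / total for v in raw])
        theta.append(comp)

    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each row,
        # computed in log space for numerical stability.
        resp = []
        for row in data:
            logp = [
                math.log(weights[k])
                + sum(math.log(theta[k][j][row[j]]) for j in range(d))
                for k in range(n_components)
            ]
            m = max(logp)
            p = [math.exp(v - m) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])

        # M-step: update mixing weights and category probabilities from
        # responsibility-weighted counts; alpha (an assumed smoothing
        # constant) keeps every probability strictly positive.
        weights = [
            (sum(r[k] for r in resp) + alpha) / (n + n_components * alpha)
            for k in range(n_components)
        ]
        for k in range(n_components):
            for j in range(d):
                counts = [alpha] * n_categories
                for i, row in enumerate(data):
                    counts[row[j]] += resp[i][k]
                total = sum(counts)
                theta[k][j] = [c / total for c in counts]

    return weights, theta, resp
```

In a sketch like this, a feature that does not help separate the clusters would tend to end up with nearly identical category probabilities across components. A feature selection method, such as the one the abstract describes, would need an additional mechanism to detect and discard such features; that mechanism is not shown here.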

PUBLICATION RECORD

  • Publication year

    2009

  • Venue

    International Conference on Knowledge Discovery and Information Retrieval

  • Fields of study

    Mathematics, Computer Science

  • Source metadata

    Semantic Scholar
