There has been relatively little research on feature/variable selection in unsupervised clustering. In fact, feature selection for clustering is a challenging task due to the absence of class labels for guiding the search for relevant features. The methods proposed for addressing this problem are mostly focused on numerical data. In this work, we propose an approach to selecting categorical features in clustering. We assume that the data come from a finite mixture of multinomial distributions and implement a new expectation-maximization (EM) algorithm that estimates the parameters of the model and selects the relevant variables. The results obtained on synthetic data clearly illustrate the capability of the proposed approach to select the relevant features.
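To make the modelling assumption concrete, the following is a minimal sketch of plain EM for a finite mixture in which each categorical feature is drawn independently from a component-specific multinomial. It illustrates only the base mixture model named in the abstract, not the feature-selection mechanism of the proposed algorithm, which the abstract does not detail. The function name `em_categorical_mixture`, the smoothing constant `alpha`, the random initialisation, and the toy data are all assumptions made for illustration.

```python
import random


def em_categorical_mixture(data, n_components, n_iter=100, alpha=1e-3, seed=0):
    """Plain EM for a mixture of independent categorical features.

    data: list of equal-length tuples of category labels (labels within a
    feature must be sortable). Hypothetical helper, not the paper's algorithm.
    Returns (weights, thetas, resp); resp comes from the final E-step.
    """
    rng = random.Random(seed)
    n, n_feat = len(data), len(data[0])
    # Categories observed for each feature.
    cats = [sorted({row[j] for row in data}) for j in range(n_feat)]

    # Random soft assignments break the symmetry between components.
    resp = []
    for _ in range(n):
        u = [rng.random() + 1e-3 for _ in range(n_components)]
        s = sum(u)
        resp.append([v / s for v in u])

    for _ in range(n_iter):
        # M-step: mixing weights and per-feature category probabilities.
        nk = [sum(r[k] for r in resp) for k in range(n_components)]
        weights = [v / n for v in nk]
        thetas = []
        for k in range(n_components):
            per_feature = []
            for j in range(n_feat):
                # alpha is a small smoothing count (an assumption) that
                # keeps every category probability strictly positive.
                counts = {c: alpha for c in cats[j]}
                for row, r in zip(data, resp):
                    counts[row[j]] += r[k]
                total = sum(counts.values())
                per_feature.append({c: v / total for c, v in counts.items()})
            thetas.append(per_feature)

        # E-step: posterior probability of each component for each row.
        resp = []
        for row in data:
            joint = []
            for k in range(n_components):
                p = weights[k]
                for j in range(n_feat):
                    p *= thetas[k][j][row[j]]
                joint.append(p)
            s = sum(joint)
            resp.append([p / s for p in joint])

    return weights, thetas, resp


# Toy data (made up): two groups whose features are perfectly aligned.
data = [("a", "x")] * 10 + [("b", "y")] * 10
weights, thetas, resp = em_categorical_mixture(data, n_components=2)
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
```

In this sketch every feature contributes to every component's likelihood. A feature-selection variant would additionally need a way to mark a feature as irrelevant, for example by letting it follow a single distribution shared by all components; how the paper does this is not stated in the abstract.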
Selecting Categorical Features in Model-based Clustering
Cláudia M. V. Silvestre, Margarida M. G. S. Cardoso, Mário A. T. Figueiredo
Published 2009 in International Conference on Knowledge Discovery and Information Retrieval
PUBLICATION RECORD
- Publication year
2009
- Venue
International Conference on Knowledge Discovery and Information Retrieval
- Fields of study
Mathematics, Computer Science
- Source metadata
Semantic Scholar