Clustering Categorical Data Based on Within-Cluster Relative Mean Difference

Published 2017 in Open Journal of Statistics

ABSTRACT

Clustering of categorical variables has received intensive attention. In a dataset with categorical features, some features perform markedly better than others in the clustering procedure. In this paper, we propose a simple method that finds such distinctive features by comparing the pooled within-cluster mean relative difference, then partitions the data on those features and gives the subspace of each subgroup. Applications to the zoo data and the soybean data illustrate the performance of the proposed method.

PUBLICATION RECORD

Publication year
2017
Venue
Open Journal of Statistics
Publication date
2017-04-20
Fields of study
Mathematics, Computer Science
Identifiers
DOI 10.4236/OJS.2017.72013
Source metadata
Semantic Scholar

REFERENCES

Categorical Data Clustering
2017, cited by this paper
EnsCat: clustering of categorical data via ensembling
2016, cited by this paper
A simple approach to sparse clustering
2016, cited by this paper
Feature selection for clustering categorical data with an embedded modelling approach
2015, cited by this paper
Categorical data clustering: What similarity measure to recommend?
2015, cited by this paper
The Clustering of Categorical Data: A Comparison of a Model-based and a Distance-based Approach
2014, cited by this paper
Clustering categorical data in projected spaces
2013, cited by this paper
Multivariate multinomial mixtures: a data-driven penalized criterion for variable selection and clustering
2010, cited by this paper
Clustering Categorical Data Based on Distance Vectors
2006, influential reference
k-ANMI: A mutual information based clustering algorithm for categorical data
2005, cited by this paper
ROCK: a robust clustering algorithm for categorical attributes
1999, cited by this paper
CACTUS—clustering categorical data using summaries
1999, cited by this paper

CITED BY

An Efficient Technique for Disease Prediction by Using Enhanced Machine Learning Algorithms for Categorical Medical Dataset
2021, cites this paper
Solving a Hard Instance of Suspicious Behaviour Detection with Sparse Binary Vectors Clustering
2019, cites this paper