Benchmarking of Clustering Validity Measures Revisited
Connor Simpson, Ricardo J. G. B. Campello, Elizabeth Stojanovski
Published 2025 in Statistical Analysis and Data Mining
ABSTRACT
Validation plays a crucial role in the clustering process. Many internal validity indices exist for determining the best clustering solution(s) from a given collection of candidates, for example, candidates produced by different algorithms or by different algorithm hyper‐parameters. We present a comprehensive benchmark study of 26 internal validity indices, including highly popular classic indices as well as more recently developed ones. We adopt an enhanced revision of the methodology presented in Vendramin et al. (2010), developed here to address several shortcomings of that previous work. The new approach consists of three complementary custom‐tailored evaluation sub‐methodologies, each designed to assess specific aspects of an index's behavior while preventing potential biases of the other sub‐methodologies. Each sub‐methodology features two complementary measures of performance, alongside mechanisms that allow for an in‐depth investigation of more complex behaviors of the internal validity indices under study. Additionally, a new collection of 16,177 datasets has been produced and paired with eight widely used clustering algorithms, broadening the scope of applicability and representing more diverse clustering scenarios.
PUBLICATION RECORD
- Publication year
2025
- Venue
Statistical Analysis and Data Mining
- Publication date
2025-11-08
- Fields of study
Mathematics, Computer Science
- Source metadata
Semantic Scholar