A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

Published 2012 in Expert systems with applications

ABSTRACT

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods.
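The sensitivity to initial center placement described in the abstract can be illustrated with a small sketch. The code below is not taken from the paper; it is a minimal illustration under stated assumptions. It runs batch k-means from two starting configurations on a hypothetical six-point data set: a naive choice (the first k points) and maximin (farthest-first) seeding, a simple heuristic that needs one pass over the data per center. The function names `sq_dist`, `maximin_init`, and `lloyd`, and the toy data, are all illustrative choices.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def maximin_init(points, k, first=0):
    """Farthest-first seeding: each new center is the point whose
    distance to its nearest already-chosen center is largest."""
    centers = [points[first]]
    # nearest[i] = squared distance from points[i] to its closest center so far
    nearest = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, sq_dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers

def lloyd(points, centers, max_iter=100):
    """Batch k-means from the given centers; returns (centers, SSE)."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(x) / len(members) for x in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse

# Hypothetical toy data: one pair near x=0 and two tight pairs near x=20 and x=25.
pts = [(0.0, 0.0), (2.0, 0.0), (20.0, 0.0), (21.0, 0.0), (24.0, 0.0), (25.0, 0.0)]

_, sse_naive = lloyd(pts, pts[:3])                  # first k points as centers
_, sse_maximin = lloyd(pts, maximin_init(pts, 3))   # farthest-first centers
print(sse_naive, sse_maximin)                       # 17.0 3.0
```

On this toy data the naive start converges to a poor local minimum (two centers split the left pair, one center covers both right pairs), while the maximin start reaches the lower-error partition. This is a single hand-built example, not evidence about how either method behaves in general.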

PUBLICATION RECORD

Publication year
2012
Venue
Expert systems with applications
Publication date
2012-09-10
Fields of study
Mathematics, Computer Science
Identifiers
DOI 10.1016/j.eswa.2012.07.021 arXiv 1209.1960
Source metadata
Semantic Scholar

REFERENCES

On comparing partitions
2015cited by this paper
海外情報 Georgia Institute of Technologyでの研究 [Overseas Information: Research at Georgia Institute of Technology]
2013cited by this paper
Careful Seeding Method based on Independent Components Analysis for k-means Clustering
2012cited by this paper
Improving the performance of k-means for color quantization
2011cited by this paper
A User’s Guide
2011cited by this paper
Parallel Spectral Clustering in Distributed Systems
2011cited by this paper
IBM SPSS Statistics 19 Statistical Procedures Companion
2011cited by this paper
Simulating Data to Study Performance of Finite Mixture Modeling and Clustering Algorithms
2010cited by this paper
Improved step size adaptation for the MO-CMA-ES
2010cited by this paper
Making k-means Even Faster
2010cited by this paper
Relative clustering validity criteria: A comparative overview
2010cited by this paper
Bandwidth Adaptive Hardware Architecture of K-Means Clustering for Video Analysis
2010cited by this paper
An improved column generation algorithm for minimum sum-of-squares clustering
2009cited by this paper
The Planar k-means Problem is NP-hard I
2009cited by this paper
Adapting the right measures for K-means clustering
2009cited by this paper
External validation measures for K-means clustering: A data distribution perspective
2009cited by this paper
Robust partitional clustering by outlier and density insensitive seeding
2009cited by this paper
A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability
2009cited by this paper
An initialization method for the K-Means algorithm using neighborhood model
2009cited by this paper
NP-hardness of Euclidean sum-of-squares clustering
2008cited by this paper
Data clustering: 50 years beyond K-means
2008cited by this paper
An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons
2008cited by this paper
Hierarchical initialization approach for K-Means clustering
2008cited by this paper
In search of deterministic methods for initializing K-means and Gaussian mixture clustering
2007influential reference
k-means++: the advantages of careful seeding
2007cited by this paper
A method for initialising the K-means clustering algorithm using kd-trees
2007cited by this paper
A New Algorithm for Cluster Initialization
2007cited by this paper
A study on the use of statistical tests for experimentation with neural networks: Analysis of parametric test conditions and non-parametric tests
2007cited by this paper
Comparing clusterings---an information based distance
2007cited by this paper
Statistical Comparisons of Classifiers over Multiple Data Sets
2006cited by this paper
Efficient disk-based K-means clustering for relational databases
2004cited by this paper
Initialization of cluster refinement algorithms: a review and comparative study
2004cited by this paper
The global k-means clustering algorithm
2003cited by this paper
Robust clustering by pruning outliers
2003cited by this paper
A computational study of several relocation methods for k-means algorithms
2003cited by this paper
An Efficient k-Means Clustering Algorithm: Analysis and Implementation
2002cited by this paper
'1+1>2': merging distance and density based clustering
2001cited by this paper
Performance criteria for graph clustering and Markov cluster experiments
2000cited by this paper
LOF: identifying density-based local outliers
2000cited by this paper
An empirical comparison of four initialization methods for the K-Means algorithm
1999cited by this paper
Data clustering: a review
1999influential reference
Fast and robust fixed-point algorithms for independent component analysis
1999influential reference
A Divisive Initialisation Method for Clustering Algorithms
1999cited by this paper
Refining Initial Points for K-Means Clustering
1998influential reference
Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator
1998cited by this paper
Advances in neural information processing systems 7
1997cited by this paper
New methods for the initialisation of clusters
1996cited by this paper
A new initialization technique for generalized Lloyd iteration
1994cited by this paper
Convergence Properties of the K-Means Algorithms
1994cited by this paper
Simulated annealing for selecting optimal initial seeds in the K-means algorithm
1994cited by this paper
A self-organizing network for hyperellipsoidal clustering (HEC)
1994cited by this paper
A near-optimal initial seed value selection in K-means algorithm using a genetic algorithm
1993cited by this paper
Multiple Hypotheses Testing
1993influential reference
A comparison of several vector quantization codebook generation approaches
1993cited by this paper
FAST ENCODING ALGORITHM FOR VQ-BASED IMAGE-CODING
1990cited by this paper
Finding Groups in Data: An Introduction to Cluster Analysis
1990cited by this paper
A study of standardization of variables in cluster analysis
1988cited by this paper
Improvements of General Multiple Test Procedures for Redundant Systems of Hypotheses
1988cited by this paper
An Improvement of the Minimum Distortion Encoding Algorithm for Vector Quantization
1985cited by this paper
Clustering to Minimize the Maximum Intercluster Distance
1985cited by this paper
K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality
1984cited by this paper
Least squares quantization in PCM
1982cited by this paper
An Algorithm for Vector Quantizer Design
1980cited by this paper
Approximations of the critical region of the Friedman statistic
1980cited by this paper
An examination of the effect of six types of error perturbation on fifteen clustering algorithms
1980cited by this paper
Applied Nonparametric Statistics
1979cited by this paper
A k-means clustering algorithm
1979cited by this paper
Fuzzy sets and decisionmaking approaches in vowel and speaker recognition
1977cited by this paper
Computational experiences with the exchange method
1977cited by this paper
Pattern Recognition Principles
1974cited by this paper
Cluster Analysis for Applications
1973influential reference
SPEECH ANALYSIS BY CLUSTERING, OR THE HYPERPHONEME METHOD
1970cited by this paper
A clustering technique for summarizing multivariate data.
1967cited by this paper
Some methods for classification and analysis of multivariate observations
1967cited by this paper
A General Theory of Classificatory Sorting Strategies: 1. Hierarchical Systems
1967cited by this paper
A general theory of classificatory sorting strategies: II. Clustering systems
1967cited by this paper
Multidimensional group analysis
1966cited by this paper
A general theory of classificatory sorting strategies: ii
1966cited by this paper
Cluster analysis of multivariate data: efficiency versus interpretability of classifications
1965cited by this paper
The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance
1937cited by this paper
Simplified calculation of principal components
1936cited by this paper
Expert Systems With Applications
year unknowncited by this paper
AN EFFICIENT
year unknowncited by this paper

CITED BY

An Unsupervised Learning Approach to Optimising Tumour Therapy through Clinical Data Mining
2026cites this paper
Privacy-enhanced case-based reasoning prediction method with adaptive noise allocation
2026cites this paper
Profiling Teachers' Technology Acceptance and Digital Competence Using Machine Learning Techniques
2026cites this paper
Pengelompokan Mahasiswa Berdasarkan Nilai dan Kehadiran Menggunakan K-Means [Grouping Students Based on Grades and Attendance Using K-Means]
2026cites this paper
A K-Means Clustering Approach for Accelerated Path Planning in GMA-DED: The Fast Advanced-Pixel Strategy
2026cites this paper
Hybrid IDK_means++: Integrating Particle Swarm Optimization for Robust and Accurate K_means Initialization
2026cites this paper
DeepWK-MSTC: a novel approach for adaptive controller placement in software-defined networks via deep learning
2026cites this paper
Comparative Analysis of Image Binarization Algorithms for UAV-Based Soybean Canopy Extraction Across Growth Stages for Image Labelling
2026cites this paper
K-Means as a Radial Basis function Network: a Variational and Gradient-based Equivalence
2026cites this paper
CLEAR-HPV: Interpretable Concept Discovery for HPV-Associated Morphology in Whole-Slide Histology
2026cites this paper
Modeling the Nutrition–Academic Intention Gap: A Data-Driven Adaptive Gamified Architecture
2026cites this paper
Grey wolf optimization for color quantization
2026cites this paper
Extending battery lifecycles: A holistic review of second-life lithium-ion technology in sustainable energy systems from assessment to emerging trends
2026cites this paper
Initializing K-means clustering algorithm based on frequent patterns
2026cites this paper
Sustainable virtual machine placement in heterogeneous cloud data centers: a reinforcement learning-based approach
2026cites this paper
KHOI-SMOTE: An efficient oversampling technique based on k-means clustering and h-outlyingness index for imbalanced medical data
2026cites this paper
Lightweight Semantic Segmentation for Fermentation Foam Monitoring: A Comparative Study of U-Net, DeepLabV3+, Fast-SCNN, and SegNet
2026cites this paper
Coordinated Reactive Power Optimal Control Considering the Voltage Security of Wind Farm
2025cites this paper
A physics-informed clustering approach for ultrasonics-based nondestructive evaluation
2025cites this paper
Investigation of Machining Condition for Barrel End Mill Based on Data-Mining Method for Tool Catalog Database
2025cites this paper
An Agroecological Zoning Approach for Sustainable Agriculture in Burkina Faso, West Africa
2025cites this paper
AM-Net: A Network With Attention and Multiscale Feature Fusion for Skin Lesion Segmentation
2025cites this paper
An Energy-Domain IR NUC Method Based on Unsupervised Learning
2025cites this paper
Investigating carsharing’s potential markets and group characteristics using attitude-based market segmentation approach
2025cites this paper
Deep learning powered single-cell clustering framework with enhanced accuracy and stability
2025cites this paper
Analysis of deep non-smooth symmetric nonnegative matrix factorization on hierarchical clustering
2025cites this paper
Techniques of image segmentation: a review
2025cites this paper
Industrial big data analysis strategy based on automatic data classification and interpretable knowledge graph
2025cites this paper
Method for Estimating Traffic Flow of Multivehicle Types Based on Gantry Data
2025cites this paper
Resampling approaches to handle class imbalance: a review from a data perspective
2025cites this paper
Individual differences in temporal order judgment
2025cites this paper
Prediction of submarine soil dredging difficulty scale in cutter suction dredger construction with clustering-based deep learning
2025cites this paper
Enhancing Machine Learning-Based GPP Upscaling Error Correction: An Equidistant Sampling Method with Optimized Step Size and Intervals
2025cites this paper
Exploring the influence of geographic proximity on the urban thermal environment and its boundary effects
2025cites this paper
Psychiatric Inpatient Length of Stay and Needs: A Cluster Analysis using Machine Learning Algorithms.
2025cites this paper
Spatial-Temporal Differences and Influencing Factors of Cultural Industry Resilience: A Study Based on China
2025cites this paper
Top Global Concrete-Producing Countries: A Hierarchical Cluster Analysis of Concrete Production, CO2 Emissions, and Economic Growth
2025cites this paper
Enhanced Unsupervised Discriminant Dimensionality Reduction for Nonlinear Data
2025cites this paper
HARLI CQUINN: Higher Adjusted Randomness with Linear In Complexity QUantum INspired Networks for K-Means
2025cites this paper
Comprehensive dataset of global innovation index panel data (2013–2022): Clustering with K-means and principal component analysis
2025cites this paper
Python Implementation of Pore Morphology Method Enhanced with Convolutional Neural Network for Cost-Effective Simulation of Fluid Saturation in Porous Media
2025cites this paper
Reduced-Order Concurrent Multiscale Modelling of Composite Structures
2025cites this paper
A novel k-means clustering approach using two distance measures for Gaussian data
2025influential citation
Refining the understanding of spatial heterogeneity in closed-circuit television camera effectiveness: a new evaluation strategy
2025cites this paper
Spatiotemporal prediction for groundwater heavy metal contamination using Soft-DTW-based clustering and graph neural network framework.
2025cites this paper
Modeling high dimensional point clouds with the spherical cluster model
2025cites this paper
A Surrogate-Enhanced Framework for Flexible and Optimal Operational Space Identification under Uncertainty
2025cites this paper
Towards faster seeding for k-means++ via lower bound and triangle inequality
2025cites this paper
kProtoClust: Towards Adaptive k-Prototype Clustering without Known k
2025influential citation
The impact of artificial intelligence adoption degree on corporate digital technology innovation
2025cites this paper
Clusters of difficulties experienced by children during their transition to primary school
2025cites this paper
Autonomous cycle of data analysis tasks for the determination of the coffee productive process for MSMEs
2025cites this paper
Recognition of Analogous Oil Droplet Attached to Transparent Pipe Wall
2025cites this paper
An unsupervised learning method based on U-Net++ for low-light image enhancement
2025cites this paper
CDBC: Continuous Distribution-based Clustering for High-Dimensional Data Streams
2025cites this paper
A Breadth First Search Algorithm for Data Clustering based on Space-time Curvature
2025cites this paper
Machine Learning–Assisted Raman and Ultraviolet–Visible Spectroscopic Analysis of Mung Plants Exposed to Zinc Oxide Nanoparticles
2025cites this paper
Stochastic limited memory bundle algorithm for clustering in big data
2025cites this paper
Improved seeding strategies for k-means and k-GMM
2025influential citation
Advancing Image Compression Through Clustering Techniques: A Comprehensive Analysis
2025cites this paper
Quasi-4-dimensional ionospheric delay model based on clustering algorithms
2025cites this paper
Hybrid Optimization Method for Social Internet of Things Service Provision Based on Community Detection
2025cites this paper
Intelligent Identification of Hidden Defects in Asphalt Roads Using GPR Based on Loss Function and Anchor Box Optimization
2025cites this paper
Development and validation of an offline multiscale topology optimization framework using interpolated constraint functions
2025cites this paper
Feature blend-based multiscale semantic segmentation
2025cites this paper
A Novel Framework for Identifying Hot Spots in Coal Research
2025cites this paper
Product Quantization for Surface Soil Similarity
2025cites this paper
The practices and politics of machine learning: a field guide for analyzing artificial intelligence
2025cites this paper
Relationship Between Landscape Character and Public Preferences in Urban Landscapes: A Case Study from the East–West Mountain Region in Wuhan, China
2025cites this paper
Optimized multimodal anomaly detection in fused deposition modeling: real-time monitoring with clustering classifiers and data fusion
2025cites this paper
Efficient Software Development Effort Estimation Approaches for Improving Scalability in the Training Phase
2025cites this paper
Cross-Layer Discrete Concept Discovery for Interpreting Language Models
2025cites this paper
Multi-criteria selection of data clustering methods for e-commerce personalization
2025cites this paper
Leveraging Machine Learning for Accurate and Fast Stellar Mass Estimation of Galaxies
2025cites this paper
Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutions
2025cites this paper
Machine Learning-Based Approach for CPTu Data Processing and Stratigraphic Analysis
2025cites this paper
BioCompNet: A Deep Learning Workflow Enabling Automated Body Composition Analysis toward Precision Management of Cardiometabolic Disorders
2025cites this paper
Mapping Occupational Stress and Burnout in the Probation System: A Quantitative Approach
2025cites this paper
Artificial Intelligence Adoption in the European Union: A Data-Driven Cluster Analysis (2021–2024)
2025cites this paper
Solving Freshness in RAG: A Simple Recency Prior and the Limits of Heuristic Trend Detection
2025cites this paper
Combining electromagnetic induction and satellite-based NDVI data for improved determination of management zones for sustainable crop production
2025cites this paper
Data Mining Scheme for Globally Distributed Big Data
2025cites this paper
Machine Learning in Slope Stability: A Review with Implications for Landslide Hazard Assessment
2025cites this paper
PANDEMİ SONRASI BORSA İSTANBUL SAĞLIK ŞİRKETLERİNİN KÜMELEME ANALİZİ İLE FİNANSAL DEĞERLENDİRİLMESİ [Post-Pandemic Financial Evaluation of Borsa İstanbul Health Companies Using Cluster Analysis]
2025cites this paper
LS-BMO-HDBSCAN as a hybrid memetic bacterial intelligence framework for efficient data clustering
2025cites this paper
Cluster-wise deep learning framework for scenario-adaptive energy consumption and productivity prediction of cutter suction dredger
2025cites this paper
Un enfoque unificado para el estudio de patrones relacionales espacio-temporales [A unified approach to the study of spatio-temporal relational patterns]
2025cites this paper
Beyond performance: A POMDP-based machine learning framework for expert cognition
2025cites this paper
THE LINK BETWEEN FOREIGN DIRECT INVESTMENTS AND ECONOMIC GROWTH
2025cites this paper
An Approach to Variable Clustering: K-means in Transposed Data and its Relationship with Principal Component Analysis
2025cites this paper
Efficient error minimization in kernel k-means clustering
2025cites this paper
Machine learning-enabled optimization of biochar resource utilization and carbon mitigation pathways: mechanisms and challenges
2025cites this paper
Analyzing lump coal gas release as a source of coal mine methane: Using GC and machine learning
2025cites this paper
Hard and soft clustering methods employed in groundwater research: A systematic review
2025cites this paper
A k-Means Algorithm with Automatic Outlier Detection
2025cites this paper
Multi-robot exploration for the CADRE mission
2025cites this paper
CAS-SFCM: Content-Aware Image Smoothing Based on Fuzzy Clustering with Spatial Information
2025cites this paper
Network- and Demand-Driven Initialization Strategy for Enhanced Heuristic in Uncapacitated Facility Location Problem
2025cites this paper
Robust Barycenters of Persistence Diagrams
2025cites this paper
AI-DRIVEN ENERGY LOAD FORECASTING AND RENEWABLE INTEGRATION: OPTIMIZING SUSTAINABILITY FOR SMART CAMPUSES
2024cites this paper