Computational Feasibility of Clustering under Clusterability Assumptions

Published 2015 in arXiv.org

ABSTRACT

It is well known that most of the common clustering objectives are NP-hard to optimize. In practice, however, clustering is routinely carried out. One approach to providing a theoretical understanding of this seeming discrepancy is to come up with notions of clusterability that distinguish realistically interesting input data from worst-case data sets. The hope is that there will be clustering algorithms that are provably efficient on such 'clusterable' instances. In other words, the hope is that "Clustering is difficult only when it does not matter" (CDNM thesis, for short). We believe that to some extent this may indeed be the case. This paper provides a survey of recent papers along this line of research and a critical evaluation of their results. Our bottom-line conclusion is that the CDNM thesis is still far from being formally substantiated. We start by discussing which requirements should be met in order to provide formal support for the validity of the CDNM thesis. In particular, we list some implied requirements for notions of clusterability. We then examine existing results in view of those requirements and outline some research challenges and open questions.

PUBLICATION RECORD

Publication year: 2015
Venue: arXiv.org
Publication date: 2015-01-02
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1501.00437
