Clustering is a fundamental problem with a wide range of applications. In this paper, we study a balanced version of the well-known k -center clustering problem, where the objective is to select k centers from a given set of points, and assign to each center at most L of the input points, so as to minimize the maximum distance from a point to the center to which it is assigned. We assume soft capacities, where points are allowed to be selected as center more than once. Motivated by applications involving massive datasets, we study the problem in the massively parallel computation (MPC) and streaming models. In particular, we present a two-round MPC algorithm for the balanced k -center problem, achieving an approximation factor of 5 (cid:11) + 2, where (cid:11) is the approximation factor of the corresponding sequential algorithm. This substantially improves the currently best approximation factor of 32 (cid:11) , available for the problem. We show that the approximation factor of our algorithm can be further improved to 5 + " in all spaces with bounded doubling dimension, including the Euclidean space. We also consider the balanced k -center problem in the streaming model, and present a constant-factor streaming algorithm in any metric space using O ( kn " ) memory, and a factor 5 + " streaming algorithm using O ( k polylog n ) memory in doubling metrics.
Massively parallel and streaming algorithms for balanced clustering
Kian Mirjalali,H. Zarrabi-Zadeh
Published 2023 in Theoretical Computer Science
ABSTRACT
PUBLICATION RECORD
- Publication year
2023
- Venue
Theoretical Computer Science
- Publication date
2023-11-01
- Fields of study
Mathematics, Computer Science
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-26 of 26 references · Page 1 of 1
CITED BY
- No citing papers are available for this paper.
Showing 0-0 of 0 citing papers · Page 1 of 1