General and Robust Communication-Efficient Algorithms for Distributed Clustering

Pranjal Awasthi,Maria-Florina Balcan,Colin White

Published 2017 in arXiv.org

ABSTRACT

As datasets become larger and more distributed, algorithms for distributed clustering have become more and more important. In this work, we present a general framework for designing distributed clustering algorithms that are robust to outliers. Using our framework, we give a distributed approximation algorithm for k-means, k-median, or generally any L_p objective, with z outliers and/or balance constraints, using O(m(k+z)(d+log n)) bits of communication, where m is the number of machines, n is the size of the point set, and d is the dimension. This generalizes and improves over previous work of Bateni et al. and Malkomes et al. As a special case, we achieve the first distributed algorithm for k-median with outliers, answering an open question posed by Malkomes et al. For distributed k-means clustering, we provide the first dimension-dependent communication complexity lower bound for finding the optimal clustering. This improves over the lower bound from Chen et al. which is dimension-agnostic. Furthermore, we give distributed clustering algorithms which return nearly optimal solutions, provided the data satisfies the approximation stability condition of Balcan et al. or the spectral stability condition of Kumar and Kannan.

PUBLICATION RECORD

  • Publication year

    2017

  • Venue

    arXiv.org

  • Publication date

    2017-03-02

  • Fields of study

    Mathematics, Computer Science

  • Identifiers
  • External record

    Open on Semantic Scholar

  • Source metadata

    Semantic Scholar

CITATION MAP

EXTRACTION MAP

CLAIMS

  • No claims are published for this paper.

CONCEPTS

  • No concepts are published for this paper.

REFERENCES

Showing 1-36 of 36 references · Page 1 of 1