Consensus spectral clustering in near-linear time

Dijun Luo,C. Ding,Heng Huang,F. Nie

Published 2011 in IEEE International Conference on Data Engineering

ABSTRACT

This paper addresses the scalability issue in spectral analysis which has been widely used in data management applications. Spectral analysis techniques enjoy powerful clustering capability while suffer from high computational complexity. In most of previous research, the bottleneck of computational complexity of spectral analysis stems from the construction of pairwise similarity matrix among objects, which costs at least O(n2) where n is the number of the data points. In this paper, we propose a novel estimator of the similarity matrix using K-means accumulative consensus matrix which is intrinsically sparse. The computational cost of the accumulative consensus matrix is O(nlogn). We further develop a Non-negative Matrix Factorization approach to derive clustering assignment. The overall complexity of our approach remains O(nlogn). In order to validate our method, we (1) theoretically show the local preserving and convergent property of the similarity estimator, (2) validate it by a large number of real world datasets and compare the results to other state-of-the-art spectral analysis, and (3) apply it to large-scale data clustering problems. Results show that our approach uses much less computational time than other state-of-the-art clustering methods, meanwhile provides comparable clustering qualities. We also successfully apply our approach to a 5-million dataset on a single machine using reasonable time. Our techniques open a new direction for high-quality large-scale data analysis.

PUBLICATION RECORD

  • Publication year

    2011

  • Venue

    IEEE International Conference on Data Engineering

  • Publication date

    2011-04-01

  • Fields of study

    Mathematics, Computer Science

  • Identifiers
  • External record

    Open on Semantic Scholar

  • Source metadata

    Semantic Scholar

CITATION MAP

EXTRACTION MAP

CLAIMS

  • No claims are published for this paper.

CONCEPTS

  • No concepts are published for this paper.

REFERENCES

Showing 1-29 of 29 references · Page 1 of 1

CITED BY

Showing 1-22 of 22 citing papers · Page 1 of 1