Online Cluster Validity Indices for Streaming Data

Masud Moshtaghi, J. Bezdek, S. Erfani, C. Leckie, J. Bailey

Published 2018 in arXiv.org

ABSTRACT

Cluster analysis is used to explore structure in unlabeled data sets in a wide range of applications. An important part of cluster analysis is validating the quality of computationally obtained clusters. A large number of different internal indices have been developed for validation in the offline setting. However, this concept has not been extended to the online setting. A key challenge is to find an efficient incremental formulation of an index that can capture both cohesion and separation of the clusters over potentially infinite data streams. In this paper, we develop two online versions (with and without forgetting factors) of the Xie-Beni and Davies-Bouldin internal validity indices, analyze their characteristics using two streaming clustering algorithms (sk-means and online ellipsoidal clustering), and illustrate their use in monitoring evolving clusters in streaming data. We also show that incremental cluster validity indices are capable of sending a distress signal to online monitors when evolving clusters go awry. Our numerical examples indicate that the incremental Xie-Beni index with forgetting factor is superior to the other three indices tested.
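The abstract does not reproduce the incremental formulas, so the following is only a hedged illustration of the general idea, not the derivation in the paper. It sketches one way a Xie-Beni-style ratio could be maintained over a stream. The ratio is the (weighted) mean squared distance from each point to its assigned prototype, which measures cohesion, divided by the minimum squared distance between prototypes, which measures separation. It assumes crisp assignments produced by a sequential k-means update and an optional exponential forgetting factor. The class name `StreamingXieBeni`, the parameter `forgetting`, and the exact update rules are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


class StreamingXieBeni:
    """Hypothetical incremental Xie-Beni-style index for crisp clusters.

    Assumption: cohesion is an exponentially weighted sum of squared
    point-to-prototype distances; separation is the minimum squared
    distance between prototypes. forgetting=1.0 means no forgetting.
    """

    def __init__(self, prototypes: Sequence[Sequence[float]], forgetting: float = 1.0):
        if len(prototypes) < 2:
            raise ValueError("need at least two prototypes to measure separation")
        if not 0.0 < forgetting <= 1.0:
            raise ValueError("forgetting factor must be in (0, 1]")
        self.v: List[List[float]] = [list(p) for p in prototypes]
        self.counts = [0] * len(self.v)  # per-cluster counts for sequential k-means
        self.lam = forgetting
        self.cohesion = 0.0  # weighted sum of squared point-to-prototype distances
        self.weight = 0.0  # weighted number of points seen so far

    @staticmethod
    def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    def update(self, x: Sequence[float]) -> float:
        # Crisp membership: assign x to its nearest prototype.
        i = min(range(len(self.v)), key=lambda j: self._sqdist(x, self.v[j]))
        # Sequential k-means step: move prototype i toward x with step 1/n_i.
        self.counts[i] += 1
        eta = 1.0 / self.counts[i]
        self.v[i] = [vi + eta * (xi - vi) for vi, xi in zip(self.v[i], x)]
        # Decay old contributions, then add this point's squared error.
        # Approximation: earlier terms are not recomputed when prototypes move.
        self.cohesion = self.lam * self.cohesion + self._sqdist(x, self.v[i])
        self.weight = self.lam * self.weight + 1.0
        return self.index()

    def index(self) -> float:
        # Minimum squared distance over all prototype pairs (separation).
        sep = min(
            self._sqdist(self.v[a], self.v[b])
            for a in range(len(self.v))
            for b in range(a + 1, len(self.v))
        )
        if self.weight == 0.0 or sep == 0.0:
            return math.inf
        # Smaller values indicate compact, well-separated clusters.
        return (self.cohesion / self.weight) / sep
```

In a monitoring loop one would call `update` on each arriving point and track the returned value. Because smaller Xie-Beni values are better, a sustained rise is the kind of warning signal the abstract describes. A forgetting factor below 1.0 lets the value reflect recent behaviour rather than the whole history.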

PUBLICATION RECORD

  • Publication date

    2018-01-08

  • Fields of study

    Mathematics, Computer Science

  • Source metadata

    Semantic Scholar

REFERENCES

23 references

CITED BY

12 citing papers