Using HDDT to avoid instances propagation in unbalanced and evolving data streams

Andrea Dal Pozzolo, Reid A. Johnson, O. Caelen, Serge Waterschoot, N. Chawla, Gianluca Bontempi

Published 2014 in IEEE International Joint Conference on Neural Networks

ABSTRACT

Hellinger Distance Decision Trees [10] (HDDT) have previously been used for static datasets with skewed distributions. In unbalanced data streams, state-of-the-art techniques use instance propagation and standard decision trees (e.g. C4.5 [27]) to cope with class imbalance. However, it is not always possible to revisit or store old instances of a stream. In this paper we show how HDDT can be successfully applied to unbalanced and evolving data streams. Using HDDT allows us to remove instance propagation between batches, with several benefits: (i) improved predictive accuracy, (ii) speed, and (iii) a single pass through the data. We use a Hellinger-weighted ensemble of HDDTs to combat concept drift and to increase the accuracy of single classifiers. We test our framework on several streaming datasets with unbalanced classes and concept drift.
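
To illustrate why a Hellinger-based split criterion suits skewed classes, the sketch below computes the Hellinger distance of a binary split as it is commonly defined in the HDDT literature: the distance between the per-branch fractions of positives and of negatives. The function name, the `<=` threshold convention, and the toy data are assumptions made for this illustration; this is not code from the paper and it does not cover the ensemble weighting.

```python
import math
from typing import Sequence


def hellinger_split_value(
    feature: Sequence[float], labels: Sequence[int], threshold: float
) -> float:
    """Hellinger distance of the binary split `feature <= threshold`.

    Labels are assumed to be 1 (minority/positive) or 0 (majority/negative).
    Hypothetical helper for illustration only.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0  # only one class present: the split cannot separate anything

    total = 0.0
    for goes_left in (True, False):
        # Count positives and negatives that fall into this branch.
        pos_in = sum(
            1 for x, y in zip(feature, labels)
            if (x <= threshold) == goes_left and y == 1
        )
        neg_in = sum(
            1 for x, y in zip(feature, labels)
            if (x <= threshold) == goes_left and y == 0
        )
        # Each count is normalised by its own class size, so the class
        # ratio itself never enters the criterion.
        total += (math.sqrt(pos_in / n_pos) - math.sqrt(neg_in / n_neg)) ** 2
    return math.sqrt(total)


# Toy data (assumed): two positives, three negatives.
feature = [1.0, 2.0, 3.0, 10.0, 11.0]
labels = [0, 0, 1, 1, 0]
base = hellinger_split_value(feature, labels, threshold=5.0)

# Replicating every negative ten times changes the class ratio
# but leaves the split value unchanged.
neg = [(x, y) for x, y in zip(feature, labels) if y == 0] * 10
pos = [(x, y) for x, y in zip(feature, labels) if y == 1]
skewed_feature = [x for x, _ in neg + pos]
skewed_labels = [y for _, y in neg + pos]
skewed = hellinger_split_value(skewed_feature, skewed_labels, threshold=5.0)
```

Because each class is normalised separately, `base` and `skewed` agree even though the second dataset is far more unbalanced. That property is what lets a tree be trained on an unbalanced batch as it arrives, without carrying minority instances over from earlier batches.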
