Using HDDT to avoid instances propagation in unbalanced and evolving data streams
Andrea Dal Pozzolo, Reid A. Johnson, O. Caelen, Serge Waterschoot, N. Chawla, Gianluca Bontempi
Published 2014 in IEEE International Joint Conference on Neural Networks
ABSTRACT
Hellinger Distance Decision Trees (HDDT) [10] have previously been used for static datasets with skewed class distributions. For unbalanced data streams, state-of-the-art techniques rely on instance propagation and standard decision trees (e.g. C4.5 [27]) to cope with the class imbalance. However, it is not always possible to store or revisit old instances of a stream. In this paper we show how HDDT can be successfully applied to unbalanced and evolving data streams. Using HDDT allows us to remove instance propagation between batches, with several benefits: i) improved predictive accuracy, ii) speed, and iii) a single pass through the data. We use a Hellinger-weighted ensemble of HDDTs to combat concept drift and to improve on the accuracy of single classifiers. We test our framework on several streaming datasets with unbalanced classes and concept drift.
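The abstract names Hellinger distance as the split criterion behind HDDT but gives no formula. As a rough illustration only, the sketch below scores a discrete feature by the Hellinger distance between its two class-conditional value distributions, which is the standard two-class form of the criterion. The function name `hellinger_split_score`, its arguments, and the toy data are hypothetical and are not taken from the paper. The paper's batch handling and ensemble weighting are not reproduced here.

```python
import math
from collections import Counter


def hellinger_split_score(feature_values, labels, positive=1):
    """Hellinger distance between the distributions of a discrete feature
    conditioned on the positive class and on the negative class.

    Hypothetical helper for illustration; assumes a two-class problem.
    """
    # Count how often each feature value occurs within each class.
    pos_counts = Counter(v for v, y in zip(feature_values, labels) if y == positive)
    neg_counts = Counter(v for v, y in zip(feature_values, labels) if y != positive)
    n_pos = sum(pos_counts.values())
    n_neg = sum(neg_counts.values())
    if n_pos == 0 or n_neg == 0:
        # With only one class present there is nothing to separate.
        return 0.0

    total = 0.0
    for v in set(pos_counts) | set(neg_counts):
        # Counter returns 0 for values never seen in a class.
        p = pos_counts[v] / n_pos
        q = neg_counts[v] / n_neg
        total += (math.sqrt(p) - math.sqrt(q)) ** 2
    return math.sqrt(total)


# Toy batch with a 3:1 class imbalance (made-up data).
separating = hellinger_split_score([0, 0, 0, 1], [0, 0, 0, 1])
uninformative = hellinger_split_score([0, 1, 0, 1], [0, 0, 1, 1])
```

Because each class's counts are normalised by that class's own size, the score depends only on the class-conditional distributions and not on the class priors. This is the property that makes the criterion attractive for skewed data. In this form the score ranges from 0 (identical distributions) to √2 (fully separated classes).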
PUBLICATION RECORD
- Publication year: 2014
- Venue: IEEE International Joint Conference on Neural Networks
- Publication date: 2014-07-01
- Fields of study: Computer Science
- Source metadata: Semantic Scholar