Characterizing and understanding ensemble-based anomaly-detection

Gustavo de P. Avelar, G. Campos, W. Meira Jr.

Published 2021 in Anais do IX Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2021)

ABSTRACT

Anomaly Detection (AD) has grown in importance in recent years: as services and data storage become increasingly digital, detecting abnormal behavior has become a key task. However, finding abnormal data hidden in the huge amount of available data is a daunting problem, and the efficacy of current methods depends on a wide range of assumptions. One effective strategy for detecting anomalies is to combine multiple models into so-called "ensembles", but the factors that drive their performance are often hard to identify, which makes calibrating and improving them a challenging task. In this paper, we address these problems with a four-step method for characterizing and understanding the ensemble-based anomaly-detection task. We start by characterizing several datasets and analyzing the factors that make their anomalies hard to detect. We then evaluate to what extent existing algorithms are able to detect the anomalies in those same datasets. On the basis of both analyses, we propose a stacking-based ensemble that outperformed a state-of-the-art baseline, Isolation Forest. Finally, we examine the benefits and drawbacks of our proposal.
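The abstract does not describe how the stacking ensemble is built, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the authors' implementation. Two simple unsupervised detectors (distance to the k-th nearest neighbor and distance to the data centroid, both hypothetical choices) score every point, each detector's scores are min-max normalized, and a logistic-regression meta-learner (also an assumption) is fitted on those scores for a labeled subset. The combined score is then evaluated with ROC AUC on the remaining points. The toy dataset, the helper names, and the even/odd train/test split are illustrative only, and Isolation Forest, the paper's baseline, is not reimplemented here.

```python
import math

# Illustrative stacking ensemble for anomaly detection (a sketch, not the
# authors' method). Base detectors, normalizer and meta-learner are all
# hypothetical choices made for brevity.

def knn_distance_scores(data, k=3):
    """Base detector 1: distance to the k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(data):
        dists = sorted(math.dist(p, q) for j, q in enumerate(data) if j != i)
        scores.append(dists[k - 1])
    return scores

def centroid_distance_scores(data):
    """Base detector 2: distance to the centroid of the whole dataset."""
    dim = len(data[0])
    centroid = [sum(p[d] for p in data) / len(data) for d in range(dim)]
    return [math.dist(p, centroid) for p in data]

def min_max(scores):
    """Rescale one detector's scores to [0, 1] so detectors are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_meta_learner(features, labels, lr=0.5, epochs=2000):
    """Stacking step: logistic regression fitted by batch gradient descent
    on the base-detector scores (one feature per detector)."""
    w, b, n = [0.0] * len(features[0]), 0.0, len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def meta_scores(w, b, features):
    """Final ensemble score: the meta-learner's probability of 'anomaly'."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in features]

def roc_auc(scores, labels):
    """ROC AUC as the probability that a random anomaly (label 1) is scored
    above a random inlier (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: a 7x7 grid of inliers plus six scattered anomalies.
inliers = [(0.5 * i, 0.5 * j) for i in range(-3, 4) for j in range(-3, 4)]
anomalies = [(8 * math.cos(a * math.pi / 3), 8 * math.sin(a * math.pi / 3))
             for a in range(6)]
data = inliers + anomalies
labels = [0] * len(inliers) + [1] * len(anomalies)

# Level 0: unsupervised detectors score every point; each column is normalized.
columns = [min_max(knn_distance_scores(data)), min_max(centroid_distance_scores(data))]
features = [list(row) for row in zip(*columns)]

# Level 1: fit the meta-learner on even indices, evaluate on odd indices.
train = range(0, len(data), 2)
test = range(1, len(data), 2)
w, b = train_meta_learner([features[i] for i in train], [labels[i] for i in train])
stacked_auc = roc_auc(meta_scores(w, b, [features[i] for i in test]),
                      [labels[i] for i in test])
print(f"stacked ensemble ROC AUC on held-out points: {stacked_auc:.3f}")
```

Unlike simply averaging normalized detector scores, which needs no labels, stacking of this kind requires labeled anomalies to fit the meta-learner; on real data the split should be stratified so that both halves contain anomalies.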

PUBLICATION RECORD

Publication year: 2021
Venue: Anais do IX Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2021)
Publication date: 2021-10-04
Fields of study: Not labeled
Identifiers: DOI 10.5753/kdmile.2021.17473
Source metadata: Semantic Scholar

REFERENCES

An Unsupervised Boosting Strategy for Outlier Detection Ensembles (2018), cited by this paper
On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study (2016), influential reference
Theoretical Foundations and Algorithms for Outlier Ensembles (2015), cited by this paper
Outlier ensembles: position paper (2013), cited by this paper
Interpreting and Unifying Outlier Scores (2011), cited by this paper
Isolation Forest (2008), influential reference
Ensemble Methods in Machine Learning (2007), cited by this paper
A Survey of Outlier Detection Methodologies (2004), cited by this paper
Discovering cluster-based local outliers (2003), cited by this paper
Mining top-n local outliers in large databases (2001), cited by this paper
Efficient algorithms for mining outliers from large data sets (2000), cited by this paper
LOF: identifying density-based local outliers (2000), cited by this paper
Algorithms for Mining Distance-Based Outliers in Large Datasets (1998), cited by this paper
Identification of Outliers (1988), influential reference
The meaning and use of the area under a receiver operating characteristic (ROC) curve (1982), cited by this paper
