A Novel Approach for Unsupervised Learning of Highly-Imbalanced Data

Robert K. L. Kennedy, Zahra Salekshahrezaee, T. Khoshgoftaar

Published 2022 in International Conference on Cognitive Machine Intelligence

ABSTRACT

Fraud datasets typically lack consistent and accurate labels and, as such, are usually highly imbalanced, with non-fraud examples greatly outnumbering fraudulent ones. This presents significant challenges to machine learning researchers and practitioners. Because of these challenges, an effective approach to identifying fraudulent data points must handle highly imbalanced datasets and be robust to unreliable class labels. This paper introduces a novel unsupervised procedure for learning from imbalanced datasets without class labels by iteratively cleaning the training dataset. Our methodology uses an autoencoder as the underlying learner. We describe its fraud detection performance and compare it to a baseline unsupervised fraud detection learner. Our results show that our procedure significantly outperforms the baseline, in both AUC and TPR, when tested on a publicly available, highly imbalanced credit card fraud detection dataset.
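The abstract does not spell out the cleaning rule, so the following is only a minimal sketch of what iterative cleaning with a reconstruction-based learner could look like, not the authors' procedure. It assumes that each round fits the learner on the current training set, scores every remaining point by reconstruction error, and discards a fixed fraction of the highest-error points. To stay dependency-light it swaps in a linear autoencoder solved in closed form with an SVD (equivalent to PCA) in place of a neural autoencoder. The function names and the `drop_fraction` and `n_iterations` parameters are hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(X, n_components):
    """Stand-in learner (assumption): a linear autoencoder fitted via SVD/PCA."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(X, model):
    """Squared reconstruction error of each row of X under the fitted model."""
    mean, components = model
    centered = X - mean
    # Encode (project onto components), then decode (map back to input space).
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)


def iterative_cleaning(X, n_components=1, drop_fraction=0.05, n_iterations=3):
    """Repeatedly refit and drop the highest-error points (assumed rule).

    Returns the model fitted on the cleaned data and the kept row indices.
    """
    keep = np.arange(len(X))
    for _ in range(n_iterations):
        model = fit_linear_autoencoder(X[keep], n_components)
        errors = reconstruction_error(X[keep], model)
        n_drop = int(drop_fraction * len(keep))
        if n_drop == 0:
            break
        # Keep the rows with the lowest reconstruction error.
        order = np.argsort(errors)
        keep = keep[order[: len(keep) - n_drop]]
    final_model = fit_linear_autoencoder(X[keep], n_components)
    return final_model, np.sort(keep)
```

After cleaning, unseen points can be scored with `reconstruction_error(X_test, model)`, and a threshold on that score turns it into a fraud flag. How the threshold, the drop fraction, and the number of rounds should be chosen is not stated in the abstract and would need to come from the paper itself.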
