Fraud datasets typically lack consistent and accurate labels and are highly imbalanced, with non-fraud examples greatly outnumbering fraudulent ones. These properties present significant challenges to machine learning researchers and practitioners. An effective approach for identifying fraudulent data points therefore needs to handle highly imbalanced datasets and be robust to missing or inaccurate class labels. This paper introduces a novel unsupervised procedure for learning from imbalanced datasets without class labels by iteratively cleaning the training dataset. Our methodology uses an autoencoder as the underlying learner. We describe its fraud detection performance and compare it to a baseline unsupervised fraud detection learner. Our results show that our procedure significantly outperforms the baseline in both AUC and true positive rate (TPR) when tested on a publicly available, highly imbalanced credit card fraud detection dataset.
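The abstract does not spell out the cleaning rule, so the following Python sketch shows only one plausible reading of "iteratively cleaning the training dataset" with an autoencoder, under loudly stated assumptions. A linear autoencoder is fit in closed form via PCA (a linear autoencoder trained on squared error spans the same subspace as the top principal components), standing in for the paper's autoencoder. Each iteration is assumed to drop a fixed fraction `drop_frac` of the rows with the largest reconstruction error and then refit. The function names, `n_iter`, `drop_frac`, the synthetic data, and the 99th-percentile threshold used for TPR are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_linear_autoencoder(X, k):
    """Hypothetical stand-in autoencoder with k hidden units.

    A linear autoencoder trained with squared error spans the same subspace
    as the top-k principal components, so PCA via SVD replaces gradient training.
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    W = vt[:k].T  # encoder weights (n_features, k); the decoder uses W.T
    return mean, W


def reconstruction_error(X, model):
    """Per-row mean squared reconstruction error (used as the anomaly score)."""
    mean, W = model
    Z = (X - mean) @ W        # encode
    X_hat = Z @ W.T + mean    # decode
    return ((X - X_hat) ** 2).mean(axis=1)


def iterative_clean(X, k=2, n_iter=3, drop_frac=0.01):
    """Assumed cleaning loop: refit, then discard the highest-error rows."""
    keep = np.arange(len(X))
    for _ in range(n_iter):
        model = fit_linear_autoencoder(X[keep], k)
        err = reconstruction_error(X[keep], model)
        n_drop = int(np.ceil(drop_frac * len(keep)))
        order = np.argsort(err)  # ascending error
        keep = keep[order[: len(keep) - n_drop]]
    # Final model is refit on the cleaned training rows only.
    return fit_linear_autoencoder(X[keep], k), keep


def roc_auc(scores, labels):
    """ROC AUC as P(random positive scores above random negative); ties count half."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Synthetic, highly imbalanced data: normal rows lie near a 2-D subspace,
# the 10 "fraud" rows do not. Labels are used only for evaluation.
rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0, 0.0]])
normal = rng.normal(size=(1000, 2)) @ A + 0.05 * rng.normal(size=(1000, 5))
fraud = 3.0 * rng.normal(size=(10, 5))
X = np.vstack([normal, fraud])
y = np.r_[np.zeros(1000, dtype=int), np.ones(10, dtype=int)]

model, keep = iterative_clean(X, k=2, n_iter=3, drop_frac=0.01)
scores = reconstruction_error(X, model)
threshold = np.quantile(reconstruction_error(X[keep], model), 0.99)
tpr = (scores[y == 1] > threshold).mean()
auc = roc_auc(scores, y)
print(len(keep), round(float(auc), 3), tpr)
```

Training never sees `y`; the labels only score the result, mirroring an unsupervised setting. Removing the highest-error rows is meant to keep likely frauds from shaping the learned "normal" reconstruction, but the paper's actual stopping rule, removal criterion, and network architecture may differ.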
A Novel Approach for Unsupervised Learning of Highly-Imbalanced Data
Robert K. L. Kennedy, Zahra Salekshahrezaee, T. Khoshgoftaar
Published 2022 in International Conference on Cognitive Machine Intelligence
PUBLICATION RECORD
- Publication year: 2022
- Venue: International Conference on Cognitive Machine Intelligence
- Publication date: 2022-12-01
- Fields of study: Computer Science
- Source metadata: Semantic Scholar
REFERENCES
41 references
CITED BY
10 citing papers