A Novel Approach for Unsupervised Learning of Highly-Imbalanced Data

Robert K. L. Kennedy, Zahra Salekshahrezaee, T. Khoshgoftaar

Published 2022 in International Conference on Cognitive Machine Intelligence

ABSTRACT

Typical fraud datasets lack consistent and accurate labels and, as such, are typically highly imbalanced with non-fraud examples greatly outnumbering the fraudulent ones. This presents significant challenges to machine learning researchers and practitioners. Due to these challenges, an effective approach in identifying fraudulent data points needs to handle highly-imbalanced datasets and be robust to class labeling. This paper introduces a novel unsupervised procedure for learning from imbalanced datasets without class labels by iteratively cleaning the training dataset. Our methodology uses an autoencoder as an underlying learner. We describe its fraud detection performance and compare it to a baseline unsupervised fraud detection learner. Our results show that our procedure significantly outperforms the baseline, in both AUC and TPR, when testing on a publicly available highly-imbalanced credit card fraud detection dataset.
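The abstract names the idea (iteratively cleaning an unlabeled training set with an autoencoder as the underlying learner) but does not spell out the loop. The sketch below shows one way such a loop could look and is not the authors' implementation. It rests on several assumptions: a linear autoencoder fitted by PCA stands in for the paper's autoencoder, and the function names, the `drop_fraction` and `n_iterations` parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(X, n_components):
    """Fit a linear autoencoder via PCA (a stand-in for a neural autoencoder)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions; keep the top n_components.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(X, model):
    """Squared reconstruction error per row; larger means more anomalous."""
    mean, components = model
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)


def iterative_cleaning(X, n_components=2, n_iterations=5, drop_fraction=0.02):
    """Hypothetical loop: fit, drop the worst-reconstructed rows, refit."""
    keep = np.arange(len(X))
    for _ in range(n_iterations):
        model = fit_linear_autoencoder(X[keep], n_components)
        errors = reconstruction_error(X[keep], model)
        n_drop = max(1, int(drop_fraction * len(keep)))
        # Sort by error and keep everything except the n_drop largest.
        order = np.argsort(errors)
        keep = keep[order[: len(keep) - n_drop]]
    # Final learner is trained only on the cleaned subset.
    return fit_linear_autoencoder(X[keep], n_components), keep


# Synthetic, highly imbalanced data (assumption, not the paper's dataset):
# 990 "normal" rows near a 2-D subspace of R^5 plus 10 scattered anomalies.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 5))
normal = rng.normal(size=(990, 2)) @ basis + 0.05 * rng.normal(size=(990, 5))
anomalies = rng.normal(scale=3.0, size=(10, 5))
X = np.vstack([normal, anomalies])
is_anomaly = np.concatenate([np.zeros(990, dtype=bool), np.ones(10, dtype=bool)])

# Labels are never passed to the cleaning loop; they are used only to inspect scores.
model, kept = iterative_cleaning(X)
scores = reconstruction_error(X, model)
```

Scoring every row with the final model gives a ranking that could then be evaluated with metrics such as AUC or TPR once labels are available for testing. How many rows to drop per round and when to stop are design choices this sketch fixes arbitrarily.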

PUBLICATION RECORD

Publication year: 2022
Venue: International Conference on Cognitive Machine Intelligence
Publication date: 2022-12-01
Fields of study: Computer Science
DOI: 10.1109/CogMI56440.2022.00018
Source metadata: Semantic Scholar

REFERENCES

A Class-Imbalanced Study with Feature Extraction via PCA and Convolutional Autoencoder (2022)
Feature Extraction for Class Imbalance Using a Convolutional Autoencoder and Data Sampling (2021)
Latent-Insensitive Autoencoders for Anomaly Detection and Class-Incremental Learning (2021)
Unsupervised anomaly detection with LSTM autoencoders using statistical data-filtering (2021)
The Effects of Class Label Noise on Highly-Imbalanced Big Data (2021)
Unsupervised outlier detection in multidimensional data (2021)
A hybrid unsupervised clustering-based anomaly detection method (2021)
Developments in Unsupervised Outlier Detection Research (2020)
Machine Learning Techniques for Network Anomaly Detection: A Survey (2020)
Anomaly Detection using Unsupervised Methods: Credit Card Fraud Case Study (2019)
Learning From Imbalanced Data (2019)
Examining characteristics of predictive models with imbalanced big data (2019)
Alpha Discovery Neural Network based on Prior Knowledge (2019)
Combining unsupervised and supervised learning in credit card fraud detection (2019)
Identifying Medicare Provider Fraud with Unsupervised Machine Learning (2018)
Emerging topics and challenges of learning from noisy data in nonstandard classification: a survey beyond binary class noise (2018)
Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection (2018)
Big Data fraud detection using multiple medicare data sources (2018)
A survey on addressing high-class imbalance in big data (2018)
Building and Interpreting Risk Models from Imbalanced Clinical Data (2018)
Review and big data perspectives on robust data mining approaches for industrial process modeling with outliers and missing data (2018)
Adaptive Threshold for Outlier Detection on Data Streams (2018)
Learning from imbalanced data: open challenges and future directions (2016, influential reference)
A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data (2016)
Enhancing Ensemble Learners with Data Sampling on High-Dimensional Imbalanced Tweet Sentiment Data (2016)
Credit Card Fraud Detection (2016)
A decomposition of the outlier detection problem into a set of supervised learning problems (2015)
Calibrating Probability with Undersampling for Unbalanced Classification (2015)
Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction (2014)
Classification in the Presence of Label Noise: A Survey (2014)
Effective detection of sophisticated online banking fraud on extremely imbalanced data (2012)
Credit Card Fraud Detection Using Hidden Markov Model (2012)
A Study on the Relationships of Classifier Performance Metrics (2009)
Exploratory Undersampling for Class-Imbalance Learning (2009)
EasyEnsemble and Feature Selection for Imbalance Data Sets (2009)
Anomaly detection: A survey (2009)
Isolation Forest (2008)
Mining Data with Rare Events: A Case Study (2007)
Learning with limited minority class data (2007)
Data mining for improved cardiac care (2006)
Machine Learning for the Detection of Oil Spills in Satellite Radar Images (1998)

CITED BY

The Impact of Class Imbalance on Unsupervised Deep Anomaly Detection for Cognitive Data (2025)
Unsupervised feature selection and class labeling for credit card fraud (2025)
Unsupervised Cognitive Impairment Detection Using Convolutional Autoencoders and Isolation Forest (2025)
Anomaly Detection in Key-Management Activities Using Metadata: A Case Study and Framework (2024)
Anomaly Detection in the Key-Management Interoperability Protocol Using Metadata (2024)
Detecting Frauds and Payment Defaults on Credit Card Data Inherited With Imbalanced Class Distribution and Overlapping Class Problems: A Systematic Review (2024)
Synthesizing class labels for highly imbalanced credit card fraud detection data (2024)
Comparative analysis of binary and one-class classification techniques for credit card fraud data (2023)
A Novel Approach to Synthesize Class Labels in Highly Imbalanced Large Data (2023)
Unsupervised Anomaly Detection of Class Imbalanced Cognition Data Using an Iterative Cleaning Method (2023)