Data-Centric AI for Healthcare Fraud Detection

Published 2023 in SN Computer Science

ABSTRACT

Automated methods for detecting fraudulent healthcare providers have the potential to save billions of dollars in healthcare costs and improve the overall quality of patient care. This study presents a data-centric approach to improve healthcare fraud classification performance and reliability using Medicare claims data. Publicly available data from the Centers for Medicare & Medicaid Services (CMS) are used to construct nine large-scale labeled data sets for supervised learning. First, we leverage CMS data to curate the 2013–2019 Part B, Part D, and Durable Medical Equipment, Prosthetics, Orthotics, and Supplies (DMEPOS) Medicare fraud classification data sets. We review each data set and the data preparation techniques used to create Medicare data sets for supervised learning, and we propose an improved data labeling process. Next, we enrich the original Medicare fraud data sets with up to 58 new provider summary features. Finally, we address a common model evaluation pitfall and propose an adjusted cross-validation technique that mitigates target leakage to provide reliable evaluation results. Each data set is evaluated on the Medicare fraud classification task using extreme gradient boosting and random forest learners, multiple complementary performance metrics, and 95% confidence intervals. Results show that the new enriched data sets consistently outperform the original Medicare data sets that are currently used in related works. Our results support the data-centric machine learning workflow and provide a strong foundation for data understanding and preparation techniques for machine learning applications in healthcare fraud.
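
The abstract does not spell out how the adjusted cross-validation works, so the following is only an illustrative sketch of one common way to limit leakage in this kind of setting. It assumes that leakage can occur when records for the same provider land in both the training and test folds, and it therefore splits by provider rather than by row. The function names, the toy provider IDs, and the normal-approximation interval are all assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict
from statistics import mean, stdev


def group_k_fold(groups, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides."""
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)

    # Shuffle the distinct groups, then deal them round-robin into folds.
    unique = sorted(by_group)
    random.Random(seed).shuffle(unique)
    folds = [unique[i::n_splits] for i in range(n_splits)]

    for held_out in folds:
        held = set(held_out)
        test_idx = [i for g in held_out for i in by_group[g]]
        train_idx = [i for g in unique if g not in held for i in by_group[g]]
        yield train_idx, test_idx


def confidence_interval_95(scores):
    """Normal-approximation 95% interval for the mean of repeated scores."""
    center = mean(scores)
    half_width = 1.96 * stdev(scores) / len(scores) ** 0.5
    return center - half_width, center + half_width


# Hypothetical toy data: each provider ID appears in two yearly records.
provider_ids = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"]
splits = list(group_k_fold(provider_ids, n_splits=5))
```

In practice, each `(train_idx, test_idx)` pair would be used to fit and score a learner, and the per-fold scores would be passed to `confidence_interval_95`. Whether provider-level grouping matches the paper's adjustment should be checked against the full text.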

PUBLICATION RECORD

Publication year
2023
Venue
SN Computer Science
Publication date
2023-05-11
Fields of study
Medicine, Computer Science
Identifiers
DOI 10.1007/s42979-023-01809-x PMID 37200563 PMCID 10173919
Source metadata
Semantic Scholar, PubMed

CITED BY

Enhancing Healthcare Fraud Detection: Comparative Study of CatBoost and TabNet Models
2026cites this paper
An Interpretable and Efficient Random Undersampling-enhanced SHAP Framework for Medicare Fraud Detection
2025influential citation
Unsupervised label generation for severely imbalanced fraud data
2025influential citation
Transforming Patient Care: The Role of Predictive Analytics in Medical Diagnosis and Resource Allocation
2025cites this paper
Foundation Models: From Current Developments, Challenges, and Risks to Future Opportunities
2025cites this paper
Innovative Approaches to Prevent and Detect Medical Insurance Fraud: A Systematic Literature Review
2025cites this paper
Leveraging evolutionary algorithms with a dynamic weighted search space approach for fraud detection in healthcare insurance claims
2025cites this paper
Enhancing Medicare Fraud Detection With a CNN-Transformer-XGBoost Framework and Explainable AI
2025cites this paper
Cybersecurity and Compliance in Clinical Trials: The Role of Artificial Intelligence in Secure Healthcare Management.
2025cites this paper
Evaluating techniques from low-shot learning on traditional imbalanced classification tasks
2025cites this paper
Choosing the Right Metrics: A Study of Performance Measurement for Binary Classification in Imbalanced and Big Data
2025cites this paper
Using Natural Language Processing (NLP) to Identify Fraudulent Healthcare Claims
2025cites this paper
Healthcare Fraud Detection: The Critical Role of Data Quality and Consistency
2025cites this paper
Enhancing Healthcare Integrity Using Simple Statistical Methods: Detecting Irregularities in Historical Dermatology Services Payments
2025cites this paper
Leveraging big data characteristics for enhanced healthcare fraud detection
2025cites this paper
Fraud Detection in Privacy Preserving Health Insurance System Using Blockchain Technology
2025cites this paper
An Overview of Artificial Intelligence in Primary Care and Administrative Medicine.
2025cites this paper
Next-Generation Machine Learning in Healthcare Fraud Detection: Current Trends, Challenges, and Future Research Directions
2025cites this paper
Transformer Model for Fraud Detection in Medical Insurance Claims
2025cites this paper
An Automated Framework for Unsupervised Binary Class Distribution Estimation
2025influential citation
Foundation Models in Digital Pathology Imaging: Next-Generation AI for Healthcare Transformation
2025cites this paper
A New and Effective Technique for Unsupervised Labeling and Feature Selection with Applications in Healthcare Fraud Detection
2025cites this paper
Application of Standard Machine Learning Models for Medicare Fraud Detection with Imbalanced Data
2025cites this paper
Scalable unsupervised labeling with SHAP feature selection for fraud detection in imbalanced data
2025influential citation
A novel approach to automating unsupervised estimation of class distribution
2025influential citation
National Predictive Analytics Framework for Preventing Healthcare Fraud and Abuse
2025cites this paper
The Evolution of Automated Medical Billing With Artificial Intelligence: A Review With a Global and Saudi Perspective
2025cites this paper
Data-Centric AI for EEG-Based Emotion Recognition: Noise Filtering and Augmentation Strategies
2025cites this paper
Augmenting small tabular health data for training prognostic ensemble machine learning models using generative models
2025cites this paper
Enhancing Fraud Detection in Health Insurance: Deep Neural Network Approaches and Performance Analysis
2025cites this paper
Methodology for Detecting Suspicious Claims in Health Insurance Using Supervised Machine Learning
2025cites this paper
Blockchain-assisted healthcare insurance fraud detection framework using ensemble learning
2025cites this paper
ML-Driven Approaches to Combat Medicare Fraud: Advances in Class Imbalance Solutions, Feature Engineering, Adaptive Learning, and Business Impact
2025cites this paper
Advantages and ethics of artificial intelligence in plastic and reconstructive surgery
2025cites this paper
AI-Driven Risk Control for Health Insurance Fund Management: A Data-Driven Approach
2025cites this paper
Gendered AI in banking services: the influence of financial chatbots’ gender on consumer behaviour
2025cites this paper
Fraud detection in healthcare billing and claims
2024cites this paper
Big data analytic of US Medicare Claims of healthcare service informatics and billing fraud
2024cites this paper
Enhancing Medicare Fraud Detection: Random Undersampling Followed by SHAP-Driven Feature Selection with Big Data
2024cites this paper
Optimizing Machine Learning for Healthcare Fraud Detection: A Framework Using Hybrid Feature Selection and Hyperparameter Tuning
2024cites this paper
Medicare Fraud Detection Using Machine Learning
2024cites this paper
Data reduction techniques for highly imbalanced medicare Big Data
2024cites this paper
Data-Centric Foundation Models in Computational Healthcare: A Survey
2024cites this paper
Enhancing Medicare Fraud Detection Through Machine Learning: Addressing Class Imbalance With SMOTE-ENN
2024cites this paper
Healthcare Fraud Detection Using Machine Learning
2024cites this paper
Leveraging Big Data and AI for Predictive Analysis in Insurance Fraud Detection
2024cites this paper
Collaborative artificial intelligence system for investigation of healthcare claims compliance
2024cites this paper
A Novel Machine Learning Approach For handling Imbalanced Data: Leveraging SMOTE-ENN and XGBoost
2024cites this paper
“Using network analysis modularity to group health code systems and decrease dimensionality in machine learning models”
2024cites this paper
Enhancing healthcare supply chain management through artificial intelligence-driven group decision-making with Sugeno-Weber triangular norms in a dual hesitant q-rung orthopair fuzzy context
2024cites this paper
Challenges and perspectives in use of artificial intelligence to support treatment recommendations in clinical oncology
2024cites this paper
Enhancing Cybersecurity Through Advanced Fraud and Anomaly Detection Techniques: A Systematic Review
2024cites this paper
Leveraging AI Algorithms to Combat Financial Fraud in the United States Healthcare Sector
2024cites this paper
Evaluation of Fraud Prevention Policies in the National Health Insurance System in Indonesia: Narrative Literature Review
2024cites this paper
BalancerGNN: Balancer Graph Neural Networks for imbalanced datasets: A case study on fraud detection
2024cites this paper
Non Linear-Logistic Regression Analysis for AI-Driven Medicare Fraud Detection
2024cites this paper
Transforming Healthcare: AI-NLP Fusion Framework for Precision Decision-Making and Personalized Care Optimization in the Era of IoMT
2024cites this paper
Fraud detection in healthcare claims using machine learning: A systematic review
2024cites this paper
Explainable machine learning models for Medicare fraud detection
2023cites this paper
Using a Bayesian Belief Network to detect healthcare fraud
2023cites this paper
A Comprehensive Analysis of Provider Fraud Detection through Machine Learning
2023cites this paper
Technical Analysis of Data-Centric and Model-Centric Artificial Intelligence
2023cites this paper
Data Reduction to Improve the Performance of One-Class Classifiers on Highly Imbalanced Big Data
2023cites this paper
Improving Medicare Fraud Detection through Big Data Size Reduction Techniques
2023cites this paper
Investigating the effectiveness of one-class and binary classification for fraud detection
2023cites this paper