Data mining in metric space: an empirical analysis of supervised learning performance criteria

Published 2004 in ROC Analysis in Artificial Intelligence

ABSTRACT

Many criteria can be used to evaluate the performance of supervised learning. Different criteria are appropriate in different settings, and it is not always clear which criteria to use. A further complication is that learning methods that perform well on one criterion may not perform well on others. For example, SVMs and boosting are designed to optimize accuracy, whereas neural nets typically optimize squared error or cross entropy. We conducted an empirical study using a variety of learning methods (SVMs, neural nets, k-nearest neighbor, bagged and boosted trees, and boosted stumps) to compare nine boolean classification performance metrics: Accuracy, Lift, F-Score, Area under the ROC Curve, Average Precision, Precision/Recall Break-Even Point, Squared Error, Cross Entropy, and Probability Calibration. Multidimensional scaling (MDS) shows that these metrics span a low-dimensional manifold. The three metrics that are appropriate when predictions are interpreted as probabilities (squared error, cross entropy, and calibration) lie in one part of metric space, far from the metrics that depend on the relative order of the predicted values (ROC area, average precision, break-even point, and lift). Between these two groups fall the two metrics that depend on comparing predictions to a threshold: accuracy and F-score. As expected, maximum-margin methods such as SVMs and boosted trees have excellent performance on metrics like accuracy, but perform poorly on probability metrics such as squared error. What was not expected was that the margin methods also have excellent performance on ordering metrics such as ROC area and average precision. We introduce a new metric, SAR, that combines squared error, accuracy, and ROC area into one metric. MDS and correlation analysis show that SAR is centrally located and correlates well with the other metrics, suggesting that it is a good general-purpose metric to use when more specific criteria are not known.
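The abstract does not state how SAR combines its three components. The sketch below assumes the equal-weight form SAR = (ACC + AUC + (1 - RMSE)) / 3, with accuracy taken at a 0.5 threshold; the function names and the toy data are hypothetical and chosen only for illustration. It also shows the distinction the abstract draws between ordering metrics and probability metrics: a monotone rescaling of the scores leaves ROC area unchanged but alters squared error.

```python
from math import sqrt


def accuracy(y_true, y_prob, threshold=0.5):
    """Threshold metric: fraction of cases whose thresholded score matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def roc_auc(y_true, y_prob):
    """Ordering metric: probability that a random positive outranks a random
    negative, with ties counted as one half (pairwise definition of ROC area)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC area needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_prob):
    """Probability metric: root mean squared error between scores and 0/1 labels."""
    return sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true))


def sar(y_true, y_prob, threshold=0.5):
    """Assumed equal-weight combination; 1 - RMSE makes larger values better."""
    return (
        accuracy(y_true, y_prob, threshold)
        + roc_auc(y_true, y_prob)
        + (1.0 - rmse(y_true, y_prob))
    ) / 3.0


if __name__ == "__main__":
    labels = [0, 0, 1, 1]               # hypothetical toy labels
    scores = [0.1, 0.4, 0.35, 0.8]      # hypothetical predicted probabilities
    squashed = [s ** 2 for s in scores]  # monotone transform: same ordering

    for name, preds in (("original", scores), ("squashed", squashed)):
        print(name, roc_auc(labels, preds), rmse(labels, preds), sar(labels, preds))
```

Because the squashed scores keep the same ordering, only the RMSE term of SAR moves in this toy case, which is one way to see why a combined metric can sit between the probability and ordering groups.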

PUBLICATION RECORD

Publication year
2004
Venue
ROC Analysis in Artificial Intelligence
Publication date
2004-08-22
Fields of study
Mathematics, Computer Science
Identifiers
DOI 10.1145/1014052.1014063


