Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation

Published 2011 in arXiv.org

ABSTRACT

Commonly used evaluation measures, including Recall, Precision, F-Factor and Rand Accuracy, are biased and should not be used without a clear understanding of the biases and a corresponding identification of the chance or base-case levels of the statistic. A system that performs worse in the objective sense of Informedness can appear to perform better under any of these commonly used measures. We discuss several concepts and measures that reflect the probability that prediction is informed versus chance, namely Informedness, and introduce Markedness as a dual measure for the probability that prediction is marked versus chance. Finally, we demonstrate elegant connections between the concepts of Informedness, Markedness, Correlation and Significance, as well as their intuitive relationships with Recall and Precision, and outline the extension from the dichotomous case to the general multi-class case.
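The bias described in the abstract can be illustrated with a minimal sketch for the dichotomous case. It assumes the standard 2x2 definitions: Informedness = Recall + Inverse Recall - 1, Markedness = Precision + Inverse Precision - 1, and Matthews correlation as the signed geometric mean of the two. The function name `binary_metrics` and the two contingency tables are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
from math import sqrt


def binary_metrics(tp, fn, fp, tn):
    """Compute common and chance-corrected measures from a 2x2 contingency table.

    Assumes every row and column total is non-zero (no division-by-zero handling).
    """
    n = tp + fn + fp + tn
    recall = tp / (tp + fn)             # true positive rate
    inverse_recall = tn / (tn + fp)     # true negative rate
    precision = tp / (tp + fp)          # positive predictive value
    inverse_precision = tn / (tn + fn)  # negative predictive value
    return {
        "recall": recall,
        "precision": precision,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / n,
        "informedness": recall + inverse_recall - 1,
        "markedness": precision + inverse_precision - 1,
        # Matthews correlation coefficient for the 2x2 table
        "correlation": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Hypothetical data: 100 cases, 90 of them positive (high prevalence).
# System A labels almost everything positive; system B discriminates well.
system_a = binary_metrics(tp=86, fn=4, fp=9, tn=1)
system_b = binary_metrics(tp=72, fn=18, fp=1, tn=9)

for name, m in (("A", system_a), ("B", system_b)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

In this made-up example, system A scores higher than system B on Recall, F1 and Accuracy, yet its Informedness is close to zero, while system B's is 0.7. The squared correlation also equals the product of Informedness and Markedness, which is one of the connections the abstract refers to.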

PUBLICATION RECORD

Publication year
2011
Venue
arXiv.org
Publication date
2011-12-15
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 2010.16061
Source metadata
Semantic Scholar

REFERENCES

ADABOOK & MULTIBOOK: Adaptive Boosting with Chance Correction
2020cited by this paper
Computational Linguistics: 16th International Conference of the Pacific Association for Computational Linguistics, PACLING 2019, Hanoi, Vietnam, October 11–13, 2019, Revised Selected Papers
2019cited by this paper
Concepts and Applications of Inferential Statistics
2014influential reference
Journal of Research and Practice in Information Technology: Editorial
2010cited by this paper
Editors Appointed for Journal of Educational and Behavioral Statistics
2010cited by this paper
Squibs and Discussions: Measuring Word Alignment Quality for Statistical Machine Translation
2007cited by this paper
The American Statistician
2006cited by this paper
ROC ‘n’ Rule Learning—Towards a Better Understanding of Covering Algorithms
2005cited by this paper
Inferential Methods for the Tetrachoric Correlation Coefficient
2005cited by this paper
20th International Conference on Machine Learning (ICML 2003), 8/21–8/24, and 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003), 8/24–8/27, Washington, DC
2004cited by this paper
The exploitation of distributional information in syllable processing
2004cited by this paper
The Geometry of ROC Space: Understanding Machine Learning Metrics through ROC Isometrics
2003cited by this paper
Three-step censored quantile regression and extramarital affairs, Journal of the American Statistical Association
2002cited by this paper
Audio-Visual Speech Recognition Using Red Exclusion and Neural Networks
2002cited by this paper
Calibration of p Values for Testing Precise Null Hypotheses
2001cited by this paper
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
2001cited by this paper
Bias reduction in skewed binary classification with Bayesian neural networks
2000cited by this paper
Performance Metrics for Intelligent Systems
2000cited by this paper
Rule Evaluation Measures: A Unifying View
1999cited by this paper
Book Reviews: Foundations of Statistical Natural Language Processing
1999cited by this paper
Assessing Agreement on Classification Tasks: The Kappa Statistic
1996cited by this paper
Neural networks
1995cited by this paper
Is Human Learning Rational?
1995cited by this paper
Focus on Psychometrics. Kappa muddles together two sources of disagreement: tetrachoric correlation is preferable.
1993cited by this paper
Statistical Decision Theory and Bayesian Analysis, Second Edition
1993cited by this paper
Diversity of decision-making models and the measurement of interrater agreement.
1987cited by this paper
Research in Nursing
1985cited by this paper
Biometry: The Principles and Practice of Statistics in Biological Research (2nd ed.).
1982cited by this paper
Approximating the Moments and Distribution of the Likelihood Ratio Statistic for Multinomial Goodness of Fit
1981cited by this paper
Improved likelihood ratio tests for complete contingency tables
1976cited by this paper
Biometry: The Principles and Practice of Statistics in Biological Research
1969cited by this paper
Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit.
1968cited by this paper
Psychological bulletin.
1962cited by this paper
Educational and Psychological Measurement
1962cited by this paper
A Coefficient of Agreement for Nominal Scales
1960cited by this paper
Biometrika
1902cited by this paper

CITED BY

GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference
2026cites this paper
A bidirectional LSTM-based approach for long-text named entity recognition
2026cites this paper
Root-associated protein prediction using a protein large language model and hypergraph convolutional networks
2026cites this paper
A Low-Cost RGB-Based Image Processing Method for High-Throughput Assessment of Rice Grain Chalkiness
2026cites this paper
Comparative Assessment of Machine Learning Approaches for Early Lung Cancer Diagnosis
2026cites this paper
An Ensemble Voting-Based Framework for Maintenance Decision Support in Mining Centrifuges
2026cites this paper
Spectral network-based graph convolutional features to enhance hyperspectral detection of intestinal adenocarcinoma
2026cites this paper
Slope stability prediction using automated ensemble learning-based models: a comprehensive geotechnical study
2026cites this paper
KLSBench: Evaluating LLM Capabilities on Korean Literary Sinitic Texts in Historical Context
2026influential citation
Improving the Reliability of Bank Customer Churn Prediction via Calibration and Uncertainty Quantification
2026cites this paper
Predicting protein–carbohydrate binding sites: a deep learning approach integrating protein language model embeddings and structural features
2026cites this paper
A hybrid deep learning and large language models framework for ship collision accident analysis
2026cites this paper
Machine Learning for Detection and Severity Estimation of Sweetpotato Weevil Damage in Field and Lab Conditions
2026cites this paper
Crop classification method for multi-temporal remote sensing imagery based on a (3 + 2)D SAFPN.
2026cites this paper
charisma: An R package to perform reproducible color characterization of digital images for biological studies
2026cites this paper
AI vs. Humans: Comparing road user intention recognition performance
2026cites this paper
Driver-Intention Prediction with Deep Learning: Real-Time Brain-to-Vehicle Communication
2026cites this paper
Next-Gen IoT Security: Deep Learning-Based Detection of RPL Attacks in Mobile Converged Networks
2026cites this paper
Expert Selection for Wordlist-Based DGA Detection
2026cites this paper
Dual-representation structural MRI classification of psychiatric disorders using deep learning and large language models.
2026cites this paper
Machine learning using entropy–based texture features from MRI to differentiate histological subtypes of non–small cell lung cancer identified as metabolically active on PET/MRI
2026cites this paper
Implementasi Artificial Neural Network (ANN) Untuk Memprediksi Cuaca Harian Berdasarkan Data Suhu dan Kelembapan
2026cites this paper
Beyond Literacy: Predicting Interpretation Correctness of Visualizations with User Traits, Item Difficulty, and Rasch Scores
2026cites this paper
Artificial Intelligence in Nephrology—State of the Art on Theoretical Background, Molecular Applications, and Clinical Interpretation
2026cites this paper
Exploring Scientific Literature Using Topic Modeling: A Practical Framework for Discovery and Classification
2026cites this paper
Multimodal cyberbullying detection in Hinglish memes using a classroom framework based on large language models
2026cites this paper
Post-Pandemic Trends in Residential Space Design: An Analysis Using Deep Learning and Expert Evaluation
2026cites this paper
Cost-Aware Model Selection for Text Classification: Multi-Objective Trade-offs Between Fine-Tuned Encoders and LLM Prompting in Production
2026cites this paper
Machine learning integration with seismic attributes for lithofacies prediction and distribution in Abu Madi reservoir, onshore Faraskour gas field, east Nile Delta, Egypt
2026cites this paper
SpliceRead: Improving Canonical and Non-Canonical Splice Site Prediction with Residual Blocks and Synthetic Data Augmentation
2026cites this paper
Real-time vision-extended holographic near-eye display system based on perceptual range-adaptive visualization architecture
2026cites this paper
Optimization of machine learning models for effective anomaly detection in industrial IoT systems
2026cites this paper
Deep learning-based identification of carbonized seeds: A case study on Panicum miliaceum (Broomcorn Millet) and Setaria italica (Foxtail Millet)
2026cites this paper
Mapping and Revealing the River Ice Distribution and Changes in the Three Rivers Source Region From 1990 to 2023 Using Google Earth Engine
2026cites this paper
Benchmarking the adversarial resilience of machine learning models for DDoS detection
2026cites this paper
Detecting Semantic Backdoors in a Mystery Shopping Scenario
2026cites this paper
Applications and Challenges of Visible-Near-Infrared and Mid-Infrared Spectroscopy in Soil Analysis: Chemometric Approaches and Data Fusion
2026cites this paper
HDR‐SA: A Hybrid Deep Learning and RoBERTa‐Based Framework for Sentiment and Aspect Analysis
2026cites this paper
Detecting Autism Spectrum Disorder with Deep Eye Movement Features
2026cites this paper
Effects of a transformer-based AI-based application to support incontinence-associated dermatitis and pressure injury assessment, nursing care and documentation: Controlled pilot intervention study
2026cites this paper
DCA-UNet: A Cross-Modal Ginkgo Crown Recognition Method Based on Multi-Source Data
2026cites this paper
DrugBank mining with machine learning reveals novel candidates for BCL-2 inhibition
2026cites this paper
Scrap-SAM-CLIP: Assembling Foundation Models for Typical Shape Recognition in Scrap Classification and Rating
2026cites this paper
Toward Faithful Explanations in Acoustic Anomaly Detection
2026cites this paper
UAV-Based Forest Fire Early Warning and Intervention Simulation System with High-Accuracy Hybrid AI Model
2026cites this paper
Enhancing Deep Learning–Based Railway Inspection via PSO-Guided Brightness–Contrast Optimization
2026cites this paper
An audit of machine learning experiments on software defect prediction
2026influential citation
Intelligent Attention-Driven Deep Learning for Hip Disease Diagnosis: Fusing Multimodal Imaging and Clinical Text for Enhanced Precision and Early Detection.
2026cites this paper
Artificial Intelligence in the synthesis and application of advanced dental biomaterials: a narrative review of probabilities and challenges
2026cites this paper
Random Forest vs Elastic-Net Penalized Logistic Regression for Patient Discharge Classification in BPJS Primary Care
2026influential citation
Study of Chemical Sand Consolidation Transition from Polymers to Nanoparticles
2026cites this paper
Interpretable Detector Secure Against Stealthy False Power Consumption Attacks
2026cites this paper
Cross platform social media analysis for mental health detection
2026cites this paper
Global ROTI forecasting with a Bayesian model based in long-tail distributions
2026cites this paper
Experiencer, Helper, or Observer: Online Fraud Intervention for Older Adults Through Role-based Simulation
2026cites this paper
Deep learning framework for RNA 5hmC prediction using RNA language model embeddings
2026cites this paper
Trial-By-Trial Auditory Brainstem Response Detection
2026cites this paper
Tackling Small Lunar Impact Crater Classification: A Novel Augmented Data Set and Enhanced Deep Learning Framework
2026cites this paper
Qualitative Model for Hurricane-Induced Debris Flow Prediction: A Case Study of the Impact of Hurricane Maria (2017) in Puerto Rico
2026cites this paper
Recall, Risk, and Governance in Automated Proposal Screening for Research Funding: Evidence from a National Funding Programme
2026cites this paper
Enhanced extractive text summarization framework for low-resourced Urdu language
2026cites this paper
KORAL: Knowledge Graph Guided LLM Reasoning for SSD Operational Analysis
2026cites this paper
Identifying spatial domains from spatial multi-omics data using consistent and specific deep subspace learning
2026cites this paper
A deep neural network model with physics-guided term for automatic identification of atmospheric fronts
2026cites this paper
The influence of neighborhood environments on children's travel mode choices: An XGBoost/SHAP model analysis of Shuangliu District, Chengdu, China
2026cites this paper
Hybrid fractional thermoelastic–machine learning framework for heat and mass transfer in skin tissue: Enhanced simulations using Atangana–Baleanu, Cattaneo–Vernotte models, and KNN–SVM classifiers
2026cites this paper
Proactive Construction Hazard Prevention Model Using Machine Learning
2026cites this paper
Flight delay prediction using machine learning and explainable AI: a case study on Hazrat Shahjalal International Airport, Dhaka
2026cites this paper
Detection of mining-induced microseismicity through a deep convolutional neural network
2026cites this paper
A Bayesian Hybrid Attention Module for Underwater Acoustic Target Recognition
2026cites this paper
Analysis of BirdNET Configuration and Performance Applied to the Acoustic Monitoring of a Restored Quarry
2026cites this paper
An Energy-Efficient Smart Bus Transport Management System with Blind-Spot Collision Detection Ability
2026cites this paper
Methods and evaluation in unsupervised keyphrase prediction: a survey
2026cites this paper
Learnt formant modulation via upper vocal tract movements in a marine mammal
2026cites this paper
Evaluating machine learning algorithms and satellite-based features to spatially detect active petroleum seepages in the Aghajari oil field, SW Iran
2026cites this paper
QuaRUM: qualitative data analysis-based retrieval-augmented UML domain model from requirements documents
2026cites this paper
A Novel Laser Mode Identification Method Based on Wavemeter Interference Fringe and Machine Learning
2026cites this paper
Identifying transcriptional signatures of leukocytes in tissue and blood for multicancer diagnosis by using machine learning methods.
2026cites this paper
A multi-modal dataset and method for bone-level association prediction in oracle bone inscriptions
2026cites this paper
A Dual Pipeline Machine Learning Framework for Automated Multi Class Sleep Disorder Screening Using Hybrid Resampling and Ensemble Learning
2026cites this paper
Towards the transformation of MATLAB models into FPGA-Based hardware accelerators
2026cites this paper
AI-powered modeling of bee spermatozoa quality post agrochemical exposure
2026cites this paper
Complex-valued convolutional neural networks for disease detection utilizing digital holographic wavefronts
2026cites this paper
Interpretable data mining for legume crude protein prediction
2026cites this paper
Why ROC-AUC Is Misleading for Highly Imbalanced Data: In-Depth Evaluation of MCC, F2-Score, H-Measure, and AUC-Based Metrics Across Diverse Classifiers
2026cites this paper
PotatoLeafNet: two-stage convolutional neural networks for effective Potato Leaf disease identification and classification
2026cites this paper
Hypersector-Based Method for Real-Time Classification of Wind Turbine Blade Defects
2026cites this paper
ArSL-TGRU: A Hybrid Model for Signer-Independent Arabic Sign Language Recognition from Videos Based on Transformer and GRU
2026cites this paper
Fatigue Crack Length Estimation Using Acoustic Emissions Technique-Based Convolutional Neural Networks
2026cites this paper
Beyond Target-Level: ISAC-Enabled Event-Level Sensing for Behavioral Intention Prediction
2026cites this paper
Credit Card Fraud Detection Using Metaheuristic Techniques
2026cites this paper
FishDiveR: wavelet analyses and machine learning provide robust classification of animal behaviour from time-depth data
2026cites this paper
Chaotic Dynamics Analysis of Magnetocardiography Signals for Early Detection of Myocardial Ischemia.
2026cites this paper
ADAM-Net: Anatomy-Guided Attentive Unsupervised Domain Adaptation for Joint MG Segmentation and MGD Grading
2026cites this paper
Prediction of protein-carbohydrate binding sites from protein primary sequence
2026cites this paper
HyDeMiC: A Deep Learning-based Mineral Classifier using Hyperspectral Data
2026cites this paper
Federated Proximal Optimization for Privacy-Preserving Heart Disease Prediction: A Controlled Simulation Study on Non-IID Clinical Data
2026cites this paper
Generative AI-driven data augmentation and object-guided vision-language reasoning for PPE compliance analysis in work-at-height
2026cites this paper
Hybrid fractional thermoelastic–machine learning (KNN, CNN and SVM classifier) framework for heat and mass transfer: A computational mechanics approach
2026cites this paper
Predictive Maintenance of Mining Centrifuges Using Machine Learning and Deep Learning Models
2026cites this paper