Inequalities between multi-rater kappas

Published 2010 in Advances in Data Analysis and Classification

ABSTRACT

The paper presents inequalities between four descriptive statistics that have been used to measure nominal agreement among two or more raters. Each of the four statistics is a function of the pairwise information, that is, of the agreement tables of all pairs of raters. Light’s kappa and Hubert’s kappa are multi-rater versions of Cohen’s kappa. Fleiss’ kappa is a multi-rater extension of Scott’s pi, whereas Randolph’s kappa generalizes Bennett et al.’s S to multiple raters. A consistent ordering of the numerical values of these agreement measures has frequently been observed in practice, but no general ordering inequality among them has so far been proved. The paper proves that Fleiss’ kappa is a lower bound of Hubert’s kappa and Randolph’s kappa. It also proves that Randolph’s kappa is an upper bound of Hubert’s kappa and Light’s kappa if all pairwise agreement tables are weakly marginal symmetric, or if all raters assign a certain minimum proportion of the objects to a specified category.
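
The four statistics can be compared directly when every rater classifies every object. The sketch below illustrates the comparison but is not the paper's own code. It assumes the pairwise definitions commonly used for these measures. Hubert's, Fleiss' and Randolph's kappa all have the form (O − E)/(1 − E), where O is the mean pairwise observed agreement, and they differ only in the chance term E. Hubert's kappa uses the mean over rater pairs of the Cohen-type chance agreement Σ_j p_aj p_bj. Fleiss' kappa uses Σ_j p̄_j², where p̄_j is the mean marginal proportion of category j across raters. Randolph's kappa uses 1/c for c categories. Light's kappa instead averages the pairwise Cohen's kappas. The function name `multi_rater_kappas`, the list-of-rows input layout and the sample ratings are hypothetical choices made for this example.

```python
from itertools import combinations


def multi_rater_kappas(ratings, categories):
    """Light's, Hubert's, Fleiss' and Randolph's kappa from complete nominal ratings.

    Hypothetical helper: ratings[a][i] is the category rater a assigned to
    object i. Every rater is assumed to rate every object (no missing data),
    and at least two raters are required. Degenerate data where a chance
    term equals 1 would divide by zero and is not handled.
    """
    m, n, c = len(ratings), len(ratings[0]), len(categories)

    # Marginal proportions p_aj: share of objects rater a put in category j.
    margins = [[row.count(cat) / n for cat in categories] for row in ratings]

    pair_obs, pair_chance, pair_cohen = [], [], []
    for a, b in combinations(range(m), 2):
        # Observed agreement of the pair: share of objects placed in the same category.
        p_o = sum(x == y for x, y in zip(ratings[a], ratings[b])) / n
        # Cohen-type chance agreement of the pair: sum over j of p_aj * p_bj.
        p_e = sum(pa * pb for pa, pb in zip(margins[a], margins[b]))
        pair_obs.append(p_o)
        pair_chance.append(p_e)
        pair_cohen.append((p_o - p_e) / (1 - p_e))

    n_pairs = len(pair_obs)
    obs = sum(pair_obs) / n_pairs  # mean pairwise observed agreement O

    def kappa(chance):
        # Common form (O - E) / (1 - E) shared by Hubert, Fleiss and Randolph.
        return (obs - chance) / (1 - chance)

    # Pooled marginals: mean over raters of p_aj for each category j.
    pooled = [sum(col) / m for col in zip(*margins)]

    return {
        "light": sum(pair_cohen) / n_pairs,           # mean of pairwise Cohen's kappas
        "hubert": kappa(sum(pair_chance) / n_pairs),  # mean pairwise Cohen chance term
        "fleiss": kappa(sum(p * p for p in pooled)),  # Scott-type chance on pooled marginals
        "randolph": kappa(1 / c),                     # uniform chance, as in Bennett et al.'s S
    }


if __name__ == "__main__":
    # Hypothetical ratings: 3 raters, 6 objects, 3 categories.
    ratings = [
        ["a", "a", "b", "b", "c", "a"],
        ["a", "b", "b", "b", "c", "a"],
        ["a", "a", "b", "c", "c", "b"],
    ]
    for name, value in multi_rater_kappas(ratings, ["a", "b", "c"]).items():
        print(f"{name:>8}: {value:.3f}")
```

Because (O − E)/(1 − E) decreases as E grows whenever O < 1, the lower-bound results for Fleiss' kappa amount to its chance term being the largest. Σ_j p̄_j² is at least the mean pairwise chance agreement, and by the Cauchy–Schwarz inequality it is at least 1/c. The upper-bound results for Randolph's kappa require the extra conditions stated in the abstract, and the sketch does not check them.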

PUBLICATION RECORD

Publication year: 2010
Venue: Advances in Data Analysis and Classification
Publication date: 2010-12-01
Fields of study: Mathematics, Computer Science
Identifiers: DOI 10.1007/s11634-010-0073-4
Source metadata: Semantic Scholar

REFERENCES

Inequalities Between Kappa and Kappa-Like Statistics for k×k Tables
2010 · influential reference
A Formal Proof of a Paradox Associated with Cohen’s Kappa
2010 · cited by this paper
Cohen's kappa can always be increased and decreased by combining categories
2010 · cited by this paper
A Kraemer-type Rescaling that Transforms the Odds Ratio into the Weighted Kappa Coefficient
2010 · cited by this paper
A note on the linearly weighted kappa coefficient for ordinal scales
2009 · cited by this paper
On Similarity Coefficients for 2×2 Tables and Correction for Chance
2008 · cited by this paper
On the Equivalence of Cohen’s Kappa and the Hubert-Arabie Adjusted Rand Index
2008 · cited by this paper
Bounds of Resemblance Measures for Binary (Presence/Absence) Variables
2008 · cited by this paper
On the Indeterminacy of Resemblance Measures for Binary (Presence/Absence) Data
2008 · cited by this paper
On Association Coefficients for 2×2 Tables and Properties That Do Not Depend on the Marginal Distributions
2008 · cited by this paper
Variance Estimation of Nominal-Scale Inter-Rater Reliability with Random Selection of Raters
2008 · cited by this paper
Fuzzy kappa for the agreement measure of fuzzy classifications
2007 · cited by this paper
Agreement and Kappa-Type Indices
2007 · cited by this paper
Interobserver reproducibility in the diagnosis of flat epithelial atypia of the breast
2006 · cited by this paper
Free-Marginal Multirater Kappa (multirater κ_free): An Alternative to Fleiss' Fixed-Marginal Multirater Kappa
2005 · cited by this paper
Squibs and Discussions: The Kappa Statistic: A Second Look
2004 · cited by this paper
Interrater Agreement Measures: Comments on Kappa_n, Cohen's Kappa, Scott's π, and Aickin's α
2003 · cited by this paper
Kappa coefficients in medical research
2002 · cited by this paper
A Measure of Agreement for Interval or Nominal Multivariate Observations
2001 · cited by this paper
Beyond kappa: A review of interrater agreement measures
1999 · cited by this paper
Another look at interrater agreement
1988 · influential reference
A Generalization of Cohen's Kappa Agreement Measure to Interval Measurement and Multiple Raters
1988 · cited by this paper
Association, agreement, and equity
1987 · cited by this paper
Nominal scale agreement among observers
1986 · cited by this paper
Overeenstemmingsmaten voor nominale data [Agreement measures for nominal data]
1983 · cited by this paper
Measuring Agreement for Multinomial Data
1982 · influential reference
Measuring Pairwise Agreement Among Many Observers. II. Some Improvements and Additions
1982 · cited by this paper
Generalization of Scott's Index of Intercoder Agreement
1981 · cited by this paper
Coefficient Kappa: Some Uses, Misuses, and Alternatives
1981 · cited by this paper
Extension of the kappa coefficient
1980 · cited by this paper
Integration and generalization of kappas for multiple raters
1980 · influential reference
Measuring pairwise agreement among many observers
1980 · cited by this paper
An Extension of the Random Error Coefficient of Agreement to N x N Tables
1979 · cited by this paper
Ramifications of a population model for κ as a coefficient of reliability
1979 · cited by this paper
On Generalizations of the G Index and the Phi Coefficient to Nominal Scales
1979 · cited by this paper
The measurement of observer agreement for categorical data
1977 · cited by this paper
Measuring nominal scale agreement among many raters
1971 · cited by this paper
Measures of response agreement for qualitative data: Some generalizations and alternatives
1971 · cited by this paper
Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit
1968 · cited by this paper
A Coefficient of Agreement for Nominal Scales
1960 · influential reference
Reliability of Content Analysis: The Case of Nominal Scale Coding
1955 · influential reference
Communications Through Limited-Response Questioning
1954 · influential reference

CITED BY

Comparing Agreement Indices to Assess Inter-Observer Reliability in the Case of Dichotomous and Trichotomous Animal-Based Welfare Indicators with Three Raters
2026 · cites this paper
Proposta de um instrumento para gerenciamento de riscos no controle da qualidade da vacina contra febre amarela [Proposal for an instrument for risk management in the quality control of the yellow fever vaccine]
2026 · cites this paper
Assessing the Accuracy of Symptoms and Adverse Events Reporting for Lung Cancer Treatment in the Danish National Patient Registry
2026 · cites this paper
Some common statistical methods for assessing rater agreement in radiological studies
2025 · cites this paper
LLMs as Judges: Toward The Automatic Review of GSN-compliant Assurance Cases
2025 · cites this paper
Integrating Predictive Modeling with Sentiment Analysis for Lassa Fever Prediction
2025 · cites this paper
Validation of proposed imaging criteria for estimating vessels encapsulating tumor clusters in hepatocellular carcinoma at CT and gadoxetic acid-enhanced MRI
2025 · cites this paper
Design and Validation of Guidelines for Creating Mathematical Applets for Students With Autism
2025 · cites this paper
Faculty influences on academic integrity at postgraduate level – views from Spanish universities
2025 · cites this paper
A fully automated explainable predictive model for diagnosing pre-capillary and post-capillary pulmonary hypertension on routine unenhanced CT: results from the ASPIRE registry
2025 · cites this paper
Assessment of Machine Learning Techniques in Mapping Land Use/Land Cover Changes in a Semi-Arid Environment
2025 · cites this paper
Evaluation of Basic Interview Skills Realized with Simulated Patient Applications
2025 · cites this paper
AVerImaTeC: A Dataset for Automatic Verification of Image-Text Claims with Evidence from the Web
2025 · cites this paper
Interobserver Agreement for the Paris Classification of Colorectal Lesions Amongst Surgeons, Gastroenterologists, Trainees and Experts: A Video-Based Study
2025 · cites this paper
Evaluating performance and quality of a fast multi‐contrast scan in routine brain MRI
2024 · cites this paper
Estimators of various kappa coefficients based on the unbiased estimator of the expected index of agreements
2024 · cites this paper
“Image, Tell me your story!” Predicting the original meta-context of visual misinformation
2024 · cites this paper
Publication bias in pharmacogenetics of statin-associated muscle symptoms: A meta-epidemiological study
2024 · cites this paper
Genetic Characterization and Population Structure of Drug-Resistant Mycobacterium tuberculosis Isolated from Brazilian Patients Using Whole-Genome Sequencing
2024 · cites this paper
Understanding information needs for seamless intermodal transportation: Evidence from Germany
2024 · cites this paper
Evaluating Visually Lossless Compression of JPEG XS, JPEG 2000, HEVC and AV1 in Selected Medical Imaging Modalities
2024 · cites this paper
Fighting Evaluation Inflation: Concentrated Datasets for Grammatical Error Correction Task
2024 · cites this paper
A Dataset of Electrical Components for Mesh Segmentation and Computational Geometry Research
2024 · cites this paper
Multi-rater Prism: Learning self-calibrated medical image segmentation from multiple raters
2024 · cites this paper
Toward Related Work Generation with Structure and Novelty Statement
2024 · cites this paper
Prospective Study of Non-Contrast, Abbreviated MRI for Hepatocellular Carcinoma Surveillance in Patients with Suboptimal Hepatic Visualisation on Ultrasound
2024 · cites this paper
Measures of interrater agreement for quantitative data
2023 · cites this paper
AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web
2023 · cites this paper
Simulation-based remediation in emergency medicine residency training- A consensus study
2023 · cites this paper
Determining essential criteria for selection of risk assessment techniques in occupational health and safety: A hybrid framework of fuzzy Delphi method
2023 · cites this paper
A Proven Sentiment Annotation Guideline for Indonesian Twitter Data
2023 · cites this paper
Chat GPT for the management of obstructive sleep apnea: do we have a polar star?
2023 · influential citation
A Systematic Review of Peer Assessment Design Elements
2023 · cites this paper
The dilemma of the disappearing colorectal liver metastases: defining international trends in management
2023 · cites this paper
Audio-Visual Automatic Group Affect Analysis
2023 · cites this paper
Academic Faculty Demonstrate Weak Agreement in Evaluating Orthopaedic Surgery Residents
2023 · cites this paper
UNet and MobileNet CNN-based model observers for CT protocol optimization: comparative performance evaluation by means of phantom CT images
2023 · cites this paper
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
2023 · cites this paper
The association between video or telephone telemedicine visit type and orders in primary care
2022 · cites this paper
Multi-rater Prism: Learning self-calibrated medical image segmentation from multiple raters
2022 · cites this paper
Consenso nacional sobre los contenidos de Grado en Fisioterapia cardiorrespiratoria: estudio Delphi [National consensus on the content of the undergraduate degree in cardiorespiratory physiotherapy: a Delphi study]
2022 · cites this paper
Nomogram Estimating Vessels Encapsulating Tumor Clusters in Hepatocellular Carcinoma From Preoperative Gadoxetate Disodium‐Enhanced MRI
2022 · cites this paper
Examining the theory of challenge and threat states in athletes: do predictions extend to academic performance?
2022 · influential citation
Microscale pedestrian environment surrounding pedestrian injury sites in Washington state, 2015–2020
2022 · cites this paper
Le défi de l’attractivité pour les PME : le rôle du site internet institutionnel sur le soutien organisationnel anticipé et les intentions de postuler [The attractiveness challenge for SMEs: the role of the institutional website in anticipated organizational support and intentions to apply]
2022 · cites this paper
Innovations in Undergraduate Research Training Through Multisite Collaborative Programming: American Heart Association Summer Undergraduate Research Experience Syndicate
2022 · cites this paper
Pre-Service Chemistry Teachers’ Professional Vision Development: The Effect of Lesson-Observation Practice
2022 · cites this paper
Assistência de enfermagem em situação de abortamento retido: cenário validado para simulação clínica [Nursing care in missed abortion: a validated scenario for clinical simulation]
2022 · cites this paper
Evaluating the Cranfield Paradigm for Conversational Search Systems
2022 · cites this paper
Interobserver Variability of Hip Dysplasia Indices on Sweep Ultrasound for Novices, Experts, and Artificial Intelligence
2022 · cites this paper
An Empirical Comparative Assessment of Inter-Rater Agreement of Binary Outcomes and Multiple Raters
2022 · cites this paper
A comprehensive look at explicit screening tools for potentially inappropriate medication: A systematic scoping review
2022 · cites this paper
A First Approximation to Linear CRF classifiers for Finger Movement Classification
2022 · cites this paper
Leveraging machine translation for cross-lingual fine-grained cyberbullying classification amongst pre-adolescents
2022 · cites this paper
Automated diagnosis of hip dysplasia from 3D ultrasound using artificial intelligence: A two-center multi-year study
2022 · cites this paper
Preoperative versus Post-operative Radiotherapy for Extremity Soft tissue Sarcoma: a Systematic Review and Meta-analysis of Long-term Survival
2021 · cites this paper
A Comparison of Reliability Coefficients for Ordinal Rating Scales
2021 · cites this paper
AOMD: An Analogy-aware Approach to Offensive Meme Detection on Social Media
2021 · influential citation
Advance care planning by proxy in German nursing homes: Descriptive analysis and policy implications
2021 · cites this paper
The impact of grey zones on the accuracy of agreement measures for ordinal tables
2021 · cites this paper
Reliability of Perceptual Judgments of Phonetic Accuracy and Hypernasality Among Speech-Language Pathologists for Children With Dysarthria
2021 · cites this paper
Teachers’ and Their Pupils’ Performance on Plant Nutrition: a Comparative Case
2021 · influential citation
Claim Matching Beyond English to Scale Global Fact-Checking
2021 · cites this paper
Few-shot Controllable Style Transfer for Low-Resource Multilingual Settings
2021 · cites this paper
Response to 'Inter and intraobserver reliability and critical analysis of the FFP classification of osteoporotic pelvic ring injuries: Methodological issue' (doi:10.1016/j.injury.2019.04.011)
2021 · cites this paper
A Mixed-Methods Investigation of Licensed Masters-Level Social Worker’s Engagement in Outcome Evaluation
2021 · cites this paper
Few-shot Controllable Style Transfer for Low-Resource Settings: A Study in Indian Languages
2021 · cites this paper
Assessing agreement between raters from the point of coefficients and loglinear models
2021 · cites this paper
Exposure to unhealthy product advertising: Spatial proximity analysis to schools and socio-economic inequalities in daily exposure measured using Scottish Children's individual-level GPS data
2021 · influential citation
Visual Evaluation of Image Quality of a Low Dose 2D/3D Slot Scanner Imaging System Compared to Two Conventional Digital Radiography X-ray Imaging Systems
2021 · cites this paper
Kappa coefficients for dichotomous-nominal classifications
2020 · cites this paper
Influence of type of radiograph and levels of experience and training on reproducibility of the cervical vertebral maturation method
2020 · cites this paper
A Bootstrapping Method for Improving the Classification Performance in the P300 Speller
2020 · cites this paper
Can you hear what you cannot say? The interactions of speech perception and production during non-native phoneme learning
2020 · cites this paper
Examining how rural ecological contexts influence children’s early learning opportunities
2020 · cites this paper
Choices of Therapeutic Strategies for Colorectal Liver Metastases Among Expert Liver Surgeons: A Throw of the Dice?
2020 · cites this paper
Quantitative Methods for Analyzing Intimate Partner Violence in Microblogs: Observational Study
2020 · influential citation
Using the Delphi Method to Evaluate the Appropriateness of Urban Freight Transport Solutions
2020 · cites this paper
A new classification of impacted proximal humerus fractures based on the morpho-volumetric evaluation of humeral head bone loss with a 3D model
2020 · cites this paper
The medication discrepancy taxonomy (MedTax): The development and validation of a classification system for medication discrepancies identified through medication reconciliation
2020 · cites this paper
The effect of an e-learning module on grading variation of (pre)malignant breast lesions
2020 · cites this paper
Cytologic grading of primary malignant salivary gland tumors: A blinded review by an international panel
2020 · cites this paper
DNA flow cytometric and interobserver study of crypt cell atypia in inflammatory bowel disease
2019 · cites this paper
Online Pragmatic Language Use in Asperger Syndrome and Learning Disability Discussion Forums
2019 · cites this paper
Las conductas fraudulentas del alumnado universitario español en las evaluaciones: valoración de su gravedad y propuestas de sanciones a partir de un panel de expertos [Fraudulent behavior of Spanish university students in assessments: evaluation of its severity and proposed sanctions from an expert panel]
2019 · cites this paper
Crowdsourcing for innovation: How related and unrelated perspectives interact to increase creative performance
2019 · cites this paper
Hubert's multi-rater kappa revisited
2019 · cites this paper
Multicentre study on the consistency of PD-L1 immunohistochemistry as predictive test for immunotherapy in non-small cell lung cancer
2019 · cites this paper
Assessment of patient experiences following total sacrectomy for primary malignant sacral tumors: A qualitative study
2019 · cites this paper
Method agreement analysis and interobserver reliability of the ISTH proposed definitions for effective hemostasis in management of major bleeding
2019 · cites this paper
Multi-rater delta: extending the delta nominal measure of agreement between two raters to many raters
2019 · cites this paper
Apps As Learning Tools: A Systematic Review
2019 · cites this paper
Task planning for sports learning by physical education teachers in the pre-service phase
2019 · cites this paper
Sex differences in gray matter volume: how many and how large are they really?
2019 · cites this paper
Psychotherapy trainees’ epistemological assumptions influencing research-practice integration
2019 · cites this paper
Ultrasound characterization for thyroid nodules with indeterminate cytology: inter-observer agreement and impact of combining pattern-based and scoring-based classifications in risk stratification
2019 · cites this paper
Poorly Differentiated Clusters Predict Colon Cancer Recurrence: An In-Depth Comparative Analysis of Invasive-Front Prognostic Markers
2018 · cites this paper
Histologic processing artifacts and inter-pathologist variation in measurement of inked margins of canine mast cell tumors
2018 · cites this paper