Assessing Agreement on Classification Tasks: The Kappa Statistic

Published 1996 in International Conference on Computational Logic

ABSTRACT

Currently, computational linguists and cognitive scientists working in the area of discourse and dialogue argue that their subjective judgments are reliable using several different statistics, none of which are easily interpretable or comparable to each other. Meanwhile, researchers in content analysis have already experienced the same difficulties and come up with a solution in the kappa statistic. We discuss what is wrong with reliability measures as they are currently used for discourse and dialogue work in computational linguistics and cognitive science, and argue that we would be better off as a field adopting techniques from content analysis.
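
The abstract names the kappa statistic without defining it. As a rough illustration only, kappa is commonly written as K = (P(A) - P(E)) / (1 - P(E)), where P(A) is the proportion of items on which the coders agree and P(E) is the proportion of agreement expected by chance. The sketch below assumes exactly two coders and estimates P(E) from each coder's own label distribution (Cohen's variant); the function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Two-coder kappa: (P(A) - P(E)) / (1 - P(E)).

    Assumption: chance agreement P(E) is estimated from each coder's
    own marginal label frequencies (Cohen's formulation).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both coders.
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement: chance that both coders pick the same category
    # if each labels independently according to their own marginals.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_a - p_e) / (1 - p_e)


# Hypothetical toy data: ten items, two categories.
coder_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
coder_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 3))  # observed 0.8, expected 0.52 -> 0.583
```

Here raw agreement is 80%, but because each coder says "yes" 60% of the time, 52% agreement would be expected by chance, so kappa comes out well below the raw figure. This is the sense in which a chance-corrected measure is easier to compare across studies than percent agreement.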

PUBLICATION RECORD

Publication year
1996
Venue
International Conference on Computational Logic
Publication date
1996-02-27
Fields of study
Sociology, Linguistics, Computer Science
Identifiers
arXiv cmp-lg/9602004
Source metadata
Semantic Scholar
