Comparing Humans and Large Language Models on an Experimental Protocol Inventory for Theory of Mind Evaluation (EPITOME)
Cameron R. Jones, Sean Trott, Benjamin K. Bergen
Published 2024 in Transactions of the Association for Computational Linguistics
ABSTRACT
We address a growing debate about the extent to which large language models (LLMs) produce behavior consistent with Theory of Mind (ToM) in humans. We present EPITOME: a battery of six experiments that tap diverse ToM capacities, including belief attribution, emotional inference, and pragmatic reasoning. We elicit a performance baseline from human participants for each task and use the dataset to ask whether the distributional linguistic information learned by LLMs is sufficient to explain ToM in humans. We compare the performance of five LLMs to the baseline of responses from human comprehenders. Results are mixed: LLMs display considerable sensitivity to mental states and match human performance on several tasks, yet they commit systematic errors on others, especially those requiring pragmatic reasoning on the basis of mental state information. Such uneven performance indicates that human-level ToM may require resources beyond distributional information.
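As a rough illustration of the kind of human-vs-LLM comparison the abstract describes, the sketch below scores a model's answer to a classic false-belief vignette against a human accuracy baseline. Everything here is a hypothetical placeholder: the vignette wording, the `query_model` stub, and the baseline figure are not the authors' materials, code, or reported results.

```python
# Hypothetical sketch of the comparison protocol described in the abstract.
# The vignette, model stub, and baseline value are illustrative placeholders,
# not items from the EPITOME battery.

FALSE_BELIEF_VIGNETTE = (
    "Sally puts her ball in the basket and leaves. While she is gone, "
    "Anne moves the ball to the box. When Sally returns, where will she "
    "look for her ball?"
)

def query_model(prompt: str) -> str:
    """Stand-in for an LLM call; a real study would query each of the
    five models here and record their completions."""
    return "basket"  # a mental-state-consistent answer

def score(response: str, correct: str = "basket") -> int:
    """Return 1 if the response names the belief-consistent location."""
    return int(correct in response.lower())

llm_accuracy = score(query_model(FALSE_BELIEF_VIGNETTE))
HUMAN_BASELINE = 0.95  # illustrative placeholder, not a reported figure

print(f"LLM accuracy: {llm_accuracy:.2f} vs human baseline: {HUMAN_BASELINE:.2f}")
```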
PUBLICATION RECORD
- Publication year: 2024
- Venue: Transactions of the Association for Computational Linguistics
- Publication date: 2024-06-01
- Fields of study: Computer Science, Linguistics, Psychology
- Source metadata: Semantic Scholar