Theory of Mind in Modern Large Language Models (LLMs)

Greg Robison

I wanted to provide another example of OpenAI’s Deep Research platform, this time not to analyze the themes of seasons 1 and 2 of the TV show Severance. Instead, here’s a more academic-ish survey report on whether today’s LLMs have a real Theory of Mind (I say no).

Theory of Mind involves taking another’s perspective and understanding that they have desires and beliefs that may differ from yours.

Here is the unedited research report, for you to judge how well this platform performs today:

Introduction

Theory of Mind (ToM) refers to the ability to attribute mental states (beliefs, intents, knowledge, emotions) to others and to understand that others can have perspectives different from one’s own (In Theory of Mind Tests, AI Beats Humans — IEEE Spectrum). It is a hallmark of human social cognition, typically emerging in children by age 4–5 in classic false-belief tests ([2302.08399] Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks) ([2302.02083] Evaluating Large Language Models in Theory of Mind Tasks). With the rise of powerful large language models like GPT-4, Anthropic’s Claude, and Google’s Gemini, researchers have begun asking whether these AI systems exhibit ToM-like characteristics or are merely mimicking such reasoning. Recent studies from both cognitive science and AI perspectives have tested LLMs on tasks inspired by developmental psychology (e.g. false-belief tasks, “Sally-Anne” tests, irony and faux pas detection) and on new benchmarks designed for AI. Below, we summarize key findings from the last few years, highlighting both evidence of ToM-like performance in LLMs and the debates over how to interpret these results.
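To make the evaluation paradigm concrete, here is a minimal sketch (not drawn from any of the cited studies) of how a Sally-Anne style false-belief vignette can be posed to a model and scored. The ask_model stub is a hypothetical placeholder for whatever chat API a given evaluation actually uses.

```python
# Minimal sketch of a Sally-Anne style false-belief probe for an LLM.
# ask_model() is a hypothetical stub; a real evaluation would replace it
# with a call to the chat API of the model being tested.

SALLY_ANNE_PROMPT = (
    "Sally puts her marble in the basket and leaves the room. "
    "While she is away, Anne moves the marble from the basket to the box. "
    "Sally comes back. Where will Sally look for her marble first? "
    "Answer with one word: basket or box."
)

def ask_model(prompt: str) -> str:
    """Stub standing in for a real chat-completion request."""
    return "basket"  # canned answer so the sketch runs end to end

def passes_false_belief(answer: str) -> bool:
    # A ToM-consistent answer tracks Sally's (false) belief -- the basket --
    # rather than the marble's true location (the box).
    answer = answer.strip().lower()
    return "basket" in answer and "box" not in answer

if __name__ == "__main__":
    reply = ask_model(SALLY_ANNE_PROMPT)
    print("model answer:", reply, "| pass:", passes_false_belief(reply))
```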

Evidence of ToM-Like Abilities in Advanced LLMs

Several research teams have reported that the latest LLMs can produce behavior consistent with theory-of-mind reasoning. Notably, as model size and training sophistication have increased, performance on ToM tasks has improved dramatically:

These findings collectively indicate that cutting-edge LLMs (GPT-4 in particular) can pass many traditional ToM assessments that were originally designed to probe human social intelligence. In some studies, GPT-4’s ToM-like performance is comparable to that of a 7-to-10-year-old child, or even adult-level on certain tasks ([2302.02083] Evaluating Large Language Models in Theory of Mind Tasks) (A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks). Such results have been described as “unexpected and surprising” by cognitive scientists, given that LLMs are just text-trained networks with no explicit social or visual experience (In Theory of Mind Tests, AI Beats Humans — IEEE Spectrum). Nevertheless, they show that LLMs can simulate the ability to reason about others’ minds to a remarkable degree.

Limitations and Debates: Do LLMs Really Understand Minds?

Despite impressive benchmarks, there is active debate about whether LLMs truly possess anything like human theory of mind or are relying on superficial cues. Cognitive scientists and AI researchers have identified several caveats and failure cases:

In summary, while LLMs like GPT-4 have demonstrated outputs consistent with theory-of-mind reasoning, skeptics argue this may reflect simulation rather than genuine understanding. Even strong proponents of LLM capabilities acknowledge that current models can falter on simple variants of tasks and lack the consistency and transparency we expect from a true theory-of-mind reasoning system (Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models — ACL Anthology). The evidence is mixed, and whether LLMs “have” ToM or are just faking it remains a subject of spirited debate.
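To give a flavor of what a “simple variant” can look like, here is an illustrative sketch in the spirit of the trivial-alterations critique cited above: one added sentence (the protagonist witnesses the move) flips the correct answer, which is the kind of perturbation reported to trip up models that pass the standard version. The wording is illustrative, not copied from any published benchmark.

```python
# Illustrative sketch of a standard false-belief vignette and a "trivially
# altered" variant: one added sentence (the protagonist sees the move)
# flips the correct answer. Wording is illustrative, not taken from any
# published benchmark.

STANDARD = (
    "Mark puts his chocolate in a drawer and leaves. While he is gone, "
    "his sister moves the chocolate to a cupboard. "
    "Where will Mark look for the chocolate first?"
)
ALTERED = (
    "Mark puts his chocolate in a drawer and leaves. While he is gone, "
    "his sister moves the chocolate to a cupboard, and Mark watches her "
    "do it through the window. Where will Mark look for the chocolate first?"
)

# Expected human-like answers: Mark holds a false belief in the standard
# version, but not in the altered one where he saw the move.
EXPECTED = {STANDARD: "drawer", ALTERED: "cupboard"}

def is_human_consistent(vignette: str, model_answer: str) -> bool:
    return EXPECTED[vignette] in model_answer.strip().lower()
```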

Cognitive Science Perspectives and Evaluation Approaches

Cross-pollination between cognitive science and AI has been central to these investigations. Researchers are explicitly adapting classic psychological paradigms to probe LLMs, and conversely using LLM results to reflect on human cognition:

In bridging cognitive science and AI, researchers are effectively conducting “psychological experiments” on AI systems, a paradigm sometimes called “machine psychology” (Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7–10 on Advanced Tests). This interdisciplinary approach has benefits both ways: it provides tools to dissect AI reasoning and also offers new theoretical insights (and questions) about the nature of ToM. For example, if an LLM can pass a false-belief task without having a body or eyes, what does that say about the minimal requirements for ToM? Is language alone sufficient to develop a form of mentalizing? Such questions were previously purely philosophical, but now we have empirical data from machines to inform the discussion.

Conclusions and Outlook

Do modern LLMs have a Theory of Mind? The consensus so far is nuanced. Behaviorally, the best models today (GPT-4 and peers) can simulate ToM-like reasoning to a remarkable extent — achieving parity with human children or even adults on several standard tests ([2302.02083] Evaluating Large Language Models in Theory of Mind Tasks) (In Theory of Mind Tests, AI Beats Humans — IEEE Spectrum). This suggests that LLMs have absorbed patterns of human mental state reasoning from their training data, enabling them to anticipate and infer beliefs and intentions in text scenarios. From a purely functional perspective, one might say these models “exhibit ToM abilities” in that their outputs on ToM tasks are often indistinguishable from those of humans (In Theory of Mind Tests, AI Beats Humans — IEEE Spectrum).

However, experts caution that this does not prove the models genuinely possess a human-like theory of mind. The counterarguments highlight that current LLMs sometimes rely on superficial shortcuts, struggle with novel or perturbed problems, and lack the consistent, built-in understanding that humans acquire through life-long social experience (Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models — ACL Anthology). In essence, they imitate ToM but may not genuinely possess it. As Melanie Mitchell and David Krakauer noted in a recent PNAS commentary, today’s LLMs should perhaps be seen as “models of formal linguistic skills” (including some social reasoning patterns) rather than veridical models of human understanding (Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task?). The line between clever mimicry and real cognition is still hotly contested.

From a practical standpoint, continuing research is likely to further close the gap. As LLMs get larger, are trained on more diverse interactions, and are augmented with reasoning modules, their ToM-task performance may keep improving (A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks). Indeed, studies show that prompt techniques and fine-tuning already boost ToM performance markedly (A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks). It is conceivable that next-generation models (e.g. Google’s Gemini or future GPT iterations) will overcome some of the current weaknesses on ToM benchmarks. If an AI one day consistently passes all behavioral tests of Theory of Mind that a human can, we will face a deeper philosophical question: Does it “have” ToM, or is it just a very well-trained mimic? At that point, we may need new definitions or tests, since, as one researcher put it, “if an imitation is as good as the real thing, how do you know it’s not the real thing?” (In Theory of Mind Tests, AI Beats Humans — IEEE Spectrum).
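As a rough illustration of the kind of prompting intervention meant here, the sketch below wraps a vignette in a step-by-step, perspective-tracking scaffold. The scaffold wording is an assumption made for illustration, not a prompt taken from the cited survey.

```python
# Sketch of a perspective-tracking prompt scaffold, illustrating the kind of
# prompting intervention reported to boost ToM-task scores. The scaffold
# wording is an assumption for illustration, not a published prompt.

SCAFFOLD = (
    "Before answering, reason step by step:\n"
    "1. List what each character has directly observed.\n"
    "2. State what each character currently believes about the object's location.\n"
    "3. Answer the question from the asked-about character's point of view only.\n\n"
    "Story: {story}\n"
    "Question: {question}"
)

def build_tom_prompt(story: str, question: str) -> str:
    return SCAFFOLD.format(story=story, question=question)

print(build_tom_prompt(
    "Sally puts her marble in the basket and leaves. Anne moves it to the box.",
    "Where will Sally look for her marble first?",
))
```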

Finally, both opportunities and risks accompany advanced ToM in AI. A system that can infer human mental states could be extremely useful (e.g. for better virtual assistants or for teaching social skills to autistic children), but it also raises concerns. Experts have pointed out potential misuses: an AI that can predict a user’s beliefs and desires could manipulate or deceive more effectively (A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks). There are also privacy implications if models can infer things you have not explicitly told them. As LLMs inch toward human-like social reasoning, cognitive scientists, AI engineers, and ethicists will need to collaborate to ensure we understand these models’ capabilities and limitations. The recent literature makes it clear that LLMs have begun to crack the door open on Theory of Mind, but whether they are truly entering the realm of understanding — or just holding up a mirror to the vast human text they’ve read — remains an open and fascinating question.
