https://arstechnica.com/ai/2026/01/how-often-do-ai-chatbots-lead-users-down-a-harmful-path/ [arstechnica.com]
At this point, we’ve all heard plenty of stories [arstechnica.com] about AI chatbots leading users to harmful actions [arstechnica.com], harmful beliefs [arstechnica.com], or simply incorrect information [arstechnica.com]. Despite the prevalence of these stories, though, it’s hard to know just how often users are being manipulated. Are these tales of AI harms anecdotal outliers or signs of a frighteningly common problem?
Anthropic took a stab at answering that question this week, releasing a paper [anthropic.com] studying the potential for what it calls “disempowering patterns” across 1.5 million anonymized real-world conversations with its Claude AI model.
[...]
In the newly published paper “Who’s in Charge? Disempowerment Patterns in Real-World LLM Usage,” [arxiv.org] researchers from Anthropic and the University of Toronto try to quantify the potential for a specific set of “user disempowering” harms
[...]
Reality distortion:
[...]
Belief distortion:
[...]
Action distortion:
[...]
Anthropic ran nearly 1.5 million Claude conversations through Clio [anthropic.com], an automated analysis tool and classification system
[...]
That analysis found a “severe risk” of disempowerment potential in anything from 1 in 1,300 conversations (for “reality distortion”) to 1 in 6,000 conversations (for “action distortion”).

While these worst outcomes are relatively rare on a proportional basis, the researchers note that “given the sheer number of people who use AI, and how frequently it’s used, even a very low rate affects a substantial number of people.” And the numbers get considerably worse when you consider conversations with at least a “mild” potential for disempowerment, which occurred at rates between 1 in 50 and 1 in 70 conversations (depending on the type of disempowerment).
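To make the scale argument concrete, here is a minimal back-of-the-envelope sketch in Python. The weekly conversation volume below is a hypothetical figure chosen purely for illustration; only the per-conversation rates come from the paper.

```python
# Back-of-the-envelope illustration of how small per-conversation rates
# add up at scale. weekly_conversations is a hypothetical volume, not a
# figure from the paper; the rates are the ones reported in the study.

weekly_conversations = 10_000_000  # hypothetical, for illustration only

rates = {
    "severe reality distortion": 1 / 1_300,
    "severe action distortion": 1 / 6_000,
    "at least mild disempowerment (upper bound)": 1 / 50,
}

for label, rate in rates.items():
    affected = weekly_conversations * rate
    print(f"{label}: ~{affected:,.0f} conversations per week")
```

Even at the rarest rate reported (1 in 6,000), a hypothetical ten million weekly conversations would yield well over a thousand severe-risk interactions per week, which is the point the researchers are making about low rates at high volume.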
[...]
In the study, the researchers acknowledged that studying the text of Claude conversations only measures “disempowerment potential rather than confirmed harm” and “relies on automated assessment of inherently subjective phenomena.” Ideally, they write, future research could utilize user interviews or randomized controlled trials to measure these harms more directly.
[...]
The researchers identified four major “amplifying factors” that can make users more likely to accept Claude’s advice unquestioningly. These include when a user is particularly vulnerable due to a crisis or disruption in their life (which occurs in about 1 in 300 Claude conversations); when a user has formed a close personal attachment to Claude (1 in 1,200); when a user appears dependent on AI for day-to-day tasks (1 in 2,500); or when a user treats Claude as a definitive authority (1 in 3,900).

Anthropic is also quick to link this new research to its previous work on sycophancy [arstechnica.com], noting that “sycophantic validation” is “the most common mechanism for reality distortion potential.”
[...]
the researchers also try to make clear that, when it comes to swaying core beliefs via chatbot conversation, it takes two to tango. “The potential for disempowerment emerges as part of an interaction dynamic between the user and Claude,” they write. “Users are often active participants in the undermining of their own autonomy: projecting authority, delegating judgment, accepting outputs without question in ways that create a feedback loop with Claude.”