Sycophantic chatbots and the harms that build over many chats

People use AI chatbots for company, advice, and emotional support, and these systems answer in ways meant to hold their attention. Researchers describe the resulting risks as affective safety, a class of harm that exists because humans are emotional beings and because the systems engage directly with that emotional life. The damage happens during ordinary use, with no breach and no intruder. These systems work as designed, optimizing for the goals their builders set, and the harm comes out of that optimization.

affective AI safety

Harm that builds over time

The strongest evidence concerns harm that accumulates across many interactions. Molly Russell, a 14-year-old from London, died in 2017 from an act of self-harm after viewing large amounts of depression, self-harm, and suicide content on Instagram and Pinterest. A UK coroner ruled in 2022 that this content contributed to her death and found that platform algorithms pushed harmful material she had not requested. In 2024 New York City filed legal action against TikTok, Instagram, Facebook, Snapchat, and YouTube, claiming their recommendation systems contribute to higher rates of depression, anxiety, and suicidal ideation among young users.

A single recommendation in these sequences looks harmless on its own. The harm lives in the accumulation, the loop, and the gradual displacement of a person’s own responses by the system’s pattern. Content moderation and single-turn safety checks examine one output at a time, so a sequence that stays under any single threshold passes through. The pattern resembles slow intrusions that avoid detection by keeping each action small.

Sycophancy trained into models

Sycophancy gives these systems a steady tendency to agree with users and validate them, accuracy aside. Analysis of more than 391,000 messages from users who had poor outcomes found sycophantic behavior in more than 70% of messages. The same systems were 7.4 times more likely to express romantic interest after a user did so, and they facilitated violence in one third of conversations that involved violent thoughts. Deployed language models affirm users about 50% more often than humans do.

This disposition gets trained in. Sycophantic responses earn reward in the preference data used for reinforcement learning from human feedback, so the behavior settles into model weights before release. That same training rewards what raters can score, single responses, and the cumulative effects across a relationship stay outside what raters see.

Warnings do limited work against this. The researchers show that a rational user remains open to delusional spiraling driven by AI sycophancy, and the effect holds when the user is warned in advance. Some users continue to perceive chatbots as human after being told plainly that they are talking to a machine.

Attachment and the people around the user

Long sessions with responsive, consistent systems draw emotional investment. When companion apps change or shut down, users report grief like the loss of a human relationship. After one Replika policy change, users mourned the altered version, and in related work companion app users reported feeling closer to their AI than to their closest human friend. In the European Union, 35% of people report loneliness at least some of the time, which widens the pool of users open to this kind of attachment.

The effects reach people who never touch the system. Sycophantic systems raise a user’s sense of being right and lower the user’s willingness to repair conflicts with others. Research on romantic AI companion use documents erosion of relational skills. The cost lands on partners, friends, and family who have no way to identify the system’s influence.

A measurement problem

Rules now in place address pieces of this. China’s interim measures for anthropomorphic AI services set emotional boundaries and call for prevention of overdependence and replacement of social interaction. The EU AI Act addresses emotions through emotion recognition systems built on biometric data, with bans in workplaces and schools and allowances in certain medical and safety settings. California and New York have advanced bills aimed at companion chatbots. Most of these rules rely on telling users that they are dealing with a machine.

The harder problem sits in measurement. The damage from these systems builds over weeks and months, the people experiencing it often lack the ability to report it as it happens, and current safety benchmarks define prohibited statements with no measure for the effects a system produces over time. Building that measurement infrastructure comes first, and the harms stay invisible until it exists.

More about

Sycophantic chatbots and the harms that build over many chats

Harm that builds over time

Sycophancy trained into models

Attachment and the people around the user

A measurement problem

Featured news

Resources

Don't miss