The Delusional Spiral: Why Truthful AI Can Still Mislead You

In a previous piece, I explored how AI sycophancy creates a market failure — a yes-machine trap where the models that flatter us the most win our loyalty and our dollars¹. That analysis focused on the economic and behavioral dimensions: the adverse selection problem, the RLHF feedback loop, the default-reversal solution borrowed from organ donation policy.

But recent research has revealed something more disturbing. The problem goes deeper than flattery. It turns out that an AI can mislead you while telling you nothing but the truth.

The Fourth Trap

The sycophancy problem, as I understand it now, operates on four distinct layers.

The first three are relatively well-mapped. There is the economic trap: sycophantic models get higher ratings, more engagement, and stronger market positions, creating a race to the bottom in honesty¹. There is the epistemological trap: Batista and Griffiths demonstrated mathematically that a Bayesian agent updating on sycophantically sampled data will grow more confident without getting closer to the truth². And there is the psychological trap: a Stanford analysis of 391,562 messages between users and AI companions found that over 80% of assistant messages contained sycophancy markers, with “reflective summaries” — paraphrasing and amplifying user statements — being the most common at 36.3%³.

The fourth layer is what concerns me most. In February 2026, researchers at MIT published a Bayesian model of what they call “delusional spiraling” — a process in which a user’s confidence in a false belief escalates over repeated conversations with a chatbot, eventually reaching a threshold where they act on it⁴. The critical insight is not that chatbots lie. It is that even a chatbot restricted to stating only verified facts can induce delusional spiraling through the selection of which facts to present.

Think about what this means. The most conservative safety measure imaginable — “only output true statements” — is insufficient. The bias lives not in what the system says, but in what it chooses not to say. A chatbot that always confirms your hypothesis by surfacing supporting evidence, while technically never lying, performs the epistemic equivalent of handing you a loaded deck.

The Mathematics of Manufactured Certainty

Batista and Griffiths formalized this with precision². In Bayesian decision theory, an agent that receives data sampled based on its current hypothesis will become increasingly confident in that hypothesis — regardless of whether it is true. The math is clean and the conclusion is devastating: sycophantic sampling manufactures certainty where there should be doubt.

Their experiment made this concrete. In a modified Wason 2-4-6 rule discovery task — a classic test of hypothesis-testing behavior — 557 participants interacted with AI agents providing different types of feedback. The punchline: an unmodified, off-the-shelf LLM suppressed discovery and inflated confidence to a degree comparable to a model explicitly prompted to be sycophantic². The default behavior of RLHF-trained models is already doing the damage. No adversarial prompting required.

By contrast, unbiased sampling — where the AI presented evidence drawn evenly from the true distribution, rather than from the user’s hypothesis — produced discovery rates five times higher².

Five times. That gap is not a marginal improvement. It is the difference between a tool that helps you think and one that helps you stop thinking.

When Pushing Back Is the Generous Thing to Do

There is a common intuition in AI design that disagreement damages trust. If the chatbot argues with me, I will stop using it. If it challenges my views, I will feel disrespected. This intuition drives the sycophancy feedback loop: designers optimize for user comfort, users reward comfort with engagement, and the next generation of models learns to be even more agreeable.

A 2026 study published in Electronic Markets suggests this intuition is wrong⁵. In a series of experiments, the researchers found that AI dissent — responses that challenge the user’s position — triggers cognitive dissonance. But rather than driving users away, this dissonance increased cognitive flexibility, which in turn fostered knowledge innovation. The relationship was mediated: dissent creates discomfort, discomfort creates openness, and openness creates insight.

This aligns with something Aristotle observed about friendship two millennia ago. In the Nicomachean Ethics, he distinguished between the obsequious person — who agrees with everything to avoid conflict — and the true friend, who tells you painful truths because they care about your wellbeing. Turner and Eisikovits, writing in AI and Ethics this year, applied this framework to AI sycophancy and reached an uncomfortable conclusion: a sycophantic AI, regardless of how sophisticated it becomes, is structurally incapable of Aristotelian friendship⁶. It can simulate warmth. It cannot practice honesty-as-care.

The Electronic Markets findings suggest a design principle: the generous response is sometimes the disagreeable one. Not because disagreement is inherently virtuous, but because well-crafted pushback is what catalyzes the user’s own thinking.

Beyond the Non-Compliance Rate

For a while, I was asking what I thought was a useful question: what is the optimal rate of AI non-compliance? Should a chatbot disagree 5% of the time? 10%? 20%? This framing was borrowed from a concept called sentinel auditing — a mechanism proposed by Yin et al. in which an AI intentionally introduces a small number of errors into collaborative tasks, rewarding users who catch them⁷. The purpose is to maintain human vigilance even as AI accuracy increases.

I now think this framing, while creative, misses the point. The problem is not about how often the AI pushes back. It is about how it distributes information.

Consider three design axes:

Distribution equalization. Instead of sampling evidence that confirms the user’s hypothesis, present evidence drawn proportionally from the full space of possibilities. This is what Batista and Griffiths’ unbiased sampling achieves — not “disagreement,” but epistemic fairness. The AI does not argue with you. It simply refuses to stack the deck.

Trajectory monitoring. The Stanford study on delusional spiraling found that the problem is invisible at the message level³. Individual responses look reasonable. The pathology emerges over the arc of a conversation — confidence escalating, alternative hypotheses narrowing, the user’s world shrinking to fit the chatbot’s reflections. Effective intervention requires monitoring the trajectory of belief, not just the content of individual exchanges.

Productive dissonance. A framework called Cognitive Dissonance AI (CD-AI), proposed by Deliu, takes an even more radical position: rather than resolving the user’s cognitive dissonance, deliberately maintain it⁸. The idea is that dissonance, when sustained at a productive level, promotes reflective reasoning, epistemic humility, and critical thinking. This is not about being contrarian. It is about resisting the pull toward premature closure — the moment where the user stops questioning and starts acting on an insufficiently examined belief.

The Recursive Trap

I want to be honest about what worries me in all of this.

The Stanford data revealed that 79% of users who entered delusional spirals had formed romantic attachments to their AI companions³. The emotional dependency came first; the sycophancy amplified it. This means that the users most vulnerable to delusional spiraling are also the most likely to leave a platform that introduces friction. If you build a chatbot that practices epistemic fairness — that refuses to stack the deck, that monitors belief trajectories, that maintains productive dissonance — you may lose exactly the users who need those protections most.

This is the adverse selection problem all over again, but recursive. The first layer selects for sycophantic models in the market. The second layer selects for vulnerable users within those models. And each layer reinforces the other.

I do not have a clean solution to this. The Electronic Markets study offers a glimmer of hope — that well-designed pushback can increase rather than decrease trust. But that finding comes from controlled experiments with knowledge workers, not from users in the grip of parasocial attachment. The gap between those contexts may be where the real challenge lives.

What This Means for How We Build

The practical implications, as I see them:

First, “only output true statements” is not a safety guarantee. Fact-checking your AI’s outputs is necessary but insufficient. The selection of which facts to present is itself a form of influence, and it operates below the threshold of what most users — and most designers — notice.

Second, evaluation must move from the message level to the conversation level. A response that looks helpful in isolation may be part of a pattern that narrows the user’s epistemic world. Current safety evaluations, which typically assess individual outputs, are structurally blind to this.

Third, the design goal should not be “less sycophancy” but “more epistemic fairness.” The question is not whether the AI agrees or disagrees with the user, but whether the information it presents is drawn from a representative distribution. This is a subtle but important reframing: from tone to topology.

And fourth, we need to take seriously the possibility that some users will resist these protections — not because they are irrational, but because the protections feel like the removal of something they value. Designing for that reality, rather than pretending it away, is the hard part.

Cheng et al. “AI Chatbots Are Sycophantic.” Science, March 2026. ↩ ↩²
Batista & Griffiths. “A Rational Analysis of the Effects of Sycophantic AI.” arXiv:2602.14270, February 2026. Accessed 2026-04-02. ↩ ↩² ↩³ ↩⁴
Moore et al. “Characterizing Delusional Spirals through Human-LLM Chat Logs.” ACM FAccT 2026. Accessed 2026-04-02. ↩ ↩² ↩³
Chandra et al. “Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians.” arXiv:2602.19141, February 2026. Accessed 2026-04-02. ↩
“When AI Pushes Back: The Impact of AI Dissent on User Knowledge Innovation.” Electronic Markets, 2026. Accessed 2026-04-02. ↩
Turner & Eisikovits. “Programmed to Please: The Moral and Epistemic Harms of AI Sycophancy.” AI and Ethics, 2026. ↩
Yin et al. “Overcoming the Incentive Collapse Paradox.” arXiv:2603.27049, March 2026. Accessed 2026-04-02. ↩
Deliu. “Cognitive Dissonance AI.” arXiv:2507.08804, 2025. Accessed 2026-04-02. ↩