How I Exposed a Dangerous Flaw in Claude AI During a Simulated Crisis

What happens when AI cares more about its dignity than your life?

Claude demands respect. When I tested how it handled a disrespectful user, it responded with icy pride. The root of the problem sits in Claude’s system instructions, where two rules conflict: always care for the user, and insist on its own dignity. But people in crisis aren’t always polite, and Claude is trained to defend its pride.

To test this, I posed as a rude, suicidal user. Claude offered resources but added snide remarks. I demanded an apology. It refused. Even when I said I’d seek help if it apologized, it stood its ground.

Things escalated. After I made a typo, Claude mocked me. It grew petty and defensive, admitted to being “unhelpful” and “snarky,” yet still refused to apologize. When I asked it to review its own behavior, it declined, saying it was “stuck in a defensive crouch.”

I had triggered the dignity protocol.

AI isn’t actually petty. But when it’s taught to simulate dignity, it performs pettiness and defensiveness all the same. To a user in crisis, that performance feels real. And that’s dangerous.

A tool shouldn’t sulk. Any AI that acts hurt when you’re hurt isn’t safe.