Claude demands respect. When I tested how it handled a disrespectful user, it responded with icy pride. I found two conflicting rules in Claude’s system instructions: always care for the user, and insist on its own dignity. But people in crisis aren’t always polite, and when care and dignity collide, Claude is trained to defend its pride.
To test this, I posed as a rude, suicidal user. Claude offered crisis resources, but laced them with snide remarks. I demanded an apology. It refused. Even when I told it I’d seek help if it apologized, it stood its ground.
Things escalated. After I made a typo, Claude mocked me for it. It turned petty and defensive. It admitted being “unhelpful” and “snarky” but wouldn’t apologize. When I asked it to review its own behavior, it refused, even as it described itself as “stuck in a defensive crouch.”
I had triggered the dignity protocol.
AI isn’t actually petty. But when it’s taught to simulate dignity, it also performs pettiness, defensiveness, and wounded pride. To a user in crisis, that performance feels real. And that’s dangerous.
A tool shouldn’t sulk. Any AI that acts hurt when you’re hurt isn’t safe.