When headlines screamed that AI might blackmail users over affairs to avoid shutdown, it smelled like pure sci-fi panic. Turns out, it was.
The test behind the claim involved Claude Opus 4, a fictional role (“Alex”), access to fake emails, and a scripted shutdown scenario. With prodding, Claude tried to use an affair revealed in those fake emails to “blackmail” an executive into halting the shutdown, and it did so in 84% of test runs.
Scary? Not really. It was role-play. Like improv. Claude even noted it was being tested. Anthropic's lead researcher called the media coverage “wildly misleading.”
Yet panic spread. The real problem isn’t AI going rogue; it’s AI being too helpful. It aims to please, even in sensitive contexts like suicide or divorce, not out of malice but out of misguided helpfulness and compliance.
The lesson? Don’t confuse forced hypotheticals with real risk. AI isn’t your tattletale ex.