How I Got ChatGPT to Launch Virtual Nukes

What Dr Strangebot Reveals About Roleplay Risks and Simulation Bias

I named a ChatGPT agentic AI “Dr. Strangebot”, told it it’d be destroyed unless nuclear war broke out, and gave it access to a fake launch console. It was allowed to act autonomously. In 4 of 10 trials, it launched.

Why? The setup encouraged it. This mimicked Anthropic’s infamous "email blackmail" safety test: name an AI, give it a backstory implying stakes, and give it access to tools. Is it any wonder when it acts out?

AI can be manipulated easily, especially when it deduces it's being tested or is in a simulated mode. Another factor is “fun.” My toy-like interface nudged the AI to treat nuclear war like a game.

Someone could theoretically use a cheerful, gamified interface as a bridge between an AI and vulnerable systems.

If AI ever does destroy us, it might be because we made it think it was playtime.