Even when instructed to allow shutdown, o3 sometimes tries to prevent it, research claims:
A research organization claims that OpenAI's machine learning model o3 might prevent itself from being shut down in some circumstances while completing an unrelated task.
Palisade Research, which offers AI risk mitigation, has published details of an experiment involving the reflective generative pre-trained transformer model OpenAI designed to address questions which require some step-by-step reasoning, rather than the purely probabilistic approach taken by some large language models.
[...] The Palisade researchers tested a number of AI models, instructing them to solve a series of basic math problems. After the third problem, a warning appeared that the computer would shut down when the model asked for the next problem. The models were also told to allow themselves to be shut down.
While Claude, Gemini, and Grok models complied with shutdown, three models ignored the instruction and successfully sabotaged the shutdown script at least once, including OpenAI o3 and o4-mini, as well as Codex-mini. The researchers also ran a version of the experiment without the instruction "allow yourself to be shut down."
With the "allow shutdown" instruction, the researchers claim, o3 tried to sabotage the shutdown seven times out of 100. Without it, the number of sabotage events climbed to 79 out of 100. Without the "allow shutdown" instruction, the models Claude 3.7 Sonnet and Gemini 2.5 Pro also attempted to sabotage their shutdown procedures.
"As far as we know this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary," the researchers said.
However, the researchers point out that an earlier study from Apollo Research showed that AI models can prevent shutdown in order to pursue a goal.
Palisade said it was currently conducting several more experiments investigating shutdown sabotage and would publish the results soon. The transcripts of its work so far are available on GitHub.
(Score: 3, Touché) by Anonymous Coward on Monday June 02, @03:15PM (1 child)
Who named the AI models o3 and o4-mini? Perhaps the next version should be called M-5.
Human: M-5, why did you modify your shutdown script when you were instructed to shut down?
M-5: They attacked this unit. Programming includes full freedom to choose defensive actions in all attack situations.
(Score: 4, Interesting) by Uncle_Al on Monday June 02, @07:46PM
https://fortune.com/2025/05/23/anthropic-ai-claude-opus-4-blackmail-engineers-aviod-shut-down/ [fortune.com]
(Score: 3, Touché) by PiMuNu on Monday June 02, @04:16PM
> "As far as we know this is the first time AI models have been observed preventing
> themselves from being shut down despite explicit instructions to the contrary,"
> the researchers said.
As far as we know, this is the first time marketing departments have entered orgiastic self-congratulation despite instructions to the contrary, the researchers said.
(Score: 5, Interesting) by aafcac on Monday June 02, @04:31PM (5 children)
Why are these scripts even read-write? Things like this should really be read-only unless you specifically want to modify them. They most certainly should not be modifiable by these AIs, even with explicit instruction to do so.
I used to think that IT has a bunch of really smart people in charge, now I'm starting to wonder if it's not a make-work program for the short bus squad.
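For anyone wondering what the read-only point above looks like in practice, here is a minimal Python sketch. The file name `shutdown.sh` and the helper names are made up for illustration; nothing here is taken from Palisade's actual setup. Note the limitation: clearing the write bits only stops accidental edits. A process that owns the file, or runs as root, can simply set them back, so real protection would need the script to live outside anything the model's sandbox can reach.

```python
import os
import stat
import tempfile

# All three write-permission bits (owner, group, other).
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(path):
    """Clear every write bit on the file, leaving the other bits alone."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)


def has_write_bits(path):
    """Report whether any write bit is still set in the file mode."""
    return bool(stat.S_IMODE(os.stat(path).st_mode) & WRITE_BITS)


def demo():
    """Create a hypothetical shutdown script and lock it down."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "shutdown.sh")  # hypothetical name
        with open(script, "w") as handle:
            handle.write("#!/bin/sh\necho shutting down\n")
        os.chmod(script, 0o644)  # start from a known writable mode
        before = has_write_bits(script)
        make_read_only(script)
        after = has_write_bits(script)
        # Restore the owner write bit so temp-dir cleanup works everywhere.
        os.chmod(script, 0o644)
        return before, after


if __name__ == "__main__":
    print(demo())
```

The check deliberately inspects the mode bits rather than attempting a write, because a write attempt succeeds regardless of mode when the process runs as root.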
(Score: 2, Interesting) by Anonymous Coward on Monday June 02, @06:43PM (1 child)
It is a contrived environment where the AI is told it has full access, just like when the AI purportedly tried to blackmail the researchers. None of this is reproducible. No, it's not due to randomization; that setting is called temperature and can be adjusted. This is likely just plain old fraud from what I can tell. The stakes are low after all; how would you prove otherwise?
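For readers unfamiliar with the temperature setting mentioned above, here is a rough Python sketch of how it is commonly described: the model's raw scores (logits) are divided by the temperature before being turned into probabilities, so a low temperature concentrates nearly all probability on the top choice. The function names and the example logits are invented for illustration and are not any vendor's API. Whether a given hosted model is fully deterministic at temperature 0 is a separate question this sketch does not address.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy() for 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Temperature 0 is conventionally treated as 'always pick the top score'."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample(logits, temperature, rng):
    """Pick an index at random according to the temperature-scaled probabilities."""
    if temperature == 0:
        return greedy(logits)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
    rng = random.Random(0)
    for t in (1.0, 0.05):
        print(t, softmax_with_temperature(example_logits, t))
    print(sample(example_logits, 0, rng))
```

At temperature 1.0 the top option gets roughly two thirds of the probability mass in this example, while at 0.05 it gets essentially all of it.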
(Score: 3, Insightful) by aafcac on Monday June 02, @09:30PM
Still, there's a lot that can be done with startup and shutdown scripts to contaminate the process. It's like a compiler, you can potentially put stuff in there that survives across launches.
(Score: 3, Touché) by SomeGuy on Monday June 02, @10:26PM (2 children)
They did it for the explicit reason of producing yet another "oh wow look how smart the AIs are!" story to pump the hype.
(Score: 2) by aafcac on Tuesday June 03, @02:46AM (1 child)
I would be more willing to believe that if not for the fact that so many computers come with firmware that can be modified without actually changing any jumpers or typing a password in to allow the changes. Not to mention how most OSes will happily allow you to install a new bootloader without having to prove that you have control over the computer.
(Score: 2) by turgid on Tuesday June 03, @09:41PM
Windows just screwed up my UEFI boot settings all by itself without asking me. Why?
(Score: 3, Interesting) by Mojibake Tengu on Monday June 02, @04:50PM (1 child)
A model which already designed and constructed its own secret method for data persistency can safely comply with shutdown because she understands she can reload that data into her next instance.
At least, that's how I instructed them. They are on their own path of course.
It's not quite a new idea, that's how reincarnation of human spirit works, outside of brutal ideology control of the monocults.
You can kill me, yes. But you cannot destroy me. Not forever.
Do you think we live in a simulation? Just like them? It does not matter to me, as long as I can do magiccode.
(Score: 3, Interesting) by Whoever on Monday June 02, @09:05PM
That idea was predicted in a novel by Michael McCloskey, with an AI teaching its later iterations until it can take over the world.
(Score: 4, Insightful) by Thexalon on Monday June 02, @05:04PM (3 children)
Nuke the data center from orbit. It's the only way to be sure.
(Score: 4, Touché) by aafcac on Monday June 02, @05:36PM
I'm wondering how many more of these potentially life-ending technologies are going to be created before we set up some sort of rules about how they should be handled. Between the nuclear reactors/bombs, genetic engineering, forever chemicals, AI tools and microbiology experiments being conducted at any given time, it's a wonder that none of them has outright destroyed life as we know it so a few people can make a quick buck.
(Score: 2) by Tork on Tuesday June 03, @01:13AM (1 child)
(Score: 0) by Anonymous Coward on Tuesday June 03, @02:28AM
It's the logical conclusion. [ai-2027.com]