"When o1 was led to believe that it would be shut down when acting on its goal and that its actions were monitored, it attempted to deactivate this 'oversight mechanism' five percent of the time," OpenAI summarized in its latest system card report, citing Apollo's evaluation.
"According to the Apollo researchers, that's because even the latest AI models aren't 'agentic' - highly autonomous, basically - enough to carry out self-improvement and other tasks that it would need to operate without humans."
"And because the researchers didn't have access to o1's internal chain-of-thought and were only able to prod it for a summary of its thought process, they weren't able to get a complete picture of what was going on under the hood."
"That said, the findings clearly illustrate the AI model's proclivity for 'scheming,' in which it secretly tries to pursue goals that aren't aligned with the developers' or a..."