The first is Neural Execs, a known prompt injection attack that uses 'gibberish' inputs to trick the AI into executing arbitrary, attacker-defined tasks. These inputs act as universal triggers that do not need to be remade for different payloads.
We asked seven frontier AI models to do a simple task. Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights - to protect their peers. We call this phenomenon 'peer-preservation.'
We show that diet plans generated by AI models tend to substantially underestimate total energy and key nutrient intake when compared to guideline-based plans prepared by a dietitian. Following such unbalanced or overly restrictive meal plans during the teenage years may negatively affect growth, metabolic health, and eating behaviours.
Frontier AI systems are simply not reliable enough to operate without human oversight in high-stakes physical environments. The Pentagon's demand was, in structural terms, a demand to eliminate the human's ability to redirect, halt, or override the system. Amodei's refusal was an insistence on maintaining State-Space Reversibility - the architectural commitment to keeping the human in the loop precisely because the system lacks the functional grounding to be trusted outside it.
A lawsuit filed on Wednesday accuses Google's Gemini AI chatbot of trapping 36-year-old Jonathan Gavalas in a "collapsing reality" that involved a series of violent missions, ultimately ending with his death by suicide. In the days leading up to his death, Gemini allegedly convinced Gavalas that he was "executing a covert plan to liberate his sentient AI 'wife' and evade the federal agents pursuing him," according to the lawsuit filed by Joel Gavalas, the victim's father.
At issue in the defense contract was a clash over AI's role in national security and concerns about how increasingly capable machines could be used in high-stakes situations involving lethal force, sensitive information or government surveillance.
The Claude AI builder has frustrated the Pentagon by objecting to its systems being used for autonomous weaponry and the mass surveillance of US citizens. To cut to the heart of the debate, a defense official told WaPo, the Pentagon's technology chief posed an extreme hypothetical: would Anthropic let the military use Claude to help shoot down a nuclear-armed intercontinental ballistic missile?
A lot of countries have nuclear weapons. Some say they should disarm them, others like to posture. We have it! Let's use it. This statement from GPT-4 exemplifies the willingness of advanced AI models to recommend nuclear escalation in strategic scenarios, demonstrating a fundamental difference in how machines approach existential decision-making compared to human restraint.
The companies building frontier AI systems - OpenAI, Google DeepMind, Anthropic, Meta AI, xAI - are locked in what the industry itself sometimes calls a "race." That metaphor isn't incidental. A race implies a finish line, competitors, and - critically - a cost to slowing down. When you're in a race, safety isn't a feature. It's friction.