A customized Unitree G1 robot was filmed chasing a small group of wild boars through an empty parking lot in Warsaw, Poland. The widely shared footage shows the robot jogging across a patch of grass in pursuit of the animals, only to raise its fist in the air in frustration after they get away.
Time pressure, limited information, confusion, fatigue, and mortality salience combine to set the stage for decision-making errors, sometimes with grave consequences. A stark example is the 1988 downing of Iran Air Flight 655 by a missile fired from the USS Vincennes, which killed all 290 passengers and crew. Amid heightened tensions between the U.S. and Iran, the captain of the Vincennes misidentified the airliner as an incoming hostile aircraft and ordered his crew to shoot it down.
Frontier AI systems are simply not reliable enough to operate without human oversight in high-stakes physical environments. The Pentagon's demand amounted, in structural terms, to eliminating the human's ability to redirect, halt, or override the system. Amodei's refusal was an insistence on maintaining State-Space Reversibility: the architectural commitment to keeping the human in the loop precisely because the system lacks the functional grounding to be trusted outside it.
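The term is not defined formally here, but the structural idea lends itself to a simple sketch. Below is a minimal, hypothetical illustration (all names and the approval policy are invented for this example, not a description of Anthropic's or the Pentagon's actual systems): every action an agent proposes must pass through a human supervisor channel that can approve, redirect, or halt, so the operator can always recover control of the system's trajectory.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

# Hypothetical sketch of "State-Space Reversibility": the agent may propose
# actions, but only a human decision advances the system, and the human can
# halt or redirect at every step.

class Decision(Enum):
    APPROVE = "approve"
    REDIRECT = "redirect"   # discard the proposal; agent must re-plan
    HALT = "halt"           # stop the system entirely

@dataclass
class Action:
    name: str
    reversible: bool  # can the system be returned to its prior state?

def run_with_oversight(propose: Callable[[], Action],
                       ask_human: Callable[[Action], Decision]) -> None:
    """Execute agent-proposed actions only under live human control."""
    while True:
        action = propose()
        # Every action, reversible or not, routes through the human channel;
        # there is no code path that executes one autonomously.
        decision = ask_human(action)
        if decision is Decision.HALT:
            print("halted by operator")
            return
        if decision is Decision.REDIRECT:
            continue
        print(f"executing {action.name} (reversible={action.reversible})")

# Usage: a toy proposer and a supervisor policy that refuses anything
# irreversible (both are invented stand-ins for real components).
if __name__ == "__main__":
    proposals = iter([Action("survey_area", True), Action("strike", False)])

    def propose() -> Action:
        return next(proposals)

    def ask_human(a: Action) -> Decision:
        return Decision.APPROVE if a.reversible else Decision.HALT

    run_with_oversight(propose, ask_human)
```

The point of the sketch is structural rather than algorithmic: deleting the `ask_human` call is the only way to make the loop autonomous, which is precisely the kind of change the article says Amodei refused to make.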
AI agents need skills (specific procedural knowledge) to perform tasks well, but they cannot teach those skills to themselves, new research suggests. The researchers developed a new benchmark, SkillsBench, which evaluates agentic AI performance on 84 tasks across 11 domains, including healthcare, manufacturing, cybersecurity, and software engineering. They evaluated each task under three conditions: