Meta is working on two proprietary frontier models: Avocado, a large language model, and Mango, a multimedia file generator. The open-source variants are expected to be made available at a later date.
Time pressure, limited information, confusion, fatigue, and mortality salience combine to set the stage for decision-making errors, sometimes with grave consequences. An example is the downing of Iran Air Flight 655 by a missile launched by the USS Vincennes in 1988, resulting in the death of 290 passengers and crew. In a time of heightened tension between the U.S. and Iran, the captain of the Vincennes misidentified the airliner as an incoming hostile aircraft and ordered his crew to shoot it down.
Frontier AI systems are simply not reliable enough to operate without human oversight in high-stakes physical environments. The Pentagon's demand was, in structural terms, a demand to eliminate the human's ability to redirect, halt, or override the system. Amodei's refusal was an insistence on maintaining State-Space Reversibility - the architectural commitment to keeping the human in the loop precisely because the system lacks the functional grounding to be trusted outside it.
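The human-in-the-loop commitment described above can be made concrete. Below is a minimal sketch, assuming a hypothetical `HumanGate` design (the class and field names are illustrative, not from any real system): every irreversible action the system proposes must pass a human approval point, and the human can halt everything.

```python
# Hypothetical sketch of a human-in-the-loop gate. All names here
# (Action, HumanGate) are illustrative, not from any deployed system.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    name: str
    irreversible: bool  # e.g. weapons release, permanent data deletion

class HumanGate:
    """Routes proposed actions through a human approve/halt override."""

    def __init__(self, approve: Callable[[Action], bool]):
        self._approve = approve  # the human reviewer's decision function
        self.halted = False

    def submit(self, action: Action) -> str:
        if self.halted:
            return "halted"
        # Irreversible actions always require explicit human approval;
        # the system has no code path that bypasses this branch.
        if action.irreversible and not self._approve(action):
            return "rejected"
        return "executed"

    def halt(self) -> None:
        """Human override: stop all further actions, reversible or not."""
        self.halted = True

# A gate whose human reviewer rejects every irreversible action:
gate = HumanGate(approve=lambda a: False)
print(gate.submit(Action("log telemetry", irreversible=False)))  # executed
print(gate.submit(Action("launch", irreversible=True)))          # rejected
```

The design point is that reversibility lives in the architecture, not in the model's judgment: removing the gate, as the demand described above would, removes the only guaranteed override.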
Your org chart is probably going to start flattening horizontally. Advances in AI can equip a single person to manage more teams, because basic job functions like reporting and data gathering can be automated. I think that breaks the middle-management hierarchy.
Last year I first started thinking about what the future of programming languages might look like now that agentic engineering is a growing thing. Initially I felt that the enormous corpus of pre-existing code would cement existing languages in place but now I'm starting to think the opposite is true. Here I want to outline my thinking on why we are going to see more new programming languages and why there is quite a bit of space for interesting innovation.
AI agents need skills - specific procedural knowledge - to perform tasks well, but they can't teach themselves, new research suggests. The researchers developed a new benchmark, SkillsBench, which evaluates agentic AI performance on 84 tasks across 11 domains, including healthcare, manufacturing, cybersecurity, and software engineering. They looked at each task under three conditions:
Have you ever asked Alexa to remind you to send a WhatsApp message at a certain hour, and then wondered, 'Why can't Alexa just send the message herself?' Or felt the frustration of using an app to plan a trip, only to have to jump to your calendar, booking website, tour operator, and bank account yourself instead of your AI assistant doing it all? Exactly this gap between AI automation and human action is what the agent-to-agent (A2A) protocol aims to address. With the introduction of AI agents, the next step in their evolution seemed to be communication. But when communication between machines and humans is already here, what's left? Communication between the machines themselves.
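The delegation the paragraph describes can be sketched in a few lines. This is an illustration only, assuming a made-up message format and `Agent` class - it is not the actual A2A protocol schema, just the shape of the idea: one agent hands a structured task to another agent that has the skill to complete it.

```python
# Illustrative sketch of agent-to-agent delegation. The message fields
# and Agent class are hypothetical, not the real A2A protocol schema.
import json

class Agent:
    def __init__(self, name, skills):
        self.name = name
        self.skills = skills  # task types this agent can handle

    def handle(self, message: str) -> str:
        task = json.loads(message)
        if task["type"] not in self.skills:
            return json.dumps({"status": "declined", "task_id": task["id"]})
        # In a real system, this is where the agent would call its own
        # tools (calendar API, booking site, bank) on the user's behalf.
        return json.dumps({"status": "done", "task_id": task["id"]})

assistant = Agent("assistant", skills=[])
booking_agent = Agent("booking", skills=["book_flight"])

# The assistant delegates a task it cannot perform itself:
request = json.dumps({"id": "t1", "type": "book_flight", "to": "booking"})
reply = booking_agent.handle(request)
print(reply)  # {"status": "done", "task_id": "t1"}
```

The point of a shared protocol is exactly this hand-off: the assistant doesn't need the booking agent's tools, only an agreed message format for requesting and acknowledging tasks.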
AI agents built on large language models (LLMs) often look deceptively simple in demos. A clever prompt and a few tool integrations can produce impressive results, leading newer engineers to believe deployment will be straightforward. In practice, these agents frequently fail in production. Prompts that work in controlled environments break under real-world conditions such as noisy inputs, latency constraints, and user variability. Once deployed, an agent may begin hallucinating tool calls, exceeding acceptable response times, and rapidly driving up API costs.
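Two of the failure modes above can be caught with cheap guardrails. Below is a minimal sketch, assuming a hypothetical tool registry and deadline budget (all names are illustrative): validate the model's tool calls against declared schemas before executing them, and reject results that blow the latency budget.

```python
# Minimal guardrail sketch: schema-check tool calls (to catch
# hallucinated tools or arguments) and enforce a latency budget.
# The tool registry and names here are hypothetical.
import time

TOOLS = {
    "get_weather": {"city"},          # tool name -> required argument names
    "send_email": {"to", "subject"},
}

def validate_tool_call(name: str, args: dict) -> list[str]:
    """Return a list of problems; empty means the call is safe to run."""
    problems = []
    if name not in TOOLS:
        problems.append(f"unknown tool: {name}")  # hallucinated tool name
    else:
        missing = TOOLS[name] - args.keys()
        if missing:
            problems.append(f"missing args: {sorted(missing)}")
    return problems

def run_with_deadline(fn, deadline_s: float):
    """Run a tool call and reject results that arrive too late."""
    start = time.monotonic()
    result = fn()
    if time.monotonic() - start > deadline_s:
        raise TimeoutError("tool call exceeded latency budget")
    return result

print(validate_tool_call("get_weather", {"city": "Lisbon"}))  # []
print(validate_tool_call("book_hotel", {}))  # ['unknown tool: book_hotel']
```

Checks like these don't make the prompt more robust, but they turn silent production failures (bad tool calls, slow responses, runaway API spend) into errors you can log, retry, or surface to the user.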