For all the book smarts of LLMs, they currently have little sense of how the real world works.

Driving the news: Some of the biggest names in AI are working on world models, including Fei-Fei Li, whose World Labs announced Marble, its first commercial release. Machine learning veteran Yann LeCun plans to launch a world model startup when he leaves Meta, reportedly in the coming months.
Researchers from Google DeepMind recently described a new approach for teaching intelligent agents to solve complex, long-horizon tasks by training them exclusively on video footage rather than through direct interaction with the environment. Their new agent, called Dreamer 4, demonstrated the ability to mine diamonds in Minecraft after being trained only on videos, without ever actually playing the game. The researchers dubbed their approach imagination training to emphasize that the agent learns solely from offline data, without any environment interaction.
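The core loop behind imagination training can be illustrated with a deliberately tiny sketch. This is a hypothetical simplification for intuition only, not DeepMind's Dreamer 4 code: a lookup table stands in for the learned video-prediction model, and a greedy sweep stands in for policy learning. The states, actions, and rewards below are invented for the example.

```python
# Toy sketch of the "imagination training" idea (hypothetical simplification,
# not the Dreamer 4 implementation): fit a world model to offline trajectories,
# then improve a policy purely on rollouts imagined inside that model.

# Offline data extracted from logged footage: (state, action, next_state,
# reward) tuples. The agent never touches the real environment.
offline_data = [
    (0, "dig", 1, 0.0), (1, "dig", 2, 0.0), (2, "dig", 3, 1.0),  # diamond at state 3
    (0, "idle", 0, 0.0), (1, "idle", 1, 0.0), (2, "idle", 2, 0.0),
]

# Step 1: "learn" a world model -- here a lookup table of observed
# transitions, standing in for a learned video-prediction network.
world_model = {(s, a): (s2, r) for s, a, s2, r in offline_data}

def imagine_rollout(policy, start=0, horizon=3):
    """Roll the policy forward inside the model only -- no real environment."""
    s, total = start, 0.0
    for _ in range(horizon):
        a = policy.get(s, "idle")
        s, r = world_model.get((s, a), (s, 0.0))
        total += r
    return total

# Step 2: improve the policy greedily, scoring each candidate action by the
# return of an imagined rollout (later states first so value propagates back).
policy = {s: "idle" for s in range(3)}
for s in sorted(policy, reverse=True):
    policy[s] = max(["dig", "idle"],
                    key=lambda a: imagine_rollout({**policy, s: a}, start=s))

print(policy)                   # every state ends up preferring "dig"
print(imagine_rollout(policy))  # imagined return reaches the diamond reward
```

The point of the sketch is the separation of concerns: the model is fit once from offline data, and all subsequent trial-and-error happens in imagination, which is what lets an agent acquire a skill it was never allowed to practice for real.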
Traditional video methods take "a brute-force approach to pixel generation, where you're trying to squeeze motion into a couple of frames to create the illusion of movement, but the model actually doesn't really know or reason about what's going on in that scene," he said. Previous video-generation models produced physics unlike the real world's, he added, a shortcoming that general-purpose world model systems help address.