World models could unlock the next revolution in artificial intelligence
Briefly

"Part of the problem lies in the predictive nature of many AI models. Like the models that power ChatGPT, which are trained to predict text, video generation models predict what is statistically most plausible to look right next. In neither case does the AI hold a clearly defined model of the world that it continuously updates to make more informed decisions."
"A simple way to understand world modeling is through four-dimensional, or 4D, models (three dimensions plus time). To do this, let's think back to 2012, when Titanic, 15 years after its theatrical release, was painstakingly converted into stereoscopic 3D. If you were to freeze any frame, you would have an impression of distance between characters and objects on the ship. But if Leonardo DiCaprio had his back to the camera, you wouldn't be able to walk around him to see his face."
Many current AI systems generate outputs by predicting the statistically most plausible next token or frame, which can produce temporal and spatial inconsistencies such as objects that disappear or furniture that changes between frames. Researchers are developing world models that encode explicit spatiotemporal structure, so that a representation of the environment can be updated continuously over time. Four-dimensional modeling (three spatial dimensions plus time) exemplifies this approach: it preserves object permanence and supports multiple viewpoints rather than relying on stereoscopic illusions. Improved world models have broad implications for video generation, augmented reality, robotics, autonomous vehicles, and the pursuit of humanlike intelligence or artificial general intelligence.
Read at www.scientificamerican.com