Waymo leverages Genie 3 to create a world model for self-driving cars
Briefly

"The Waymo World Model is not just a straight port of Genie 3 with dashcam videos stuffed inside. Waymo and DeepMind used a specialized post-training process to make the new model generate both 2D video and 3D lidar outputs of the same scene. While cameras are great for visualizing fine details, Waymo says lidar is necessary to add critical depth information to what a self-driving car "sees" on the road-maybe someone should tell Tesla about that."
"Using a world model allows Waymo to take video from its vehicles and use prompts to change the route the vehicle takes, which it calls driving action control. These simulations, which come with lidar maps, reportedly offer greater realism and consistency than older reconstructive simulation methods. With the world model, Waymo can see what would happen if the car took a different turn."
"This model can also help improve the self-driving AI even without adding or removing everything. There are plenty of dashcam videos available for training self-driving vehicles, but they lack the multimodal sensor data of Waymo's vehicles. Dropping such a video into the Waymo World Model generates matching sensor data, showing how the driving AI would have seen that situation. While the Waymo World Model can create entirely synthetic scenes, the company seems mostly interested in "mutating" the conditions in real videos."
Waymo's World Model produces synchronized 2D video and 3D lidar outputs for the same scene through a specialized post-training process developed with DeepMind. The model enables driving action control: prompts can steer the vehicle onto alternative routes, and the resulting lidar-backed simulations offer greater realism and consistency than older reconstructive methods. The system can also synthesize missing sensor modalities for dashcam footage, creating matching lidar maps so the driving AI can be evaluated on how it would have perceived those situations. Finally, the model supports mutating real videos, such as changing the time of day, weather, signage, or object placement, to expand training scenarios and help vehicles adapt to more varied and challenging markets.
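Waymo has not published any of this tooling, but the workflow described above (a real clip plus text prompts in, synchronized camera and lidar frames out) can be sketched roughly. Everything in the snippet below, including the WorldModel class, its rollout method, and all parameter names, is a hypothetical illustration of the described inputs and outputs, not a real Waymo or DeepMind API.

```python
# Hypothetical sketch only: these classes and methods are not real Waymo or
# DeepMind APIs. They mirror what the article describes, i.e. conditioning on
# a real clip plus text prompts and getting back paired camera + lidar frames.

from dataclasses import dataclass

import numpy as np


@dataclass
class SensorFrame:
    """One simulation step: a 2D camera image and the matching 3D lidar sweep."""
    camera_rgb: np.ndarray    # H x W x 3 video frame
    lidar_points: np.ndarray  # N x 3 point cloud carrying depth information


class WorldModel:
    """Stand-in for a Genie-3-style world model post-trained on driving logs."""

    def rollout(self, source_clip: str, scene_prompt: str = "",
                action_prompt: str = "", num_frames: int = 8) -> list[SensorFrame]:
        """Generate a counterfactual rollout with paired camera and lidar outputs.

        A real model would condition on the source clip and the prompts; here we
        emit placeholder frames of plausible shape to keep the sketch runnable.
        """
        return [
            SensorFrame(
                camera_rgb=np.zeros((720, 1280, 3), dtype=np.uint8),
                lidar_points=np.zeros((100_000, 3), dtype=np.float32),
            )
            for _ in range(num_frames)
        ]


# Example: take a camera-only dashcam clip, mutate the conditions, change the
# driving action, and get back frames that also include synthetic lidar so the
# driving stack can be evaluated as if it had full sensor coverage.
model = WorldModel()
frames = model.rollout(
    source_clip="dashcam_clip_0042.mp4",  # placeholder path
    scene_prompt="same street at dusk with wet pavement",
    action_prompt="take the next left instead of continuing straight",
)
print(f"{len(frames)} frames, lidar points per frame: {frames[0].lidar_points.shape[0]}")
```

The point of the sketch is the pairing: a single call conditions on one source clip and returns both modalities together, which is what lets camera-only dashcam footage be replayed as if the full sensor suite had been present.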
Read at Ars Technica