
"Google is previewing a new Gemini AI model designed to navigate and interact with the web via a browser, letting AI agents do things inside interfaces designed for use by people and not robots. The model, called Gemini 2.5 Computer Use, uses 'visual understanding and reasoning capabilities' to analyze a user's request and carry out a task, such as filling out and submitting a form. It can be used for UI testing or navigating interfaces made for people who don't have an API or other direct connection available."
"Google says its computer use model 'outperforms leading alternatives on multiple web and mobile benchmarks.' Unlike ChatGPT Agent and Anthropic's computer use tool, Google's new AI model only has access to a browser, not an entire computer environment. Google notes that 'it is not yet optimized for desktop OS-level control' and currently supports 13 actions, including opening a web browser, typing text, and dragging and dropping elements."
Gemini 2.5 Computer Use is a browser-focused AI model that uses visual understanding and reasoning to analyze a request and carry out tasks such as filling out and submitting forms. It supports UI testing and navigation of interfaces that offer no API, and it has been applied to agentic features such as AI Mode and Project Mariner for automated browser tasks. The model has access only to a browser, is not yet optimized for desktop OS-level control, and supports 13 actions, including opening a browser, typing text, and dragging and dropping elements. It is available through Google AI Studio and Vertex AI, with a demo hosted on Browserbase.
Read at The Verge