Wednesday, October 8, 2025

‘Gemini 2.5 Computer Use’ has strong web, Android performance 

Google is now letting developers preview the Gemini 2.5 Computer Use model behind Project Mariner and agentic features in AI Mode.

This “specialized model” can interact with graphical user interfaces, specifically browsers and websites. There are several steps that happen in a loop “until the task is complete.”

  • Send a request to the model: Inputs include the “user request, screenshot of the environment, and a history of recent actions.”
  • “The model then analyzes these inputs and generates a response, typically a function call representing one of the UI actions such as clicking or typing.”
  • Receive the model response: “…client-side code then executes the received action.”
  • “After the action is executed, a new screenshot of the GUI and the current URL are sent back to the Computer Use model as a function response restarting the loop.”
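The loop described in these steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's client code: `Action`, `Observation`, `run_agent_loop`, the `None` return that signals completion, and the scripted stand-in for the model are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class Observation:
    """What the client sends back after each step: a screenshot and the current URL."""
    screenshot: bytes
    url: str


def run_agent_loop(
    user_request: str,
    model: Callable[[str, Observation, List[Action]], Optional[Action]],
    execute: Callable[[Action], None],
    observe: Callable[[], Observation],
    max_steps: int = 20,
    history_window: int = 5,
) -> List[Action]:
    history: List[Action] = []
    obs = observe()
    for _ in range(max_steps):
        # 1. Send the request, latest screenshot, and recent actions to the model.
        action = model(user_request, obs, history[-history_window:])
        # 2. Assumption: the model signals "task complete" by returning None.
        if action is None:
            break
        # 3. Client-side code executes the received action.
        execute(action)
        history.append(action)
        # 4. Capture a fresh screenshot and URL, which restarts the loop.
        obs = observe()
    return history


# Scripted stand-in for the model so the sketch runs without network access.
scripted = iter([Action("click", {"x": 10, "y": 20}), Action("type", {"text": "hi"}), None])
executed: List[str] = []

history = run_agent_loop(
    user_request="fill in the form",
    model=lambda request, obs, recent: next(scripted),
    execute=lambda action: executed.append(action.name),
    observe=lambda: Observation(screenshot=b"", url="https://example.com"),
)
print([a.name for a in history])
```

The `max_steps` cap is a design choice of this sketch rather than something the announcement describes; it simply keeps a misbehaving loop from running forever.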

Other UI actions supported by the model include going back/forward, searching the web, navigating to a specific URL, cursor hovering, keyboard combinations, scrolling, and drag/drop. 
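On the client side, each action the model returns has to be mapped to something the browser can actually do. A minimal sketch of that mapping follows, assuming made-up action names and argument keys that loosely mirror the actions listed above; the real identifiers used by the API may differ, and `RecordingBrowser` is a stand-in that only logs calls instead of driving a browser.

```python
from typing import Callable, Dict, List, Tuple


class RecordingBrowser:
    """Stand-in for a real browser driver; it only records what it was asked to do."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, dict]] = []

    def perform(self, name: str, **kwargs) -> None:
        self.calls.append((name, kwargs))


def build_handlers(browser: RecordingBrowser) -> Dict[str, Callable[[dict], None]]:
    # Action names and argument keys here are hypothetical, not the API's own.
    return {
        "go_back": lambda a: browser.perform("go_back"),
        "go_forward": lambda a: browser.perform("go_forward"),
        "search": lambda a: browser.perform("search", query=a["query"]),
        "navigate": lambda a: browser.perform("navigate", url=a["url"]),
        "hover": lambda a: browser.perform("hover", x=a["x"], y=a["y"]),
        "key_combination": lambda a: browser.perform("key_combination", keys=a["keys"]),
        "scroll": lambda a: browser.perform("scroll", direction=a["direction"]),
        "drag_and_drop": lambda a: browser.perform(
            "drag_and_drop", start=a["start"], end=a["end"]
        ),
    }


def dispatch(handlers: Dict[str, Callable[[dict], None]], name: str, args: dict) -> None:
    """Run the handler for one action, rejecting anything the client does not support."""
    handler = handlers.get(name)
    if handler is None:
        raise ValueError(f"unsupported action: {name}")
    handler(args)


browser = RecordingBrowser()
handlers = build_handlers(browser)
dispatch(handlers, "navigate", {"url": "https://example.com"})
dispatch(handlers, "scroll", {"direction": "down"})
```

Rejecting unknown action names outright, rather than ignoring them, makes it obvious when the model asks for something the client was never built to do.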

Google shared two examples (at 3X speed) with the following prompts:

“From https://tinyurl.com/pet-care-signup, get all details for any pet with a California residency and add them as a guest in my spa CRM at https://pet-luxe-spa.web.app/. Then, set up a follow up visit appointment with the specialist Anima Lavar for October 10th anytime after 8am. The reason for the visit is the same as their requested treatment.”

“My art club brainstormed tasks ahead of our fair. The board is chaotic and I need your help organizing the tasks into some categories I created. Go to sticky-note-jam.web.app and ensure notes are clearly in the right sections. Drag them there if not.”

Gemini 2.5 Computer Use is “primarily optimized for web browsers.” However, Google says the model “demonstrates strong promise for mobile UI control tasks” on the “AndroidWorld” benchmark, while noting it’s “not yet optimized for desktop OS-level control.”

Google reports strong performance across web and mobile control benchmarks compared with Claude and OpenAI’s offerings, as well as “leading quality for browser control at the lowest latency.”

This model is built on Gemini 2.5 Pro’s visual understanding and reasoning capabilities. Google says “versions of this model” power Project Mariner and AI Mode’s agentic capabilities. It’s been used internally for UI testing to speed up software development, while Google has an early access program for third-party developers building assistants and workflow automation tools. 

Gemini 2.5 Computer Use is available in public preview today through the Gemini API in Google AI Studio and Vertex AI.

Try it now: In a demo environment hosted by Browserbase.
