Holo3.1: Fast & Local Computer Use Agents
The Holo3.1 family of computer-use models improves robustness across environments, agent frameworks, and deployment targets, with a focus on local inference and seamless integration.

Last March, we released Holo3, our state-of-the-art computer-use model, and it was quickly adopted across a wide range of workflows. As users began deploying Holo3 in production, however, it became clear that raw performance was no longer enough. Users wanted the same computer-use capabilities on desktop and mobile, seamless integration with different agent frameworks, and flexible deployment, from cloud inference to fully local execution on end-user devices.
This is why we're releasing the Holo3.1 family, which improves robustness across the three dimensions that matter most in production: environments (web, desktop, mobile), agent frameworks, and deployment targets. For the first time, we're releasing quantized checkpoints optimized for local inference, including FP8, Q4 GGUF, and NVFP4. Holo3.1 is a major step toward our vision of universal computer-use agents: systems that can operate across environments, integrate into any agent stack, and run wherever the workflow lives.
Built on the Qwen model family, Holo3.1 was designed to improve robustness across the environments where computer-use agents are actually deployed, while retaining state-of-the-art performance. As teams moved Holo3 from evaluation to production, we repeatedly observed the same challenge: strong performance in one setting does not necessarily transfer to another. Mobile devices, alternative agent harnesses, and different execution frameworks each introduce their own sources of distribution shift.
Holo3.1 extends Holo3's capabilities beyond browser and desktop control, with major gains in mobile environments. On AndroidWorld, our 35B-A3B model improves from 67% to 79.3%, while the smaller 4B and 9B variants improve from 58% to 72%. To better support teams that deploy Holo inside third-party agent stacks, Holo3.1 adds native support for function-calling protocols alongside the structured JSON outputs already available in Holo3.
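The relationship between the two output modes can be illustrated with a small adapter. The sketch below assumes a hypothetical action vocabulary (`click`, `type_text`) and an OpenAI-style tool-call shape; neither is Holo3.1's documented action space, and `json_action_to_tool_call` is an illustrative helper, not part of any Holo API. It shows how a structured JSON action maps onto a function call whose arguments are checked against the required fields of a JSON Schema tool definition.

```python
import json
from typing import Any

# Hypothetical tool definitions (assumption): Holo3.1's real action space may differ.
TOOLS: list[dict[str, Any]] = [
    {
        "type": "function",
        "function": {
            "name": "click",
            "description": "Click at screen coordinates.",
            "parameters": {
                "type": "object",
                "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
                "required": ["x", "y"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "type_text",
            "description": "Type text into the focused element.",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    },
]


def json_action_to_tool_call(raw: str, call_id: str = "call_0") -> dict[str, Any]:
    """Convert a structured JSON action such as {"action": "click", "x": 10, "y": 20}
    into an OpenAI-style tool call whose arguments are a JSON-encoded string."""
    action = json.loads(raw)
    name = action.pop("action", None)
    # Index the tool schemas by function name.
    known = {t["function"]["name"]: t["function"]["parameters"] for t in TOOLS}
    if name not in known:
        raise ValueError(f"unknown action: {name}")
    # Only the 'required' list is checked here; this is not full JSON Schema validation.
    missing = [k for k in known[name]["required"] if k not in action]
    if missing:
        raise ValueError(f"missing arguments for {name}: {missing}")
    return {
        "id": call_id,
        "type": "function",
        "function": {"name": name, "arguments": json.dumps(action)},
    }


call = json_action_to_tool_call('{"action": "click", "x": 412, "y": 88}')
print(call["function"]["name"], call["function"]["arguments"])
```

In a harness that speaks function calling natively, a list like `TOOLS` would be passed to the model as its tool definitions and the model would emit the call directly; an adapter of this kind only matters when a harness expects tool calls but the model emits structured JSON.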
Across OSWorld and our internal benchmark suite covering e-commerce, business software, and collaboration workflows, function calling and native execution now achieve near-parity performance. When evaluated inside our Holotab product harness, Holo3.1 also improves on Holo3 by more than 25%.
To further enable local and on-device inference, we're releasing new model sizes: small models (0.8B, 4B, and 9B) for cost-effective, private deployment, alongside the larger 35B-A3B model for state-of-the-art performance.
The Holo3.1 family is available in four sizes, and we're releasing optimized FP8, NVFP4, and Q4 GGUF checkpoints for local and edge deployment. We look forward to seeing what developers build with Holo3.1.
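To give a rough sense of what the quantized checkpoints mean for local hardware, the sketch below estimates weight memory only (no KV cache, activations, or runtime overhead) from nominal parameter counts. The bits-per-weight figures are assumptions: FP8 at one byte per weight, NVFP4 at 4.5 bits (4-bit values plus one FP8 scale per 16-element block), and Q4 GGUF approximated by the `Q4_0` layout at 4.5 bits (4-bit values plus one FP16 scale per 32 weights); other Q4 variants such as `Q4_K_M` use slightly more. Actual checkpoint sizes may differ, for example if some layers are kept at higher precision.

```python
# Assumed effective bits per weight, including block-scale overhead.
BITS_PER_WEIGHT = {
    "BF16": 16.0,       # unquantized baseline
    "FP8": 8.0,         # one byte per weight; per-tensor/channel scales ignored
    "NVFP4": 4.5,       # 16 x 4-bit values + one 8-bit scale per block
    "Q4 (Q4_0)": 4.5,   # 32 x 4-bit values + one 16-bit scale per block
}

# Nominal total parameter counts, in billions, from the model names.
SIZES_B = {"0.8B": 0.8, "4B": 4.0, "9B": 9.0, "35B-A3B": 35.0}


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


for model, params in SIZES_B.items():
    row = ", ".join(
        f"{fmt}: {weight_gb(params, bpw):.1f} GB" for fmt, bpw in BITS_PER_WEIGHT.items()
    )
    print(f"{model:>8} -> {row}")
```

For the 35B-A3B mixture-of-experts model, the "A3B" suffix conventionally indicates roughly 3B parameters active per token. That reduces compute per token, but all experts normally need to be resident in memory unless the runtime offloads them, so the total parameter count drives the estimate.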
Source: Hugging Face