Google says Gemini 3.5 Flash can slash enterprise AI costs by more than $1 billion a year
Google unveils Gemini 3.5 Flash, a new AI model that can reduce enterprise AI costs by over $1 billion annually, shattering the industry's conventional wisdom that high-performance models must be slow and expensive.

['Google unveiled Gemini 3.5 Flash at its annual I/O developer conference on Tuesday, a new artificial intelligence model that the company says shatters what had become a seemingly iron law of the AI industry: that the smartest models must also be the slowest and most expensive to run. The model sits at the center of a sweeping set of announcements — from a video-generating "world model" called Gemini Omni to a 24/7 personal AI agent called Gemini Spark — but 3.5 Flash carries perhaps the most immediate consequence for the enterprises pouring billions of dollars into AI infrastructure.', 'Sundar Pichai, Google\'s chief executive, told reporters during a press briefing Monday that companies running roughly one trillion tokens per day on Google Cloud could save more than $1 billion annually by shifting 80 percent of their workloads to a mix of Flash and other frontier models. "You\'ve probably heard anecdotes from other CIOs that companies are already blowing through their annual token budgets, and it\'s only May," Pichai said, framing the model not just as a technical achievement but as a financial lifeline for organizations struggling with the runaway costs of deploying AI at scale.', 'The claim, if it holds, would be one of the most significant shifts in the economics of enterprise AI since large language models entered corporate computing.
Why enterprises have been forced to choose between AI quality and AI speed For the past three years, organizations adopting generative AI have faced a painful trade-off. The most capable models — the ones that can reason through complex multistep problems, write reliable code, and parse dense financial documents — tend to be large, slow, and expensive to query. Faster, cheaper models sacrifice accuracy.', "Chief information officers have been forced into a kind of AI portfolio management: routing simple queries to lightweight models and reserving the heavy-duty reasoning engines for high-stakes tasks.
It is a complex, brittle system that adds engineering overhead and often delivers inconsistent user experiences. Gemini 3.5 Flash attacks that trade-off directly. According to Google's internal benchmarks and a third-party analysis from Artificial Analysis , the model outperforms Google's own Gemini 3.1 Pro — a model the company positioned as its top-tier flagship just four to five months ago — on nearly every major benchmark.", 'It scores 76.2 percent on Terminal-Bench 2.1 , reaches 1656 Elo on GDPval-AA , hits 83.6 percent on MCP Atlas , and leads in multimodal understanding with 84.2 percent on CharXiv Reasoning .
Yet it does all of this while generating output tokens at four times the speed of comparable frontier models from competitors. Koray Kavukcuoglu, chief technology officer of Google DeepMind and chief AI architect for Google, told reporters the team has pushed even further: "We have developed an even more optimized version of Flash, not just four times, but actually 12 times faster with the same quality."', 'That turbo variant is available starting Tuesday inside Antigravity , Google\'s agentic development platform. Pichai put the performance gap in blunt terms: "3.5 Flash is better than 3.1 Pro, which was just four months ago, and it\'s at the almost, I would say, 90% of the performance of frontier models, 4x faster, much faster in Antigravity, maybe 12x, and about 1/3 to one half the cost."']
Source: VentureBeat