Sakana AI Launches Sakana Fugu: An Orchestration Model That Routes Tasks Across a Swappable Pool of Frontier LLMs
Today, Sakana AI launched Sakana Fugu .

Today, Sakana AI launched Sakana Fugu . It is a multi-agent orchestration system that behaves like one model. You send a request to a single endpoint. Fugu decides how to handle it internally. It solves a task directly when that is enough. It also assembles and coordinates a team of expert models when needed. The complexity of a multi-agent system never reaches your code.
Fugu is itself a language model. It is trained to call other LLMs in an agent pool. That pool includes instances of itself, called recursively. Fugu manages model selection, delegation, verification, and synthesis internally.
Instead of hard-coded roles or workflows, Fugu learns how to coordinate. It decides when to delegate and how agents should communicate. It then combines their work into one answer. From the outside, you call a single model. Inside, a coordinated system of experts does the work.
Sakana AI frames this as a hedge against single-vendor dependency. If one provider restricts access, Fugu routes around the disruption. The research team cites recent export controls on Anthropic’s Fable and Mythos models as motivation. Over time, newer models can be folded into the pool.
Fugu ships in two variants, both behind one OpenAI-compatible API:
Fugu builds on two ICLR 2026 papers Trinity and the Conductor on learned orchestration.
TRINITY uses a lightweight evolved coordinator across several turns. It assigns Thinker, Worker, or Verifier roles to delegate work adaptively. Conductor is trained with reinforcement learning. It discovers natural-language coordination strategies and focused prompts for diverse LLM pools.
Together, they show systems can learn to assemble and route agents per task. That replaces hand-designed workflows.
Sakana AI compares Fugu against the foundation models it orchestrates. Baselines use provider-reported scores. SWE Bench Pro uses the mini-swe-agent as scaffolding.
The orchestrator posts the top score on 10 of 11 rows. Fugu Ultra tops the four coding benchmarks, CharXiv Reasoning, and Humanity’s Last Exam. It ties regular Fugu on GPQA-D. Regular Fugu leads SciCode, τ³ Banking, and Long Context Reasoning. GPT 5.5 wins MRCRv2, the only baseline win here.
Its Fugu models stand shoulder-to-shoulder with Anthropic’s Fable 5 and Mythos Preview. Those two are not in Fugu’s pool, since they are not publicly accessible.
Source: MarkTechPost