Stanford's DeLM Cuts Multi-Agent Task Costs by 50% Without Central Orchestrator
Stanford's DeLM framework enables agents to coordinate directly, reducing multi-agent task costs by 50% without a central controller.

One of the assumptions behind today’s AI frameworks is that agents need a 'boss' at the center: an orchestrator that runs the show, routes requests, and keeps the whole system from descending into chaos. That assumption may be wrong, and the cost of carrying it could be measured in inference dollars and coordination latency. A new Stanford framework, a decentralized language model (DeLM), is built on the premise that agents can coordinate directly, without routing every update through a central controller.
DeLM's shared knowledge base serves as a 'common communication substrate,' letting agents build on one another’s verified progress without routing every interaction through a main agent to 'merge, filter, and rebroadcast,' Yuzhen Mao and Azalia Mirhoseini, co-developers of the framework, explain in a research paper.
In a typical centralized multi-agent system, a main agent breaks a task into subtasks, assigns them to sub-agents that run in parallel, waits for their responses, merges and summarizes the intermediate progress, then launches the next wave of assignments based on the collected context. While this is a natural way to parallelize LLM reasoning, the Stanford researchers argue that it scales poorly.
Every useful finding, partial finding, and failure must be reported back to the main agent, which then determines what information to merge and rebroadcast to the agents below it. 'As the number of subtasks grows, this controller becomes a communication and integration bottleneck,' Mao and Mirhoseini write. Further, the main orchestrator may 'dilute, omit, or distort' useful information, leading to lost progress.
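To make the centralized pattern concrete, here is a minimal Python sketch under stated assumptions. It is not code from the paper: `sub_agent` is a hypothetical stand-in for an LLM call, and `merge_and_filter` and `orchestrate` are illustrative names. What it shows is the shape of the loop: every report fans in to one controller, which decides what survives into the next wave's broadcast context.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

def sub_agent(subtask: str, context: List[str]) -> str:
    # Hypothetical stand-in for an LLM call; reports one finding per subtask.
    return f"finding for {subtask} (saw {len(context)} context items)"

def merge_and_filter(reports: List[str], keep: int) -> List[str]:
    # Hypothetical controller policy: keep only the first `keep` reports,
    # so anything past the cutoff never reaches the next wave.
    return reports[:keep]

def orchestrate(subtasks: List[str], waves: int = 2, keep: int = 2) -> Tuple[List[str], int]:
    context: List[str] = []
    controller_inbox = 0  # counts every report routed through the main agent
    for _ in range(waves):
        # Fan out: sub-agents run in parallel on the current broadcast context.
        with ThreadPoolExecutor() as pool:
            reports = list(pool.map(sub_agent, subtasks, [context] * len(subtasks)))
        controller_inbox += len(reports)
        # Fan in: the main agent merges, filters, and rebroadcasts.
        context = merge_and_filter(reports, keep)
    return context, controller_inbox

ctx, inbox = orchestrate(["a", "b", "c", "d"])
print(inbox, len(ctx))  # prints: 8 2
```

With four subtasks over two waves, the controller handles eight reports and forwards only two. The truncation rule is a deliberately crude assumption, but it marks the single merge point where, per the researchers, information can be diluted, omitted, or distorted.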
This bottleneck also arises in long-context reasoning. DeLM, by contrast, is built around three components: parallel agents, a shared context, and a task queue. The shared context is essentially a curated store of 'gists,' or information summaries that other agents might find useful.
Gists include verified, evidence-backed findings alongside partial findings and documented failures, and each points to detailed evidence that agents can retrieve as their specific task requires. Agents 'write compact, verified updates into a shared context that later agents can read directly,' the researchers write. Useful findings, failures, and constraints accumulate as a 'shared problem state' rather than passing through a central controller.
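The shared context can be pictured as a small data structure. The Python sketch below is an assumption-laden illustration, not DeLM's actual schema: the `Gist` and `SharedContext` classes, the `kind` labels, the `evidence_ref` pointer, and the boolean `verified` flag are all hypothetical. It captures the two properties described above: compact gists that any agent reads directly, each pointing to fuller evidence fetched only on demand.

```python
from dataclasses import dataclass, field
from threading import Lock
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Gist:
    kind: str          # assumed labels: "finding", "partial", or "failure"
    summary: str       # compact update that later agents read directly
    evidence_ref: str  # pointer into the (hypothetical) evidence store
    verified: bool     # simplification: a flag stands in for real verification

@dataclass
class SharedContext:
    gists: List[Gist] = field(default_factory=list)
    evidence: Dict[str, str] = field(default_factory=dict)
    _lock: Lock = field(default_factory=Lock, init=False, repr=False, compare=False)

    def write(self, gist: Gist, detail: str) -> bool:
        # Only verified gists enter the shared problem state.
        if not gist.verified:
            return False
        with self._lock:
            self.evidence[gist.evidence_ref] = detail
            self.gists.append(gist)
        return True

    def read(self, kind: Optional[str] = None) -> List[Gist]:
        # Any agent reads accumulated gists directly, optionally filtered by kind.
        with self._lock:
            return [g for g in self.gists if kind is None or g.kind == kind]

    def fetch_evidence(self, gist: Gist) -> str:
        # Full detail is pulled only when an agent's task needs it.
        with self._lock:
            return self.evidence[gist.evidence_ref]

ctx = SharedContext()
ok = ctx.write(Gist("failure", "approach A fails on empty input", "ev-1", True),
               "full trace of the failed run")
rejected = ctx.write(Gist("finding", "column B is the join key", "ev-2", False),
                     "unchecked guess")
print(ok, rejected, len(ctx.read()))  # prints: True False 1
```

Storing documented failures alongside findings is the point of the design: a later agent can skip a dead end without a controller relaying that news to it.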
The pipeline works in three stages:
Initialization: Inputs are broken into work units and added to a queue.
Parallel execution: Agents work independently and in parallel, pulling tasks from the queue and reading the shared context as they progress.
Compression and verification: Results are compressed into reusable 'gists' that are checked against supporting evidence. Only fully verified gists are shared with the group.
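The three stages map naturally onto a work queue and a pool of worker threads. The following Python sketch uses only the standard library's `queue` and `threading` modules; the functions `solve`, `compress`, and `verify` are hypothetical placeholders, and the toy task (squaring integers) is chosen purely so that verification can be checked mechanically. The `faulty` parameter is an illustrative assumption that injects a wrong answer, giving the verifier something to reject. This is a sketch of the described flow, not the paper's implementation.

```python
import queue
import threading
from typing import Dict, Iterable, List

def initialize(inputs: Iterable[int]) -> queue.Queue:
    # Initialization: break inputs into work units on a shared queue.
    tasks: queue.Queue = queue.Queue()
    for unit in inputs:
        tasks.put(unit)
    return tasks

def solve(unit: int, shared: List[str], faulty: frozenset) -> Dict:
    # Hypothetical agent step; a real agent would call an LLM and consult `shared`.
    answer = unit * unit
    if unit in faulty:
        answer += 1  # simulated wrong output, so verification has something to catch
    return {"unit": unit, "answer": answer, "evidence": f"{unit}*{unit}"}

def compress(result: Dict) -> Dict:
    # Compression: a short reusable gist plus the evidence needed to check it.
    return {"gist": f"{result['unit']}^2 = {result['answer']}",
            "evidence": result["evidence"], "claimed": result["answer"]}

def verify(gist: Dict) -> bool:
    # Verification: recompute the claim from its supporting evidence.
    a, b = gist["evidence"].split("*")
    return int(a) * int(b) == gist["claimed"]

def run(inputs: Iterable[int], n_agents: int = 3, faulty: frozenset = frozenset()) -> List[str]:
    tasks = initialize(inputs)
    shared: List[str] = []
    lock = threading.Lock()

    def agent() -> None:
        # Parallel execution: each agent pulls work until the queue is empty.
        while True:
            try:
                unit = tasks.get_nowait()
            except queue.Empty:
                return
            with lock:
                snapshot = list(shared)  # read the shared context directly
            gist = compress(solve(unit, snapshot, faulty))
            if verify(gist):  # only verified gists are shared with the group
                with lock:
                    shared.append(gist["gist"])

    workers = [threading.Thread(target=agent) for _ in range(n_agents)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sorted(shared)

print(run([1, 2, 3, 4], faulty=frozenset({3})))  # prints: ['1^2 = 1', '2^2 = 4', '4^2 = 16']
```

Note that no agent waits on a coordinator: each one pulls its own work and writes straight to the shared store, and the faulty result for unit 3 is simply never published.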
Source: VentureBeat