
Meet Harness-1: A 20B Retrieval Subagent Trained With Reinforcement Learning Inside a Stateful Search Harness on gpt-oss-20b

AI News Desk

MarkTechPost

Jun 07, 2026


A team of researchers introduces Harness-1, a 20B retrieval subagent that uses reinforcement learning inside a stateful search harness to improve search decisions and evidence gathering.


Most search agents are trained as policies over an ever-growing transcript, which puts a heavy load on the model: it must decide how to search, remember what it saw, and determine which evidence matters. A team of researchers from University of Illinois Urbana-Champaign, UC Berkeley, and Chroma argues that this approach asks too much.

Reinforcement learning ends up optimizing both search decisions and routine bookkeeping at once.

The researchers' answer is Harness-1, a 20B retrieval subagent built on gpt-oss-20b. It was trained with reinforcement learning inside a stateful search harness, where the harness holds the bookkeeping and the policy keeps the semantic decisions. The weights and harness code are publicly released.

Harness-1 produces a ranked set of documents for a downstream answering model, but it does not answer questions itself.

The subagent runs inside a state-machine harness centered on a per-episode WORKINGMEMORY. Each turn works as a loop: the harness renders compact search state along with recent actions, the model emits one structured action, and the harness executes it, updates state, and renders the next observation. The research team calls its principle "stateful cognitive offloading," where the policy decides what to search, curate, and verify, and when to stop, while the harness maintains the recoverable state around those decisions.

The state includes several pieces, such as a candidate pool holding compressed, deduplicated documents, an importance-tagged curated set capped at 30 documents, and a full-text store keeping every retrieved chunk outside the prompt.
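The turn loop and the state pieces described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the released harness: the `WorkingMemory` field names, the action dictionary format, the `curate` and `stop` tool names, the 80-character snippet, and the toy corpus are all hypothetical. Only the 30-document cap and the `search_corpus` tool name come from the article, and the real tool's signature is not given there.

```python
from dataclasses import dataclass, field

CURATED_CAP = 30  # curated-set cap stated in the article


@dataclass
class WorkingMemory:
    """Hypothetical per-episode state; field names are illustrative only."""
    candidates: dict = field(default_factory=dict)  # doc_id -> compressed snippet
    curated: dict = field(default_factory=dict)     # doc_id -> importance tag
    full_text: dict = field(default_factory=dict)   # doc_id -> full chunk, never rendered
    recent_actions: list = field(default_factory=list)

    def add_results(self, docs):
        for doc_id, text in docs:
            self.full_text[doc_id] = text                 # keep the full chunk outside the prompt
            self.candidates.setdefault(doc_id, text[:80])  # deduplicate by id, store a short snippet

    def curate(self, doc_id, importance):
        if doc_id not in self.candidates:
            return False
        if doc_id not in self.curated and len(self.curated) >= CURATED_CAP:
            return False  # cap reached; the harness refuses the action
        self.curated[doc_id] = importance
        return True

    def render(self):
        # Compact observation: counts plus the last few actions, not the whole transcript
        lines = [f"candidates={len(self.candidates)} curated={len(self.curated)}"]
        lines += [f"recent: {name}" for name in self.recent_actions[-3:]]
        return "\n".join(lines)


def run_episode(policy, search, memory, max_turns=8):
    """Render state, take one structured action, execute it, update state, repeat."""
    for _ in range(max_turns):
        action = policy(memory.render())
        memory.recent_actions.append(action["tool"])
        if action["tool"] == "search_corpus":
            memory.add_results(search(action["query"]))
        elif action["tool"] == "curate":      # hypothetical tool name
            memory.curate(action["doc_id"], action["importance"])
        elif action["tool"] == "stop":        # hypothetical tool name
            break
    # Ranked output for a downstream answering model: highest importance first
    return sorted(memory.curated, key=lambda d: -memory.curated[d])


def toy_search(query):
    corpus = {"d1": "Alpha Corp filed a patent in 2019.",
              "d2": "Beta Labs acquired Alpha Corp."}
    return [(i, t) for i, t in corpus.items() if query.lower() in t.lower()]


# A scripted stand-in for the trained policy
script = iter([
    {"tool": "search_corpus", "query": "alpha"},
    {"tool": "curate", "doc_id": "d2", "importance": 1},
    {"tool": "curate", "doc_id": "d1", "importance": 3},
    {"tool": "stop"},
])
ranked = run_episode(lambda observation: next(script), toy_search, WorkingMemory())
print(ranked)  # ['d1', 'd2']
```

The point of the split is visible in the sketch: the policy only ever sees the short string from `render()`, while deduplication, the cap, and full-text storage are handled by ordinary code that reinforcement learning does not have to learn.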

An evidence graph adds structure, with a regex extractor scanning each chunk for proper nouns, years, and dates. The harness then renders frequent entities, bridge documents, and singletons. The policy works through eight tools, including fan_out_search, search_corpus, and verify.

Harness-1 was evaluated on eight benchmarks spanning web, finance, patents, and multi-hop QA. It achieved an average curated recall of 0.730, beating the next open subagent, Tongyi DeepResearch 30B, by 11.4 points.

The subagent's transfer pattern gives a clear signal of the mechanism: gains are 2.2x larger on the tasks furthest from the training data. Ablations support the harness claim. The method targets evidence-seeking retrieval, where documents support an answer, and potential workflows include literature and patent review, financial-filing analysis, and multi-hop fact-checking.
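To make the evidence graph mentioned earlier more concrete, here is a rough Python sketch of a regex-based extractor and the three views the harness is said to render. The regular expressions and the definitions are assumptions, since the article does not spell them out. In this sketch a "frequent" entity appears in at least two documents, a "singleton" appears in exactly one, and a "bridge" document contains at least two frequent entities.

```python
import re
from collections import defaultdict

# Assumed patterns; the article does not give the actual regexes.
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
# Crude proper-noun heuristic: runs of capitalized words.
# It also catches ordinary sentence-initial words such as "The".
PROPER_NOUN = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b")


def extract_entities(chunk):
    """Return the set of dates, years, and capitalized phrases found in a chunk."""
    found = set(DATE.findall(chunk))
    found |= set(YEAR.findall(chunk))
    found |= set(PROPER_NOUN.findall(chunk))
    return found


def build_evidence_graph(chunks, min_docs=2):
    """chunks maps doc_id -> text; returns the three rendered views."""
    entity_docs = defaultdict(set)
    for doc_id, text in chunks.items():
        for entity in extract_entities(text):
            entity_docs[entity].add(doc_id)

    frequent = {e for e, docs in entity_docs.items() if len(docs) >= min_docs}
    singletons = {e for e, docs in entity_docs.items() if len(docs) == 1}

    # Count how many frequent entities each document mentions
    hits = defaultdict(int)
    for entity in frequent:
        for doc_id in entity_docs[entity]:
            hits[doc_id] += 1
    bridges = {doc_id for doc_id, n in hits.items() if n >= 2}

    return {
        "frequent": sorted(frequent),
        "bridges": sorted(bridges),
        "singletons": sorted(singletons),
    }


chunks = {
    "d1": "Alpha Corp filed a patent in 2019.",
    "d2": "Beta Labs acquired Alpha Corp in 2021.",
    "d3": "Beta Labs opened an office in Berlin.",
}
graph = build_evidence_graph(chunks)
print(graph["frequent"])    # ['Alpha Corp', 'Beta Labs']
print(graph["bridges"])     # ['d2']
print(graph["singletons"])  # ['2019', '2021', 'Berlin']
```

In this toy corpus, `d2` is the bridge because it is the only document linking the two recurring entities, which is the kind of multi-hop hint a compact rendering can surface without putting full chunks back in the prompt.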

