How to Design an OpenHarness-Style Agent Runtime with Tools, Memory, Permissions, Skills, and Multi-Agent Coordination

In this tutorial, we build an OpenHarness-style agent harness from scratch to better understand how a practical harness works. We recreate the major building blocks that make an agent system useful: tool use, typed tool schemas, permissions, lifecycle hooks, memory, skills, context compaction, retry logic, cost tracking, and multi-agent coordination. Instead of treating an agent framework as a black box, we expose the full control flow and watch how the harness receives a user task, lets the model decide the next action, validates and executes tool calls, returns observations, and continues the loop until the task is complete. We also keep the implementation runnable, so we can experiment with the architecture without API keys or complex infrastructure.
We begin by laying the foundation for the OpenHarness-style harness: imports, async execution support, helper functions, and core data models. We define messages, tool calls, usage tracking, token counting, cost estimation, permission modes, hooks, and the virtual filesystem that keeps execution safe. This foundation is the basic architecture on which all subsequent tools, agent loops, and demos depend.
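To make these data models concrete, here is a minimal sketch of what such a foundation could look like. It is not the tutorial's original code: every name here (PermissionMode, ToolCall, Message, Usage, estimate_tokens, VirtualFS) is a hypothetical stand-in, the four-characters-per-token heuristic is a rough assumption, and the per-million-token prices are placeholders rather than real provider rates. We also assume the virtual filesystem is a plain in-memory dictionary.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class PermissionMode(Enum):
    """Hypothetical permission levels for tool execution."""
    DEFAULT = "default"        # ask before running risky tools
    PLAN = "plan"              # read-only, no side effects
    ACCEPT_ALL = "accept_all"  # run everything without asking


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict[str, Any]


@dataclass
class Message:
    role: str  # "system", "user", "assistant", or "tool"
    content: str = ""
    tool_calls: list[ToolCall] = field(default_factory=list)


@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, other: "Usage") -> None:
        """Accumulate usage from one more model call."""
        self.input_tokens += other.input_tokens
        self.output_tokens += other.output_tokens

    def cost_usd(self, in_per_mtok: float = 3.0, out_per_mtok: float = 15.0) -> float:
        """Estimate cost; the default prices are placeholders, not real rates."""
        total = self.input_tokens * in_per_mtok + self.output_tokens * out_per_mtok
        return total / 1_000_000


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumed): about four characters per token."""
    return max(1, len(text) // 4)


class VirtualFS:
    """In-memory file store, so demos never touch the real disk."""

    def __init__(self) -> None:
        self._files: dict[str, str] = {}

    def write(self, path: str, content: str) -> None:
        self._files[path] = content

    def read(self, path: str) -> str:
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]

    def paths(self) -> list[str]:
        return sorted(self._files)
```

Keeping these types small and free of provider-specific fields is one way to let the rest of the harness stay unchanged when the model backend is swapped.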
We build the practical tool layer that allows the harness to read files, write files, edit content, list files, search text, run Python code, simulate shell commands, search mock web data, load skills, remember notes, ask the user, and spawn subagents. We define each tool with typed inputs, descriptions, permissions, and executable behavior so the agent can interact with its environment in a structured way. We also add the skill library and persistent memory store, which help the agent load knowledge on demand and preserve useful information across sessions.
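As an illustration of typed tool definitions, the sketch below shows one possible shape for a tool and a registry that validates arguments before execution. The names Tool, ToolRegistry, write_file, and read_file are assumptions made for this example, and the type check is deliberately simple (a mapping from parameter name to Python type) rather than a full JSON Schema validator.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    params: dict[str, type]   # parameter name -> expected Python type
    run: Callable[..., str]
    read_only: bool = True    # tools with side effects set this to False

    def validate(self, args: dict[str, Any]) -> list[str]:
        """Return a list of human-readable problems; empty means valid."""
        errors = []
        for key, expected in self.params.items():
            if key not in args:
                errors.append(f"missing argument: {key}")
            elif not isinstance(args[key], expected):
                errors.append(f"{key} must be {expected.__name__}")
        for key in args:
            if key not in self.params:
                errors.append(f"unexpected argument: {key}")
        return errors


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, args: dict[str, Any]) -> str:
        """Validate, then execute; errors come back as observations."""
        tool = self._tools.get(name)
        if tool is None:
            return f"error: unknown tool {name!r}"
        errors = tool.validate(args)
        if errors:
            return "error: " + "; ".join(errors)
        return tool.run(**args)


# A dictionary stands in for the virtual filesystem in this sketch.
files: dict[str, str] = {}


def write_file(path: str, content: str) -> str:
    files[path] = content
    return f"wrote {len(content)} chars to {path}"


def read_file(path: str) -> str:
    return files.get(path, f"error: {path} not found")


registry = ToolRegistry()
registry.register(Tool("write_file", "Write text to a file.",
                       {"path": str, "content": str}, write_file, read_only=False))
registry.register(Tool("read_file", "Read a file's contents.",
                       {"path": str}, read_file))

print(registry.call("write_file", {"path": "notes.txt", "content": "hello"}))
print(registry.call("read_file", {"path": "notes.txt"}))
print(registry.call("read_file", {}))
```

Returning validation failures as strings instead of raising lets the model see the error as an ordinary observation and correct its next call.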
We define the model brain layer that decides what the agent does next in the loop. We create a scripted mock brain for deterministic execution, a flaky brain to simulate provider errors, a retrying wrapper with exponential backoff, and a real provider brain for Anthropic- or OpenAI-compatible APIs. This separation shows that the harness stays the same while the intelligence layer switches between mock execution and real LLM calls.
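The layering described above can be sketched as three small classes that share one async method. This is an illustrative approximation, not the tutorial's code: ScriptedBrain, FlakyBrain, RetryingBrain, ProviderError, and the decide method are hypothetical names, and the action is reduced to a plain string to keep the example short. A real provider brain would implement the same method by calling an LLM API.

```python
import asyncio


class ProviderError(Exception):
    """Stand-in for a transient failure from a model provider."""


class ScriptedBrain:
    """Deterministic brain: replays a fixed list of actions."""

    def __init__(self, script: list[str]) -> None:
        self._script = list(script)

    async def decide(self, transcript: list[str]) -> str:
        return self._script.pop(0) if self._script else "DONE"


class FlakyBrain:
    """Wraps another brain and fails a fixed number of times first."""

    def __init__(self, inner, failures: int) -> None:
        self.inner = inner
        self.failures = failures

    async def decide(self, transcript: list[str]) -> str:
        if self.failures > 0:
            self.failures -= 1
            raise ProviderError("simulated provider outage")
        return await self.inner.decide(transcript)


class RetryingBrain:
    """Retries transient errors, doubling the delay after each failure."""

    def __init__(self, inner, max_attempts: int = 4, base_delay: float = 0.01) -> None:
        self.inner = inner
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.attempts = 0  # total calls made, useful for inspection

    async def decide(self, transcript: list[str]) -> str:
        for attempt in range(self.max_attempts):
            self.attempts += 1
            try:
                return await self.inner.decide(transcript)
            except ProviderError:
                if attempt == self.max_attempts - 1:
                    raise  # out of attempts: surface the error
                await asyncio.sleep(self.base_delay * 2 ** attempt)
        raise RuntimeError("max_attempts must be at least 1")


# Two simulated failures, then the scripted action comes through.
brain = RetryingBrain(FlakyBrain(ScriptedBrain(["read_file notes.txt"]), failures=2))
result = asyncio.run(brain.decide([]))
print(result, "after", brain.attempts, "attempts")
```

Because each wrapper exposes the same decide method as the brain it wraps, the engine that calls it does not need to know whether retries, failure injection, or a real API sit underneath. In a notebook with an already running event loop, `await brain.decide([])` would replace the `asyncio.run` call.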
We assemble the system prompt, estimate transcript size, compact long conversations, print streaming events, and define the main QueryEngine. The engine is responsible for asking the brain what to do, checking permissions, running hooks, executing tools, collecting results, and looping until the task is finished. We also register the default tools and create the base system instruction that guides the agent toward verified, tool-based work.
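The loop described above can be sketched in a few dozen lines. The version below is a simplified assumption of how such an engine might be organized, not the tutorial's actual QueryEngine: the Action type, the allow callback, the hook event names ("pre_tool", "post_tool"), and the compact function are all hypothetical. The compaction step here only replaces older messages with a placeholder line, whereas a fuller implementation would likely summarize them.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    tool: Optional[str] = None   # None means "final answer"
    args: dict = field(default_factory=dict)
    text: str = ""


class ScriptedBrain:
    def __init__(self, actions):
        self._actions = list(actions)

    async def decide(self, transcript):
        return self._actions.pop(0)


def compact(transcript, max_chars=2000, keep_last=4):
    """Drop older messages behind a placeholder once the transcript is large."""
    if sum(len(m) for m in transcript) <= max_chars or len(transcript) <= keep_last:
        return transcript
    dropped = len(transcript) - keep_last
    return [f"[compacted {dropped} earlier messages]"] + transcript[-keep_last:]


class QueryEngine:
    def __init__(self, brain, tools, allow, hooks=(), max_turns=8):
        self.brain = brain
        self.tools = tools        # tool name -> callable(**args) -> str
        self.allow = allow        # callable(tool_name, args) -> bool
        self.hooks = list(hooks)  # callable(event, payload)
        self.max_turns = max_turns

    def _emit(self, event, payload):
        for hook in self.hooks:
            hook(event, payload)

    async def run(self, task):
        transcript = [f"user: {task}"]
        for _ in range(self.max_turns):
            transcript = compact(transcript)
            action = await self.brain.decide(transcript)
            if action.tool is None:
                return action.text  # the brain signalled completion
            self._emit("pre_tool", action)
            if not self.allow(action.tool, action.args):
                observation = f"permission denied for {action.tool}"
            elif action.tool not in self.tools:
                observation = f"unknown tool {action.tool}"
            else:
                try:
                    observation = self.tools[action.tool](**action.args)
                except Exception as exc:  # report tool failures to the brain
                    observation = f"tool error: {exc}"
            self._emit("post_tool", observation)
            transcript.append(f"tool[{action.tool}]: {observation}")
        return "stopped: turn limit reached"


files = {"notes.txt": "hello"}


def read_file(path):
    return files[path]


def write_file(path, content):
    files[path] = content
    return "ok"


events = []
engine = QueryEngine(
    brain=ScriptedBrain([
        Action(tool="write_file", args={"path": "x.txt", "content": "nope"}),
        Action(tool="read_file", args={"path": "notes.txt"}),
        Action(text="notes.txt contains: hello"),
    ]),
    tools={"read_file": read_file, "write_file": write_file},
    allow=lambda name, args: name != "write_file",  # read-only policy
    hooks=[lambda event, payload: events.append(event)],
)
answer = asyncio.run(engine.run("What is in notes.txt?"))
print(answer)
```

In this run the write is blocked by the permission check before the tool executes, so x.txt is never created, while the read proceeds normally. Since the brain is scripted, the final answer is fixed in advance; with a real model, the observations appended to the transcript are what would inform the next decision.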
Source: MarkTechPost