Steve Hutchinson · Big Pines
6 min read · Stage 9 · Cognitive Substrate

Multi-Agent Decomposition

This article describes the decomposition of cognition into planner, executor, critic, memory, and world-model agents.

Why one agent is not enough

Multi-agent decomposition: five specialized agents each contribute a trace to the shared activity index from a common context.

A single reasoning call forces planning, execution, critique, memory selection, and outcome prediction into one undifferentiated process. That design is simple, but it makes failures hard to locate and improvements hard to target. When a single-agent system makes a bad decision, the cause could be bad planning, weak evidence, missing critique, inaccurate prediction, or simple execution error. Without separation, all of those causes look identical from the outside.

Multi-agent decomposition solves this by assigning bounded responsibilities. Each agent has a specific role, emits traces for that role's work, and contributes outputs that can be inspected and scored independently.

The five roles

The planner proposes strategy. Given the current context (retrieved memories, goals, policy state, identity context), the planner generates a proposed course of action. It is not responsible for evaluating whether the plan is good; that is the critic's job. The planner's responsibility is to generate a coherent, concrete proposal given the available evidence.

The executor turns strategy into concrete action. The planner may say "query the database and compare results to the expected schema." The executor determines how to do that: which query, which database, in what order, with what error handling. The executor is not designing strategy; it is implementing one.

The critic evaluates coherence and risk. Given the planner's proposal and the executor's plan, the critic asks whether the proposal is internally consistent, whether it is compatible with the current task, and whether it carries unacceptable risk. The critic's output is not a decision; it is an assessment that feeds into arbitration.

The memory agent retrieves relevant context. Rather than leaving retrieval to a single shared query, the memory agent can apply role-specific retrieval strategies: retrieving episodic memories about past similar situations for the planner, retrieving procedural memories about tool use for the executor, and retrieving contradiction records for the critic.
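As a minimal sketch of role-specific retrieval, the mapping below routes each requesting role to one kind of memory. The `Role` enum, the `MEMORY_KIND_FOR_ROLE` table, and the dictionary-shaped memory records are illustrative assumptions, not the system's actual retrieval API.

```python
from enum import Enum


class Role(Enum):
    PLANNER = "planner"
    EXECUTOR = "executor"
    CRITIC = "critic"


# Hypothetical routing table, following the three examples in the text.
MEMORY_KIND_FOR_ROLE = {
    Role.PLANNER: "episodic",
    Role.EXECUTOR: "procedural",
    Role.CRITIC: "contradiction",
}


def retrieve_for(role, store):
    """Return the texts of stored memories whose kind matches the role."""
    kind = MEMORY_KIND_FOR_ROLE[role]
    return [m["text"] for m in store if m["kind"] == kind]


# Toy in-memory store; the record fields are assumptions for illustration.
store = [
    {"kind": "episodic", "text": "last schema check failed on a renamed column"},
    {"kind": "procedural", "text": "run read-only queries through the replica"},
    {"kind": "contradiction", "text": "two sources disagree on the expected schema"},
]

executor_memories = retrieve_for(Role.EXECUTOR, store)
```

A real memory agent would presumably rank candidates by relevance rather than filter on a single field; the point here is only that the retrieval strategy is selected by the requesting role.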

The world-model agent predicts likely outcomes. Given the current state and a candidate action, the world-model agent estimates what will happen. Its output includes a predicted outcome, a risk score, and a confidence estimate. High predicted risk can cause arbitration to reject or modify a plan even if the planner and executor are aligned.
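The three-part output of the world-model agent can be sketched as a small record, along with one possible way arbitration might act on it. The field names, the 0 to 1 scales, and the thresholds in `risk_gate` are assumptions made for this example; the article does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldModelPrediction:
    predicted_outcome: str
    risk: float        # assumed scale: 0.0 (safe) to 1.0 (dangerous)
    confidence: float  # assumed scale: 0.0 to 1.0


def risk_gate(prediction, max_risk=0.7, min_confidence=0.5):
    """Hypothetical gate: reject confident high-risk predictions,
    send uncertain high-risk predictions to review, accept the rest."""
    if prediction.risk > max_risk:
        if prediction.confidence >= min_confidence:
            return "reject"
        return "review"
    return "accept"


prediction = WorldModelPrediction("table lock held for minutes", risk=0.9, confidence=0.8)
decision = risk_gate(prediction)
```

Under these assumed thresholds, the gate ignores whether the planner and executor agree, which matches the behavior described above: high predicted risk can block a plan on its own.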

What agents know and what they don't

Every agent receives the same AgentContext: the triggering event, retrieved memories, active goals, and the current policy vector. That shared bundle is how the planner knows what goals are active and how the critic knows the current risk tolerance. What the context does not currently include is a capability manifest, that is, a declaration of which tools the executor can actually invoke.
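A sketch of what such a shared context could look like follows. Only the name `AgentContext` and its four kinds of content come from the description above; the field names, types, and the `risk_tolerance` policy key are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentContext:
    event: str      # the triggering event
    memories: tuple # retrieved memories
    goals: tuple    # active goals
    policy: dict = field(default_factory=dict)  # current policy vector (assumed keyed by name)


def planner_goal(ctx):
    """The planner reads the active goals from the shared context."""
    return ctx.goals[0] if ctx.goals else None


def critic_risk_tolerance(ctx, default=0.5):
    """The critic reads risk tolerance from the same context (hypothetical key)."""
    return ctx.policy.get("risk_tolerance", default)


ctx = AgentContext(
    event="schema drift detected",
    memories=("last schema check failed on a renamed column",),
    goals=("validate schema",),
    policy={"risk_tolerance": 0.3},
)
```

Note that nothing in this structure lists available tools, which is the omission the rest of this section discusses.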

In the current implementation, agents produce string proposals. The planner writes a description of intended strategy; the executor writes a description of intended action. Neither has access to an enumeration of available tools. The actual tool routing happens downstream, after arbitration selects a winning proposal. The ReasoningModel translates the winning proposal into a structured ActionRequest, and the ToolExecutor port carries it out.

This means the planner cannot explicitly constrain its proposals to actions the executor can carry out, and the critic cannot reject a plan because it requires a tool that doesn't exist. Those checks currently live at the execution layer, not the deliberation layer.
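The flow described above can be sketched end to end to show where an infeasible plan is caught. `ActionRequest` and `ToolExecutor` are named in the article, but their fields and methods here are assumptions, as are `Proposal`, the scoring, and the trivial `translate` function that stands in for the ReasoningModel.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Proposal:
    role: str
    text: str     # agents emit plain strings
    score: float  # hypothetical arbitration score


@dataclass(frozen=True)
class ActionRequest:
    tool: str
    argument: str


class ToolExecutor(Protocol):
    def execute(self, request: ActionRequest) -> str: ...


def arbitrate(proposals):
    """Pick the highest-scoring proposal (a stand-in for real arbitration)."""
    return max(proposals, key=lambda p: p.score)


def translate(proposal):
    """Stand-in for the ReasoningModel: parse 'tool: argument' text."""
    tool, _, argument = proposal.text.partition(":")
    return ActionRequest(tool.strip(), argument.strip())


class DemoExecutor:
    KNOWN_TOOLS = {"query_db"}  # only the execution layer knows this set

    def execute(self, request):
        if request.tool not in self.KNOWN_TOOLS:
            raise LookupError(f"unknown tool: {request.tool}")
        return f"ran {request.tool}({request.argument})"


proposals = [
    Proposal("planner", "query_db: compare results to expected schema", 0.6),
    Proposal("executor", "send_email: notify the owner", 0.8),
]
request = translate(arbitrate(proposals))
# DemoExecutor().execute(request) raises LookupError here: the missing tool
# is discovered only after arbitration has already chosen this proposal.
```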

That gap matters for the design's long-term trajectory. A critic that can evaluate capability feasibility, not just coherence and risk, would catch a class of planning errors that currently surface only at execution time, after arbitration has already committed to a path. Exposing a capability manifest in AgentContext is the natural extension that would enable this.
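One way this extension might look is sketched below. The `capabilities` field and the `missing_capabilities` check are hypothetical; the article proposes the manifest but does not define its shape, and the context is reduced to two fields to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentContext:
    event: str
    # Hypothetical manifest: names of the tools the executor can invoke.
    capabilities: frozenset = frozenset()


def missing_capabilities(ctx, required_tools):
    """Return the required tools that the manifest does not list, sorted."""
    return sorted(set(required_tools) - ctx.capabilities)


ctx = AgentContext(
    event="schema drift detected",
    capabilities=frozenset({"query_db", "read_file"}),
)
missing = missing_capabilities(ctx, ["query_db", "send_email"])
```

With a check like this, the critic could flag a proposal during deliberation instead of leaving the failure to the execution layer. It assumes proposals declare the tools they need, which the current string-based proposals do not.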

Parallel dispatch and trace density

Agents can run concurrently when their inputs are independent. The planner does not need to wait for the critic. The memory agent does not need to wait for the world-model agent. Running them in parallel reduces wall-clock latency for the overall reasoning step.
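The latency argument can be illustrated with `asyncio.gather`, which runs awaitables concurrently and returns their results in input order. The agents here are placeholders that only sleep, and treating all five roles as independent is an assumption made for the sketch.

```python
import asyncio
import time


async def run_agent(role, delay):
    """Placeholder agent: simulates work with a non-blocking sleep."""
    await asyncio.sleep(delay)
    return role, f"{role} output"


async def dispatch(roles, delay=0.05):
    """Run every role concurrently and report total wall-clock time."""
    start = time.perf_counter()
    results = await asyncio.gather(*(run_agent(r, delay) for r in roles))
    elapsed = time.perf_counter() - start
    return dict(results), elapsed


ROLES = ["planner", "executor", "critic", "memory", "world_model"]
outputs, elapsed = asyncio.run(dispatch(ROLES))
```

Because the simulated agents overlap, the elapsed time should be close to one agent's delay rather than the sum of all five.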

Parallelism also produces richer traces. A sequential system's trace records what happened in order. A parallel system's trace records what each role concluded independently from the same starting context, which exposes disagreements between roles. You can see that the critic flagged a risk the planner ignored, that the world model predicted a worse outcome than the planner anticipated, or that the memory agent retrieved evidence that contradicted the plan.

This trace density is not incidental. It is the precondition for all later reflective and meta-cognitive capabilities. You cannot improve reasoning strategy if you cannot see which reasoning step produced the error.

Activity traces as a diagnostic surface

Every agent emits its work to an activity trace index. Each trace records the agent's role, the inputs it received, the outputs it produced, its confidence estimate, and timing.
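A trace record with those five fields might be sketched as follows. The class names, field names, and the in-memory `TraceIndex` are assumptions for illustration; the article does not describe the actual index or its storage.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityTrace:
    role: str
    inputs: dict
    outputs: dict
    confidence: float
    started_at: float  # wall-clock start, seconds since the epoch
    duration_s: float  # elapsed time for the agent call


class TraceIndex:
    """Minimal in-memory stand-in for the shared activity index."""

    def __init__(self):
        self._traces = []

    def emit(self, trace):
        self._traces.append(trace)

    def by_role(self, role):
        return [t for t in self._traces if t.role == role]


def traced(index, role, inputs, agent_fn):
    """Run agent_fn(inputs) -> (outputs, confidence) and record a trace."""
    started_at = time.time()
    t0 = time.perf_counter()
    outputs, confidence = agent_fn(inputs)
    duration = time.perf_counter() - t0
    index.emit(ActivityTrace(role, inputs, outputs, confidence, started_at, duration))
    return outputs


index = TraceIndex()
verdict = traced(
    index, "critic", {"proposal": "query_db"},
    lambda inputs: ({"verdict": "coherent"}, 0.8),
)
```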

The trace index functions as a diagnostic surface for cognition itself. Operators can query it to understand systemic patterns: does the critic consistently flag risks that the planner then overrides? Does the world-model agent produce low-confidence predictions for a specific class of task? Does the memory agent tend to retrieve memories from the same narrow cluster?

Each of these patterns is an actionable finding. The first might indicate that arbitration gives reward too much weight relative to risk. The second might indicate that the world model's training data lacks coverage of a specific task class. The third might indicate that session novelty weighting needs adjustment.
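The first of these questions can be expressed as a simple aggregate over per-decision records. The record shape (`critic_flagged`, `plan_accepted`) is a hypothetical summary derived from traces, not a format the article defines.

```python
def override_rate(decisions):
    """Fraction of critic-flagged decisions where the plan was accepted anyway.

    Returns 0.0 when the critic flagged nothing, to avoid dividing by zero.
    """
    flagged = [d for d in decisions if d["critic_flagged"]]
    if not flagged:
        return 0.0
    overridden = sum(1 for d in flagged if d["plan_accepted"])
    return overridden / len(flagged)


# Hypothetical decision summaries joined from critic and arbitration traces.
decisions = [
    {"critic_flagged": True, "plan_accepted": True},
    {"critic_flagged": True, "plan_accepted": True},
    {"critic_flagged": True, "plan_accepted": False},
    {"critic_flagged": False, "plan_accepted": True},
]
rate = override_rate(decisions)
```

A persistently high rate would be the kind of signal that prompts a look at how arbitration balances reward against risk.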

Without decomposition, these patterns are invisible. They exist in the aggregate behavior of a single agent, but they cannot be attributed to specific roles because there are no specific roles to attribute them to.

The design is about accountability, not parallelism

Multi-agent decomposition is sometimes presented as primarily a performance optimization: run agents in parallel to save latency. Parallelism is real and valuable, but it is not the main point.

The main point is accountability. When cognition is decomposed into roles with bounded responsibilities and explicit traces, every failure can be attributed to a specific part of the reasoning process. The planner can be improved without changing the critic. The memory agent can be tuned without touching the executor. Arbitration weights can be adjusted without modifying any of the agents that generate candidates.

This accountability structure is also what makes the system progressively improvable. You can measure the planner's proposal quality over time and improve it if it degrades. You can compare the critic's risk assessments against later observed outcomes and calibrate its weights. The architecture makes each component independently testable, measurable, and improvable.
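Comparing risk assessments against observed outcomes needs a concrete metric. The article does not name one; the sketch below uses the Brier score, a standard measure of probabilistic forecast accuracy, and assumes the critic's risk is a probability that the outcome turns out bad.

```python
def brier_score(pairs):
    """Mean squared error between predicted risk and observed outcome.

    pairs: iterable of (predicted_risk, outcome_was_bad), with risk in
    [0, 1] and outcome_was_bad a bool. Lower is better; 0.0 is perfect.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (risk, outcome) pair")
    return sum((risk - float(bad)) ** 2 for risk, bad in pairs) / len(pairs)


# Hypothetical history: the critic's risk estimate and what later happened.
history = [(0.9, True), (0.1, False), (0.8, False)]
score = brier_score(history)
```

Tracking a score like this over time for the critic alone is only possible because its assessments are recorded separately from the other roles.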

The next article describes arbitration: the scoring mechanism that selects among competing agent proposals, and what the experiments revealed about how memory quality, confidence, and alignment interact to produce the final decision.
