Steve Hutchinson · Big Pines
5 min read · Stage 10 · Cognitive Substrate

Internal Debate and Arbitration

This article describes the mechanism that scores competing agent proposals and selects a single action under coherence, reward, memory, and risk considerations.

From proposals to decisions

Arbitration engine: candidate plans, memory evidence, risk prediction, and policy weights combine into a scored winner and a persisted debate trace.

Multi-agent decomposition creates multiple candidate outputs. The planner proposes a strategy. The critic flags risks. The world-model agent predicts outcomes. Without a principled way to combine these into a single decision, the system has diversity but no resolution mechanism.

Arbitration is that mechanism. The arbitration engine scores each candidate action on multiple dimensions, combines those scores into a single ranking, selects the highest-scoring candidate, and records the debate trace so the decision can be audited later.
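The rank-and-select step described above can be sketched as follows. This is a minimal illustration rather than the engine's actual code: the `Candidate` type, the `select_winner` function, and the example agents, actions, and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A scored proposal (hypothetical structure for illustration)."""
    agent: str
    action: str
    score: float  # combined arbitration score, assumed to lie in [0, 1]


def select_winner(candidates: list[Candidate]) -> tuple[Candidate, float]:
    """Return the highest-scoring candidate and its margin over the runner-up."""
    if not candidates:
        raise ValueError("arbitration needs at least one candidate")
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    winner = ranked[0]
    # With a single candidate there is no runner-up, so the margin is the full score.
    margin = winner.score - ranked[1].score if len(ranked) > 1 else winner.score
    return winner, margin


# Illustrative values only.
candidates = [
    Candidate("planner", "retry_with_backoff", 0.61),
    Candidate("critic", "abort_and_report", 0.48),
]
winner, margin = select_winner(candidates)
```

Returning the margin alongside the winner keeps the information needed to judge how close the decision was.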

The scoring dimensions

The arbitration score is a weighted combination of four dimensions:

Coherence measures whether the proposal is internally consistent and compatible with the current task and policy. A proposal that contradicts the active goal or violates known constraints scores low on coherence regardless of how promising its reward might seem.

Predicted reward estimates the likely utility of the action given the current state and world-model predictions. This is not the actual outcome; it is the world model's forecast. High predicted reward without high confidence should not dominate the decision.

Memory alignment measures how well the proposal is supported by retrieved experience. A proposal backed by five retrieved memories about similar successful situations scores higher than one backed by none. The score is computed as min(1, count / 5), so it saturates at five aligned memories and any further memories add nothing.
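The capped formula is small enough to write out directly. The function and parameter names below are hypothetical; only the min(1, count / 5) rule comes from the text.

```python
def memory_alignment(aligned_count: int, cap: int = 5) -> float:
    """Memory alignment as min(1, count / cap), following the capped formula above."""
    if aligned_count < 0:
        raise ValueError("aligned_count must be non-negative")
    return min(1.0, aligned_count / cap)
```

Under this rule two aligned memories give 0.4, and five or more all give 1.0.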

Risk score penalizes actions that are likely to violate constraints or produce unacceptable outcomes. The world-model agent produces this score. High risk reduces the arbitration score even when predicted reward is high.
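One way to combine the four dimensions is a plain weighted sum, sketched below. Several things here are assumptions rather than facts about the engine: the `DimensionScores` type, the choice to let risk enter as `1 - risk`, and the equal placeholder weights in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScores:
    """Per-dimension scores in [0, 1] (hypothetical structure)."""
    coherence: float
    predicted_reward: float
    memory_alignment: float
    risk: float  # higher means more dangerous


def arbitration_score(s: DimensionScores, weights: dict[str, float]) -> float:
    """Weighted sum in [0, 1]; risk contributes as (1 - risk) so it acts as a penalty.

    Assumption: the real engine may fold risk in differently.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return (
        weights["coherence"] * s.coherence
        + weights["predicted_reward"] * s.predicted_reward
        + weights["memory_alignment"] * s.memory_alignment
        + weights["risk"] * (1.0 - s.risk)
    )


# Placeholder weights for illustration, not the system's configured values.
example_weights = {
    "coherence": 0.25,
    "predicted_reward": 0.25,
    "memory_alignment": 0.25,
    "risk": 0.25,
}
safe = DimensionScores(coherence=0.8, predicted_reward=0.6, memory_alignment=0.4, risk=0.1)
risky = DimensionScores(coherence=0.8, predicted_reward=0.9, memory_alignment=0.4, risk=0.9)
```

With these placeholder numbers the lower-reward `safe` proposal outscores `risky`, which is the behavior the risk dimension is meant to produce.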

What Experiment 14 revealed about relative weights

Experiment 14 ran four scenarios that tested whether a high-quality agent (cluster-A memories, high retrieval priority after Hebbian compounding) would consistently win arbitration against a lower-quality agent (cluster-C memories, lower priority) under varying conditions.

The most instructive scenario was "Degraded A": agent-A had only 2 retrieved memories while agent-C had 5 (a mix of cluster-C and cluster-B memories). The intuition might be that more memories should win. The result was that agent-A still won, with an arbitration score of 0.772 against agent-C's 0.714.

The explanation is in the weight structure. Memory alignment carries only 25% of the arbitration score, while agent confidence (30%) and risk score (20%) together carry 50%. Agent-A's cluster-A memories had higher retrieval priority (from Hebbian compounding), which translated into higher agent confidence, and that confidence advantage outweighed the memory-count disadvantage.

The principle is important: quantity of supporting evidence does not substitute for quality of supporting evidence. An agent with two highly reliable memories will generally outperform an agent with five unreliable memories in arbitration, because the confidence channel reflects memory quality and the memory alignment channel is capped and carries less weight.
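The Degraded A figures can be checked with a little arithmetic. The sketch assumes that every retrieved memory counts as aligned under the capped formula (2 for agent-A, 5 for agent-C). The article does not break the final scores down by channel, so the last number is only the net advantage the non-memory channels must have given agent-A.

```python
MEMORY_WEIGHT = 0.25  # memory alignment share stated in the article


def memory_alignment(count: int, cap: int = 5) -> float:
    """Capped alignment score, min(1, count / cap)."""
    return min(1.0, count / cap)


# Memory channel contributions (assumes every retrieved memory counts as aligned).
memory_a = MEMORY_WEIGHT * memory_alignment(2)  # agent-A: 2 memories
memory_c = MEMORY_WEIGHT * memory_alignment(5)  # agent-C: 5 memories
memory_gap_for_c = memory_c - memory_a

# Final arbitration scores reported for the scenario.
score_a, score_c = 0.772, 0.714
final_gap_for_a = score_a - score_c

# Net advantage the remaining channels had to give agent-A.
other_channels_for_a = final_gap_for_a + memory_gap_for_c

print(round(memory_gap_for_c, 3), round(final_gap_for_a, 3), round(other_channels_for_a, 3))
# prints: 0.15 0.058 0.208
```

Under these assumptions the memory channel favored agent-C by 0.15, yet agent-A finished 0.058 ahead, so the other channels together favored agent-A by about 0.208.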

The baseline margin paradox

Experiment 14 produced a counterintuitive finding about the relationship between reinforcement and arbitration margin. At baseline (no reinforcement, no Hebbian compounding), the arbitration gap between agent-A and agent-C was 0.325. After 100 turns of reinforcement with countBonus = 0.02, the gap narrowed to 0.311.

This seems wrong: reinforcement should widen the gap, not narrow it. The explanation is that at baseline, agent confidence is derived directly from importanceScore, which is 0.80 for cluster-A memories. After reinforcement, retrieval priority for cluster-A converges to a signal-determined fixed point around 0.775. The baseline importanceScore was therefore higher than the post-reinforcement retrieval priority, because the exponential moving average (EMA) update pulls priority toward the signal average rather than letting it grow without bound.
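Convergence to the signal average is a general property of an exponential moving average. The sketch below uses a textbook EMA with a hypothetical smoothing factor `alpha` and a constant signal of 0.775. The system's actual update rule, including how countBonus enters, is not given here, so this illustrates the fixed point only.

```python
def ema_update(priority: float, signal: float, alpha: float) -> float:
    """One textbook EMA step: move `priority` a fraction `alpha` toward `signal`."""
    return (1.0 - alpha) * priority + alpha * signal


def run_reinforcement(start: float, signal: float, alpha: float, turns: int) -> float:
    """Apply the EMA step `turns` times with a constant signal."""
    priority = start
    for _ in range(turns):
        priority = ema_update(priority, signal, alpha)
    return priority


# start = baseline importanceScore for cluster-A; signal = fixed point quoted in the text.
# alpha = 0.1 is an arbitrary illustrative choice, not a value from the experiment.
final_priority = run_reinforcement(start=0.80, signal=0.775, alpha=0.1, turns=100)
```

A priority that starts above the signal average decays toward it, and one that starts below rises toward it, so the update cannot push a score past what the signal supports.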

The lesson is that reinforcement refines the system's picture of memory quality rather than inflating scores. The arbitration system is already robust at baseline because importance scores encode initial trust. Reinforcement makes the ranking more accurate but does not widen the gap beyond what the underlying evidence supports.

Persisting debate traces

Every arbitration decision is persisted as a debate trace. Each trace records the candidate proposals, the scores on each dimension, the critic annotations, and the selected winner with its margin.
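A trace with the fields listed above could be represented and serialized as in the following sketch. The class names, field names, and the JSON-lines format are assumptions made for illustration; the actual persistence schema is not described here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProposalRecord:
    """One candidate proposal with its scores (hypothetical schema)."""
    agent: str
    action: str
    dimension_scores: dict[str, float]
    total_score: float
    critic_annotations: list[str] = field(default_factory=list)


@dataclass
class DebateTrace:
    """One arbitration decision (hypothetical schema)."""
    decision_id: str
    proposals: list[ProposalRecord]
    winner_agent: str
    margin: float

    def to_json_line(self) -> str:
        """Serialize to a single JSON line, suitable for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only.
trace = DebateTrace(
    decision_id="decision-0001",
    proposals=[
        ProposalRecord("planner", "retry_with_backoff",
                       {"coherence": 0.8, "risk": 0.2}, 0.61),
        ProposalRecord("critic", "abort_and_report",
                       {"coherence": 0.7, "risk": 0.1}, 0.48,
                       critic_annotations=["retry may exceed rate limit"]),
    ],
    winner_agent="planner",
    margin=0.13,
)
line = trace.to_json_line()
```

Storing every candidate, not just the winner, is what lets later analysis ask why the losing proposals lost.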

These records are not just for audit. They are the training signal for future improvement. A pattern of decisions where the planner's top proposal is always overridden by risk score is a signal that the planner may be too aggressive. A pattern where the critic's risk flags are consistently low when the world model predicts failure is a signal that the critic's calibration needs attention.
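The first pattern above can be turned into a simple metric over stored traces. The sketch assumes each trace is a dict with a `winner_agent` key and a `proposals` list holding one entry per agent with `agent` and `risk` fields. Both that layout and the definition of "overridden by risk" used here are hypothetical.

```python
def planner_override_rate(traces: list[dict]) -> float:
    """Fraction of decisions where the planner lost to a lower-risk proposal.

    Traces without a planner proposal, or whose winner is missing from the
    proposal list, are skipped.
    """
    considered = 0
    overridden = 0
    for trace in traces:
        by_agent = {p["agent"]: p for p in trace["proposals"]}
        planner = by_agent.get("planner")
        winner = by_agent.get(trace["winner_agent"])
        if planner is None or winner is None:
            continue
        considered += 1
        if winner is not planner and planner["risk"] > winner["risk"]:
            overridden += 1
    return overridden / considered if considered else 0.0


# Illustrative traces only.
traces = [
    {"winner_agent": "critic",
     "proposals": [{"agent": "planner", "risk": 0.7}, {"agent": "critic", "risk": 0.2}]},
    {"winner_agent": "planner",
     "proposals": [{"agent": "planner", "risk": 0.3}, {"agent": "critic", "risk": 0.2}]},
]
rate = planner_override_rate(traces)
```

A rate that stays high across many decisions would be the kind of signal the text describes: a planner whose proposals are routinely too risky to win.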

Debate traces are therefore the primary evidence for diagnosing systematic problems in multi-agent cognition. Without them, you can only observe the final action. With them, you can trace the full deliberation that produced it.

Decision without certainty

Arbitration selects the best available candidate under explicit scoring assumptions. It does not prove that the selected action is correct, and it should not be confused with a guarantee. The selection is a ranked commitment under uncertainty, not a declaration of truth.

This distinction matters for how the system handles its own errors. An incorrect decision that was produced by the correct arbitration process (highest scoring candidate turned out to be wrong) is a different kind of error from an incorrect decision that emerged from a broken arbitration process (the scoring weights systematically favored risky or incoherent candidates). The debate trace makes these two failure modes distinguishable.

The next article introduces the world model: the predictive component that estimates likely outcomes before action selection, and what the experiments revealed about how context depth determines prediction confidence.
