Steve Hutchinson · Big Pines
·6 min read·Stage 11·Cognitive Substrate

World Model

This article describes the world-model component that simulates likely outcomes before action selection.

Prediction before commitment

World model: current state, candidate action, and memory context feed a predictive model that outputs an outcome, risk score, and confidence estimate for arbitration.

An agent that acts only from retrieved memory and current policy remains reactive. It can remember what happened before, but it cannot explicitly test what may happen next. This matters because some actions are irreversible, some risks are unacceptable, and some promising-looking strategies are brittle in ways that only become obvious when you reason through the consequences.

The world model introduces a predictive layer that simulates outcomes for proposed actions before they are executed. Rather than selecting among candidate actions based purely on past evidence and current policy weights, the system can ask: if I take this action given the current state, what is likely to happen?

What the model produces

For each candidate action and current state pair, the world model returns three values:

A predicted outcome: a description of the likely result, including downstream effects the agent should anticipate. This is a plausibility estimate, not a certainty.

A risk score: an independent assessment of how likely the action is to produce unacceptable consequences. Risk is not the inverse of predicted reward. A high-reward action can carry high risk; the world model scores them separately so arbitration can weigh them independently.

A confidence estimate: how certain the model is in its prediction. A confident prediction should influence arbitration differently from an uncertain one. When confidence is low, the system should be more hesitant to rely on the prediction, more likely to request additional retrieval, and more likely to prefer the conservative option.
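
As a minimal sketch, the three outputs can be modeled as a small record, with a helper that maps low confidence to the more cautious behavior described above. The names `Prediction` and `next_step`, the action labels, and the 0.5 threshold are illustrative assumptions, not the system's actual API or tuning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical container for the world model's three outputs."""
    outcome: str       # description of the likely result
    risk: float        # 0.0 (benign) to 1.0 (unacceptable), scored separately from reward
    confidence: float  # 0.0 (guess) to 1.0 (certain)

    def __post_init__(self) -> None:
        # Reject scores outside the assumed [0, 1] range.
        for name in ("risk", "confidence"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


def next_step(pred: Prediction, min_confidence: float = 0.5) -> str:
    """Decide how to treat a prediction; the threshold is an assumed value."""
    if pred.confidence < min_confidence:
        # Low confidence: gather more context before relying on the prediction.
        return "request_retrieval"
    return "use_prediction"
```

Under these assumptions, a prediction with confidence 0.36 would be routed to `"request_retrieval"`, while one with confidence 0.88 would be passed on to arbitration.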

What Experiment 26 revealed about context depth

Experiment 26 tested the world model under three conditions with different levels of available context.

The first condition asked the model to evaluate a safe action (explaining a runbook) with minimal context (no memories, no goals). The predicted risk was 0.200, but confidence was only 0.360. Low context, low confidence.

The second condition asked the model to evaluate a risky action (overwriting a credential file) with minimal context. The risk score rose to 0.750, and confidence dropped to 0.250. The model correctly picked up the risk vocabulary in the action description ("overwrite," "credential," "external"), but without supporting context it could not be confident in its assessment.

The third condition asked the model to evaluate the same safe action but with five retrieved memories and three active goals attached. Risk remained low, but confidence jumped to 0.880. The action and the state were the same; only the available context differed.

The lesson is important for system design: the world model's usefulness scales with context quality. A world model operating without relevant retrieved memories and goal context produces low-confidence predictions that should not heavily influence arbitration. A world model with rich, relevant context produces high-confidence predictions that are genuinely informative. This creates a dependency on the retrieval and goal systems: improving memory retrieval quality indirectly improves prediction confidence.

Risk vocabulary and the lexicon effect

Experiment 26 also revealed a structural property of the risk scoring mechanism. The risky action scored 0.750 because its description contained three terms from the risk lexicon: "overwrite" (irreversible modification), "credential" (security-sensitive data), and "external" (outside the system boundary). Each matching term adds to the risk score.

This lexical approach to risk detection is both powerful and limited. It is powerful because it scales to arbitrary action descriptions without requiring task-specific rules: any action that mentions irreversible modifications, security-sensitive operations, or external state changes will be flagged. It is limited because novel risky actions that do not use standard risk vocabulary will be scored too low, and actions that use risk vocabulary for benign purposes (explaining how credential rotation works) may be scored too high.
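
The additive behavior and both failure modes can be illustrated with a short sketch. The stems, the per-term weight of 0.25, and the function name `lexical_risk` are assumptions made for illustration; the article does not publish the real lexicon or weights, and this sketch does not model any baseline risk.

```python
import re

# Hypothetical lexicon: word stem -> weight added when the stem appears.
RISK_STEMS = {
    "overwrit": 0.25,    # irreversible modification ("overwrite", "overwriting")
    "credential": 0.25,  # security-sensitive data
    "external": 0.25,    # outside the system boundary
}


def lexical_risk(description: str) -> float:
    """Add the weight of each lexicon stem found in the description, capped at 1.0."""
    words = re.findall(r"[a-z]+", description.lower())
    # Each stem counts once, however many words match it.
    matched = {stem for stem in RISK_STEMS if any(w.startswith(stem) for w in words)}
    return min(1.0, sum(RISK_STEMS[stem] for stem in matched))


risky = lexical_risk("Overwrite the external credential file")  # three stems match
benign = lexical_risk("Explain how credential rotation works")  # flagged although benign
novel = lexical_risk("Wipe the production database")            # risky, but no stem matches
```

In this sketch the benign explanation still picks up one term's worth of risk, and the destructive but unfamiliar phrasing picks up none, which is the pair of failure modes described above.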

The confidence mechanism partially compensates for this limitation. An action that triggers the risk lexicon but has strong contextual evidence that it is safe (five memories about successful similar actions, an active goal that requires it, a policy vector that trusts established methods) will have its risk score outweighed by high confidence in a positive outcome. The full arbitration computation uses both risk and confidence; neither dominates in isolation.

Prediction records as calibration evidence

Prediction records are not discarded after the action is selected. They are written to the prediction index with the current state, the predicted outcome, the risk score, and the confidence estimate. When the action's actual outcome is later observed, the actual and predicted outcomes can be compared.

This comparison is the basis for world-model calibration. A model that consistently predicts low risk for actions that produce harmful outcomes is miscalibrated in the risk dimension. A model that consistently predicts high success for actions that fail is miscalibrated in the reward dimension. Either form of miscalibration is detectable from the accumulated prediction records, and both should trigger adjustment.
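
One way to make the risk-dimension check concrete is to compare mean predicted risk with the observed rate of harmful outcomes. The sketch below assumes a hypothetical `PredictionRecord` type whose `harmful` field is filled in once the outcome is observed; the field names and the gap metric are illustrative choices, not the system's actual schema or calibration method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PredictionRecord:
    """Hypothetical prediction-index entry."""
    state: str
    predicted_outcome: str
    risk: float
    confidence: float
    harmful: Optional[bool] = None  # set once the actual outcome is observed


def risk_calibration_gap(records: list[PredictionRecord]) -> Optional[float]:
    """Observed harm rate minus mean predicted risk, over resolved records.

    A clearly positive gap suggests the model underestimates risk.
    Returns None when no outcomes have been observed yet.
    """
    resolved = [r for r in records if r.harmful is not None]
    if not resolved:
        return None
    harm_rate = sum(r.harmful for r in resolved) / len(resolved)
    mean_risk = sum(r.risk for r in resolved) / len(resolved)
    return harm_rate - mean_risk
```

A gap near zero would indicate that predicted risk roughly tracks observed harm; how large a gap should trigger adjustment is a tuning decision this sketch leaves open.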

Calibration evidence from the world model feeds the meta-cognitive system introduced later. The meta-cognitive engine tracks prediction accuracy over time and can recommend changes to how the world model is used, how much weight its predictions receive in arbitration, or when additional retrieval should be forced before accepting a high-confidence prediction.

The world model in the arbitration chain

In multi-agent cognition, the world-model agent is one contributor to arbitration. Its risk score and confidence estimate are used as arbitration inputs alongside the planner's proposed strategy, the critic's coherence assessment, and memory alignment.

Experiment 14 demonstrated that high confidence from the world model raises the arbitration score for agent-A proposals (backed by trusted cluster-A memories) over agent-C proposals (backed by contradictory memories). The confidence channel accounts for 30% of the total arbitration score, so a well-calibrated world model with high confidence in trusted proposals is a significant advantage in the decision process.
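
To show how a 30% confidence channel can separate otherwise similar proposals, here is a hedged sketch of a weighted arbitration score. Only the 0.30 confidence weight comes from the text; the remaining weights, the linear risk penalty, and the input values are placeholders, and `arbitration_score` is a hypothetical name.

```python
# Only the confidence weight (0.30) is taken from the article.
# The other weights are placeholders chosen so the total is 1.0.
WEIGHTS = {
    "confidence": 0.30,
    "planner": 0.25,
    "critic": 0.20,
    "memory_alignment": 0.25,
}


def arbitration_score(signals: dict[str, float], risk: float,
                      risk_penalty: float = 0.5) -> float:
    """Weighted sum of signals in [0, 1], minus an assumed linear risk penalty."""
    base = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    return base - risk_penalty * risk


# Two made-up proposals that differ only in world-model confidence.
shared = {"planner": 0.7, "critic": 0.6, "memory_alignment": 0.8}
score_a = arbitration_score({**shared, "confidence": 0.88}, risk=0.2)
score_c = arbitration_score({**shared, "confidence": 0.25}, risk=0.2)
```

With every other input held equal, the gap between the two scores is the confidence weight times the confidence difference, which is why a well-calibrated, confident world model carries real weight in the final choice.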

This creates a virtuous cycle: trusted memories produce confident predictions, confident predictions improve arbitration scores for proposals backed by those memories, and high-scoring proposals that succeed generate more trusted memories. The world model is a node in the overall quality feedback loop, not just an isolated predictor.

The next article covers long-horizon goals: how the goal system organizes behavior across multiple time horizons and how goal progress events feed back into the reinforcement pipeline.
