Steve HutchinsonBig Pines

Policy drift

The time-varying change in policy state driven by reward signals. Per-step drift is bounded by MAX_ABSOLUTE_DRIFT = 0.08 to prevent rapid destabilization. Total accumulated drift over a session is auditable.

Policy drift is how the system adapts to local experience without requiring a full retraining cycle. After each cognitive loop iteration, the reinforcement engine computes a reward signal and translates it into a policyDelta - a signed adjustment to the policy state vector. The delta is bounded by MAX_ABSOLUTE_DRIFT = 0.08 per step, which prevents any single strongly-reinforced or strongly-penalized experience from destabilizing the policy by a large amount. Over a session with many positive outcomes, drift accumulates in a direction that increases the weights associated with the strategies that produced those outcomes. Over a session with poor outcomes, drift reduces the weights on the strategies that preceded them. The total accumulated drift over a session is the integral of these per-step changes and is recorded in the audit stream. Drift that consistently pushes in the same direction across many sessions eventually shifts the policy state substantially; drift that reverses direction frequently indicates the policy is near an equilibrium for the current environment. The exploration factor is itself part of the policy state and can drift: sustained positive outcomes reduce exploration, while sustained failures increase it.

This site collects anonymous usage data to understand how people read and navigate the blog. Accepting enables persistent reader preferences across visits.