Policy drift
The time-varying change in policy state driven by reward signals. Per-step drift is bounded by MAX_ABSOLUTE_DRIFT = 0.08 to prevent rapid destabilization. Total accumulated drift over a session is auditable.
Policy drift lets the system adapt to local experience without a full retraining cycle. After each cognitive loop iteration, the reinforcement engine computes a reward signal and translates it into a policyDelta, a signed adjustment to the policy state vector. The delta is bounded by MAX_ABSOLUTE_DRIFT = 0.08 per step, so no single strongly reinforced or strongly penalized experience can shift the policy far enough to destabilize it. Over a session with many positive outcomes, drift accumulates in a direction that increases the weights of the strategies that produced those outcomes. Over a session with poor outcomes, drift reduces the weights of the strategies that preceded them. The total accumulated drift over a session is the sum of these per-step changes and is recorded in the audit stream. Drift that consistently pushes in the same direction across many sessions eventually shifts the policy state substantially; drift that reverses direction frequently indicates that the policy is near an equilibrium for the current environment. The exploration factor is itself part of the policy state and can drift: sustained positive outcomes reduce exploration, while sustained failures increase it.
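The bounding and accumulation described above can be illustrated with a minimal TypeScript sketch. Only MAX_ABSOLUTE_DRIFT and the name policyDelta come from the text. The functions clampDrift, applyDrift, and accumulatedDrift are hypothetical, and the sketch assumes that the bound applies to each component of the policy state vector independently, which the text does not specify.

```typescript
// Bound taken from the text; all function names below are hypothetical.
const MAX_ABSOLUTE_DRIFT = 0.08;

// Clamp one component of a proposed delta to
// [-MAX_ABSOLUTE_DRIFT, +MAX_ABSOLUTE_DRIFT].
// Assumption: the bound is applied per component, not to the vector norm.
function clampDrift(value: number): number {
  return Math.max(-MAX_ABSOLUTE_DRIFT, Math.min(MAX_ABSOLUTE_DRIFT, value));
}

// Apply one bounded drift step. Returns the new policy state together with
// the delta that was actually applied, so the caller can log it for audit.
function applyDrift(
  state: number[],
  proposedDelta: number[],
): { state: number[]; policyDelta: number[] } {
  const policyDelta = proposedDelta.map(clampDrift);
  const next = state.map((weight, i) => weight + (policyDelta[i] ?? 0));
  return { state: next, policyDelta };
}

// Total accumulated drift over a session: the component-wise sum of the
// per-step deltas that were applied.
function accumulatedDrift(deltas: number[][]): number[] {
  const size = deltas[0]?.length ?? 0;
  const total: number[] = new Array<number>(size).fill(0);
  for (const delta of deltas) {
    delta.forEach((d, i) => {
      total[i] = (total[i] ?? 0) + d;
    });
  }
  return total;
}
```

In this sketch a proposed delta of 0.3 on one component is applied as 0.08, while a proposed delta of -0.02 passes through unchanged. Summing the applied deltas, rather than the proposed ones, keeps the audited total consistent with what actually happened to the policy state.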