Policy engine
The subsystem that maintains adaptive behavioral weights across dimensions including retrieval weighting, exploration preference, risk tolerance, and arbitration bias. Updates are reward-driven and bounded.
The policy engine is the system's fast-adapting behavioral layer - the set of weights that determine how the system behaves right now, as distinct from the slowly-evolving identity state that determines who the system is over time. Policy weights cover every dimension of operational behavior: how much weight to give BM25 vs. k-NN in hybrid retrieval, how much to favor exploration vs. exploitation when selecting among agent proposals, how much risk the current context should tolerate, and how heavily to weight each of the four arbitration dimensions. These weights are updated after every cognitive loop iteration through the policyDelta computed by the reinforcement engine. The policy engine enforces the MAX_ABSOLUTE_DRIFT bound per step and maintains a version history so that the drift trajectory is auditable. Policy state is snapshotted at session start and stored in the cognitive session, making the baseline explicit. If a session ends with large accumulated drift relative to the snapshot, that is a signal worth investigating: either the session contained highly informative experience that legitimately shifted the policy, or something unusual influenced the reward signal.