Policy alignment

A reinforcement channel measuring how consistent an experience event is with the current policy state. High alignment contributes positively to the reinforcement score; low alignment reduces it. Weighted at 16% in the reinforcement scoring function.

Policy alignment ensures that reinforcement signals are sensitive to the system's current behavioral disposition, not just to objective outcome quality. An experience event that produces a positive outcome but violates the current policy's risk tolerance or exploration preference scores lower on policy alignment than one that produces a similar outcome while following the policy. This prevents reinforcement from inadvertently rewarding policy-violating behavior just because it happened to work out. Policy alignment is weighted at 16% of the final reinforcement score - the same as novelty - making it a significant but not dominant factor. Over time, if policy-violating approaches consistently produce better outcomes, that signal accumulates and may eventually drive policy drift toward the previously-violating behavior, updating the policy to reflect learned reality.

Full glossary