Reward corruption

A failure mode in which the reinforcement signal itself is manipulated or miscalibrated, causing the system to optimize for a proxy rather than the intended outcome. The constitutional engine requires two independent corruption signatures before triggering quarantine, preventing false positives from a single noisy signal.

Reward corruption is treated as one of the most serious failure modes in the architecture precisely because it undermines the integrity of every other mechanism that depends on reinforcement. If the signal that tells the system what is working is itself compromised - whether through adversarial manipulation, sensor failure, or systematic miscalibration - every downstream mechanism (policy updates, Hebbian compounding, memory prioritization, identity drift) will optimize in the wrong direction. The two-signature requirement for constitutional quarantine of reward-corruption-related changes is a direct architectural response to this risk. A single anomalous reinforcement signal could be noise; two independent signatures indicating corruption are a much stronger indicator of a real problem. Self-modification proposals that touch the reinforcement pathway also require two-signature approval for the same reason.

Full glossary