Steve Hutchinson
Big Pines

Calibration error

The difference between an agent's stated confidence and its actual accuracy on a given operation, computed as the absolute deviation |confidence - accuracy|. A high mean calibration error across operations indicates that the system is systematically overconfident or underconfident, which misleads the arbitration scoring, where confidence carries a 30% weight.

Calibration error is tracked by the reflection loop and the meta-cognition engine as a first-class signal of reasoning health. For a single operation, accuracy is 1 on success and 0 on failure, so an agent that reports 0.9 confidence and succeeds has a calibration error of 0.1, while an agent that reports 0.9 confidence and fails has a calibration error of 0.9. The mean calibration error across operation types (retrieval, planning, world-model prediction, tool execution) is computed per session and compared against a watchdog threshold (typically 0.35). Exceeding that threshold triggers a watchdog alert and a self-modification proposal to adjust the agent's confidence reporting. Persistent high calibration error is an architectural finding: it indicates that arbitration is being misled by inflated or deflated confidence signals, which biases proposal selection in ways that compound over time.
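
The per-session check described above can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual implementation: `OperationRecord`, `calibration_error`, `session_calibration`, and `watchdog_alert` are hypothetical names. It also assumes that single-operation accuracy is 1.0 or 0.0, and that "mean across operation types" means averaging the per-type means (the text does not say how operations are weighted).

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# "Typical" watchdog threshold quoted in the text; treated here as a constant.
WATCHDOG_THRESHOLD = 0.35


@dataclass(frozen=True)
class OperationRecord:
    """One logged operation (hypothetical record shape)."""
    op_type: str       # e.g. "retrieval", "planning", "tool_execution"
    confidence: float  # stated confidence in [0, 1]
    succeeded: bool    # observed outcome


def calibration_error(confidence: float, succeeded: bool) -> float:
    """|confidence - accuracy|, with accuracy taken as 1.0 or 0.0 (assumption)."""
    accuracy = 1.0 if succeeded else 0.0
    return abs(confidence - accuracy)


def session_calibration(records):
    """Return (per-type mean error, mean of the per-type means) for one session."""
    if not records:
        raise ValueError("session has no operation records")
    errors_by_type = defaultdict(list)
    for record in records:
        errors_by_type[record.op_type].append(
            calibration_error(record.confidence, record.succeeded)
        )
    per_type = {op: mean(errs) for op, errs in errors_by_type.items()}
    # Assumption: each operation type counts equally, regardless of volume.
    overall = mean(per_type.values())
    return per_type, overall


def watchdog_alert(overall_error: float, threshold: float = WATCHDOG_THRESHOLD) -> bool:
    """True when the session's mean calibration error exceeds the threshold."""
    return overall_error > threshold


if __name__ == "__main__":
    session = [
        OperationRecord("retrieval", 0.9, True),   # error 0.1
        OperationRecord("planning", 0.9, False),   # error 0.9
    ]
    per_type, overall = session_calibration(session)
    print(per_type, overall, watchdog_alert(overall))
```

Averaging per type rather than per operation keeps a high-volume operation type, such as retrieval, from masking poor calibration in a rarer one, such as planning. A per-operation mean is an equally plausible reading of the text. The absolute deviation also discards direction, so a real implementation would likely track the signed difference as well, in order to tell overconfidence from underconfidence when proposing an adjustment.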
