Steve Hutchinson · Big Pines

Exploration factor

A policy engine weight that controls the balance between exploiting known successful strategies and exploring novel approaches. Adjusted by reinforcement based on recent reward history. Higher values increase the probability of selecting less-proven strategies during arbitration.

The exploration factor is one of the adaptive weights maintained by the policy engine. It represents the system's current disposition toward trying new approaches versus relying on strategies that have worked before. When recent reinforcement history is positive and stable, the exploration factor tends to decrease as the system converges on what works. When outcomes are consistently poor, or policy learning has converged without solving the problem, the exploration factor increases, directing arbitration toward less-proven proposals. This is the policy-level mechanism that precedes open-ended capability search: if raising the exploration factor still fails to find a better strategy within the current capability set, that is the signal that capability search should activate. The factor is bounded by MAX_ABSOLUTE_DRIFT constraints to prevent destabilizing swings.
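
The behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the policy engine's actual code: the class `ExplorationFactor`, the function `arbitrate`, the baseline and step sizes, and the numeric value of `MAX_ABSOLUTE_DRIFT` are all assumptions made for the example. It also assumes that the drift bound limits how far the factor may move from a baseline, and it simplifies "positive and stable" to "positive mean reward".

```python
import random
from dataclasses import dataclass

# The entry names MAX_ABSOLUTE_DRIFT but gives no value; 0.2 is an assumption.
MAX_ABSOLUTE_DRIFT = 0.2


@dataclass
class ExplorationFactor:
    baseline: float = 0.3  # hypothetical starting weight
    value: float = 0.3     # current weight
    step: float = 0.05     # hypothetical per-update adjustment

    def bounds(self):
        # Assumed reading: the factor may drift at most
        # MAX_ABSOLUTE_DRIFT away from its baseline, within [0, 1].
        low = max(0.0, self.baseline - MAX_ABSOLUTE_DRIFT)
        high = min(1.0, self.baseline + MAX_ABSOLUTE_DRIFT)
        return low, high

    def update(self, recent_rewards):
        """Lower the factor after good recent rewards, raise it after poor ones."""
        if not recent_rewards:
            return self.value
        mean_reward = sum(recent_rewards) / len(recent_rewards)
        delta = -self.step if mean_reward > 0 else self.step
        low, high = self.bounds()
        self.value = min(high, max(low, self.value + delta))
        return self.value

    def at_ceiling(self):
        # Exploration is maxed out; if results are still poor, this is
        # where capability search would take over.
        return self.value >= self.bounds()[1]


def arbitrate(proposals, exploration, rng=random):
    """Pick a strategy from a dict of name -> track-record score.

    With probability `exploration`, choose one of the less-proven
    proposals; otherwise choose the best-proven one.
    """
    ranked = sorted(proposals, key=proposals.get, reverse=True)
    if len(ranked) > 1 and rng.random() < exploration:
        return rng.choice(ranked[1:])
    return ranked[0]
```

In this sketch, a run of poor outcomes pushes `value` up until it hits the upper bound, at which point `at_ceiling()` returns `True`. That corresponds to the hand-off point in the text: exploration within the current capability set has been exhausted, so a caller could use that condition to activate capability search.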
