Steve Hutchinson · Big Pines
6 min read · Stage 17 · Cognitive Substrate

Cognitive Economics

This article describes the budget engine that governs compute allocation, utility thresholds, fast and slow cognition modes, and exhaustion.

Compute as a scarce resource

Cognitive economics: task context and operation costs feed the budget engine, which assigns quotas and gates operations into fast or slow mode while tracking exhaustion.

Reasoning is not free. Retrieval, reflection, multi-agent debate, world-model simulation, and reranking all consume latency and compute. In a real deployment, these costs accumulate rapidly. A system that applies full reasoning depth to every request will be slow, expensive, and unable to distinguish between a complex problem that deserves it and a routine task that does not.

Without an economic layer, the architecture over-invests in easy decisions and under-invests in hard ones. Cognitive economics introduces explicit budgeting: a mechanism for allocating compute to operations based on their expected value.

Agent quotas and utility thresholds

The budget engine assigns compute quotas to agents and operations. A quota specifies how many tokens, how much latency, or how many reasoning steps an operation may consume. Quotas are not uniform: a high-stakes arbitration decision can be granted more budget than a routine memory retrieval.

Beyond quotas, the engine applies utility threshold gating: it estimates whether the expected utility of an operation exceeds its cost before the operation runs. Low-utility operations are skipped or assigned a cheaper path. This prevents the system from spending resources on operations that will not meaningfully affect the decision.

Experiment 27 demonstrated this with concrete numbers. A high-utility request (utility = 0.9, cost = 0.1, uncertainty = 0.4) passed the budget gate and was approved for slow mode, which uses deep retrieval, multi-agent critique, and world-model simulation. A low-utility request (utility = 0.2, cost = 0.05) was rejected because it fell below the utility threshold, even though its cost was very low. The rejection reason was utility_below_threshold, not cost overrun.

This distinction matters: the system is not only rationing scarce resources; it is making a judgment that some operations are not worth doing regardless of their cost.
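The ordering of the two checks can be sketched as follows. This is a minimal illustration, not the engine's actual code: `GateDecision`, `utility_gate`, `remaining_budget`, and the `cost_overrun` reason string are hypothetical names, and reusing 0.65 as the gate threshold is an assumption (the article reports that value for slow-mode approval in Experiment 27).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateDecision:
    approved: bool
    reason: Optional[str] = None  # set only when the request is rejected


def utility_gate(utility: float, cost: float, remaining_budget: float,
                 utility_threshold: float = 0.65) -> GateDecision:
    """Check utility before cost, so a cheap but low-value operation is
    rejected for its value rather than for its price."""
    if utility < utility_threshold:
        return GateDecision(False, "utility_below_threshold")
    if cost > remaining_budget:
        # Hypothetical reason name; the article only says the rejection
        # in Experiment 27 was *not* a cost overrun.
        return GateDecision(False, "cost_overrun")
    return GateDecision(True)


# Figures from Experiment 27; remaining_budget = 1.0 is illustrative.
high = utility_gate(utility=0.9, cost=0.1, remaining_budget=1.0)
low = utility_gate(utility=0.2, cost=0.05, remaining_budget=1.0)
```

Because the utility check runs first, the low-utility request is rejected even though its cost of 0.05 would easily fit the budget.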

Fast and slow cognition modes

The practical output of the budget gate is a mode decision. Two modes cover most situations:

Slow mode uses deep retrieval (quality lane with reranking), full multi-agent debate (planner, executor, critic, memory agent, world-model agent), reflection checkpoints, and high inference depth. It is reserved for decisions where the expected utility of better reasoning exceeds the cost of producing it.

Fast mode uses shallow retrieval (efficient lane), single-agent reasoning without debate, no reflection, and low inference depth. It is suitable for routine tasks, low-stakes queries, and high-volume workloads where marginal reasoning quality improvements are not worth the additional cost.

The boundary between fast and slow is a policy decision, not a hard rule. The budget engine evaluates three conditions for slow mode approval: utility above the threshold (0.65 in Experiment 27), uncertainty high enough to benefit from deeper reasoning (above 0.35), and cognitive exhaustion low enough that the system has capacity (below 0.70). All three must hold.

This threshold structure encodes an important insight: slow reasoning is not always better. A high-certainty situation (certainty = 0.9, which implies low uncertainty) fails the uncertainty condition and does not benefit from more deliberation; the slow-mode overhead costs more than it contributes. A high-exhaustion situation means the system's capacity for sustained reasoning is degraded, and spending more budget on a slow-mode pass may produce worse results than a focused fast-mode pass would.
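The three-condition check can be written as a single conjunction. The sketch below uses the thresholds reported for Experiment 27; the names `Mode` and `select_mode` are hypothetical, and falling back to fast mode whenever any condition fails is a simplifying assumption (the engine may instead skip a low-utility operation entirely).

```python
from enum import Enum


class Mode(Enum):
    FAST = "fast"
    SLOW = "slow"


# Thresholds reported for Experiment 27.
UTILITY_THRESHOLD = 0.65
UNCERTAINTY_THRESHOLD = 0.35
EXHAUSTION_THRESHOLD = 0.70


def select_mode(utility: float, uncertainty: float, exhaustion: float) -> Mode:
    """Approve slow mode only when all three conditions hold."""
    slow_ok = (
        utility > UTILITY_THRESHOLD
        and uncertainty > UNCERTAINTY_THRESHOLD
        and exhaustion < EXHAUSTION_THRESHOLD
    )
    return Mode.SLOW if slow_ok else Mode.FAST


# The exhaustion values here are illustrative, not taken from the experiment.
valuable_and_uncertain = select_mode(utility=0.9, uncertainty=0.4, exhaustion=0.2)
already_certain = select_mode(utility=0.9, uncertainty=0.1, exhaustion=0.2)
too_exhausted = select_mode(utility=0.9, uncertainty=0.4, exhaustion=0.8)
```

Only the first call satisfies all three conditions; the other two fall back to fast mode because one condition fails, no matter how high the utility is.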

Token quotas and quota exhaustion

Experiment 27 demonstrated a quota exhaustion scenario. After a 900-token request was approved and processed (against a 1000-token session quota), a 200-token follow-up request was rejected with quota_exceeded. The rejection was not because the aggregate exhaustion measure was high (it was only 0.485, well below the 0.70 threshold); it was because the specific token allowance had been consumed.

The distinction between quota exhaustion and cognitive exhaustion is important. A session quota is a hard ceiling on resource consumption for that session. Cognitive exhaustion is a softer measure that accumulates based on the difficulty and intensity of recent work. The system tracks both independently.

A session that has nearly consumed its token quota but maintains low cognitive exhaustion may still be productive in fast mode, which uses far fewer tokens. A session that has plenty of quota left but high cognitive exhaustion may produce lower-quality outputs in slow mode. Managing both dimensions is necessary for reliable behavior under varied load patterns.
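The quota side of this distinction is simple bookkeeping. The sketch below replays the Experiment 27 scenario; `SessionQuota` and its `request` method are hypothetical names, and the rule that a rejected request consumes nothing is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SessionQuota:
    limit: int      # hard ceiling for the session, in tokens
    used: int = 0   # tokens consumed so far

    def request(self, tokens: int) -> Tuple[bool, Optional[str]]:
        """Approve the request only if it fits under the ceiling."""
        if self.used + tokens > self.limit:
            return False, "quota_exceeded"
        self.used += tokens
        return True, None


quota = SessionQuota(limit=1000)
first = quota.request(900)     # fits: 900 <= 1000
follow_up = quota.request(200)  # rejected: 900 + 200 > 1000
small = quota.request(100)     # a smaller request still fits
```

Note that the check never consults the exhaustion scalar: the follow-up is rejected on token arithmetic alone, which is why the rejection can occur at an exhaustion level of 0.485.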

Exhaustion modeling and graceful degradation

Long-running systems accumulate budget pressure. The exhaustion model tracks this pressure as a scalar that grows with intensive reasoning and decays during idle periods. When exhaustion exceeds the threshold, the engine switches to fast mode regardless of task utility (subject to the temporal-urgency override described later), extends recovery time between slow-mode passes, and defers reflection to the next low-exhaustion period.

This graceful degradation prevents two failure modes. The first is compute exhaustion: a system that never degrades under load will eventually run out of budget and fail hard. The second is quality degradation under exhaustion: a system that continues applying slow-mode reasoning when exhausted will produce lower-quality outputs (because sustained deep reasoning is itself cognitively expensive) while consuming budget that could have been preserved for recovery.

Graceful degradation means accepting lower quality now in exchange for sustained availability later. This is not a failure; it is an adaptive response to resource pressure.
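One way to model a scalar that grows with work and decays when idle is shown below. The article does not specify the update rules, so both are assumptions: growth saturates toward 1.0, and idle decay is exponential with a hypothetical half-life. Only the 0.70 threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass
class ExhaustionModel:
    level: float = 0.0          # current exhaustion, kept within [0, 1]
    threshold: float = 0.70     # degradation threshold from the article
    half_life_s: float = 60.0   # hypothetical idle-decay half-life

    def record_work(self, intensity: float) -> None:
        """Grow toward 1.0; intensity is expected to lie in [0, 1]."""
        self.level += intensity * (1.0 - self.level)

    def idle(self, seconds: float) -> None:
        """Halve the level for every half_life_s seconds of idleness."""
        self.level *= 0.5 ** (seconds / self.half_life_s)

    def must_degrade(self) -> bool:
        """True when the engine should force fast mode."""
        return self.level > self.threshold


model = ExhaustionModel()
model.record_work(0.5)   # level becomes 0.5
model.record_work(0.5)   # level becomes 0.75
degraded = model.must_degrade()
model.idle(60.0)         # one half-life of rest: level becomes 0.375
recovered = not model.must_degrade()
```

Two intensive passes push the level past 0.70 and force fast mode; one half-life of idle time brings it back under the threshold, at which point slow mode and deferred reflection become available again.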

The relationship to temporal density

The cognitive economics system receives density signals from the temporal engine. High temporal density (many competing demands, tight deadlines) triggers slow mode even when exhaustion would otherwise suggest fast mode, because high-density moments are precisely the ones where reasoning quality matters most.

This coupling means the two systems together produce better allocation than either alone. The temporal engine knows when situations are critical; the budget engine knows when resources are available. Together, they allocate deep reasoning to critical moments and shallow reasoning to routine ones, regardless of the raw count of operations in the session.

The interaction also creates a natural priority: the budget engine defers to temporal urgency when they conflict. A high-density, high-urgency moment gets slow mode even at high exhaustion, because failing to reason well in a critical moment is worse than accumulating more exhaustion. The engine records this override and factors it into subsequent exhaustion projections.
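The override and its bookkeeping might look like the following sketch. `ModeArbiter`, its `decide` method, and the 0.80 density cut-off are all hypothetical, and the sketch assumes the temporal engine reports density as a number in [0, 1]. For simplicity it overrides any fast-mode decision from the budget engine, whereas the article describes the override specifically for exhaustion-driven ones.

```python
from dataclasses import dataclass, field
from typing import List

DENSITY_OVERRIDE = 0.80  # hypothetical cut-off for "high" temporal density


@dataclass
class ModeArbiter:
    # Exhaustion level at each override, kept for later exhaustion projections.
    overrides: List[float] = field(default_factory=list)

    def decide(self, budget_mode: str, exhaustion: float, density: float) -> str:
        """Return the final mode, letting temporal urgency win conflicts."""
        if density >= DENSITY_OVERRIDE and budget_mode == "fast":
            self.overrides.append(exhaustion)  # record the override
            return "slow"
        return budget_mode


arbiter = ModeArbiter()
critical = arbiter.decide(budget_mode="fast", exhaustion=0.85, density=0.9)
routine = arbiter.decide(budget_mode="fast", exhaustion=0.85, density=0.3)
agreed = arbiter.decide(budget_mode="slow", exhaustion=0.2, density=0.9)
```

Only the first call conflicts with the budget engine, so it is the only one logged. How the recorded overrides feed into later exhaustion projections is left open here, since the article does not describe that step.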
