April 30, 2026 · 6 min read · Stage 35 · Cognitive Substrate

Reinforcement Feedback Worker

This article describes the feedback loop that records recommendation outcomes and adjusts operational pattern confidence over time.

agents clickhouse opensearch policy-engine reinforcement telemetry

Why recommendations need outcomes

Figure: Reinforcement feedback loop. Recommendations create pending outcome records that are joined with policy evaluation to update pattern confidence in the serving store and append to the analytical record.

Pattern detection produces recommendations, not answers. A recommendation is a hypothesis: this operational pattern resembles known failure mode X, and intervention Y has historically helped with X. Whether that hypothesis is correct in the current case depends on what happens next. Did an operator follow the recommendation? Did following it help? Did the incident resolve quickly or continue to escalate?

Without a mechanism to collect that feedback, the pattern library cannot improve. A detector that consistently recommends the wrong interventions for a particular pattern will keep recommending them. High-confidence patterns that are actually unreliable in a specific environment will retain their confidence scores indefinitely. The pattern library is only as good as the evidence it has processed.

The reinforcement feedback worker closes this loop. It tracks every recommendation from the moment of emission, creates a persistent record that can receive later outcome information, and updates pattern confidence in the serving store when outcomes arrive. Over time, the confidence scores in the pattern library stop being design-time estimates and become empirically grounded measures of actual operational reliability.

Recommendation tracking and the join point

When a recommendation event arrives, the worker creates a record with the status pending. This record stores the recommendation identifier, the pattern identifier, the match score at the time the recommendation was emitted, a timestamp, and the environment. The recommendation identifier is the key: it is the durable link between the recommendation and whatever outcome is later observed.
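As a minimal sketch of the record described above, assuming an in-memory dictionary in place of the real store: the class name, field names, and the `track_recommendation` helper are hypothetical, chosen only to mirror the fields the text lists.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PendingRecommendation:
    """Hypothetical shape of a pending outcome record."""
    recommendation_id: str  # durable key that later outcome feedback references
    pattern_id: str
    match_score: float      # score at the time the recommendation was emitted
    emitted_at: datetime
    environment: str
    status: str = "pending"


def track_recommendation(store: dict, event: dict) -> PendingRecommendation:
    """Create a pending record keyed by the recommendation identifier."""
    record = PendingRecommendation(
        recommendation_id=event["recommendation_id"],
        pattern_id=event["pattern_id"],
        match_score=event["match_score"],
        # Assumption: the worker stamps the record on arrival.
        emitted_at=datetime.now(timezone.utc),
        environment=event["environment"],
    )
    # Assumption: a redelivered event should not overwrite the first record.
    return store.setdefault(record.recommendation_id, record)


store = {}
event = {
    "recommendation_id": "rec-1",
    "pattern_id": "pat-7",
    "match_score": 0.82,
    "environment": "staging",
}
record = track_recommendation(store, event)
```

Keying the store by recommendation identifier is what makes the later join a single lookup.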

Outcomes rarely arrive immediately. An operator may investigate, act, and observe resolution over the course of twenty minutes or more. A policy evaluation system may record its assessment of the recommendation's contribution to the incident resolution only after the incident is fully closed. The pending record remains open until outcome feedback references its identifier. This deferred join is what allows the feedback loop to work across the temporal gap between recommendation and resolution.
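The deferred join can be sketched as a lookup by identifier, under the assumption that pending records are plain dictionaries. The `join_outcome` name and the `"resolved"` status are illustrative; the text only names the `pending` status.

```python
def join_outcome(pending: dict, feedback: dict):
    """Resolve a pending record when outcome feedback references its identifier.

    Returns the resolved record, or None when nothing pending matches.
    """
    record = pending.get(feedback["recommendation_id"])
    if record is None or record["status"] != "pending":
        return None  # unknown or already-resolved identifier: nothing to join
    record["status"] = "resolved"  # hypothetical status name
    record["outcome"] = feedback["outcome"]
    return record


pending = {
    "rec-1": {"recommendation_id": "rec-1", "pattern_id": "pat-7", "status": "pending"},
}
# Feedback may arrive minutes or hours after the recommendation was emitted.
resolved = join_outcome(pending, {"recommendation_id": "rec-1", "outcome": "success"})
```

Returning `None` for an identifier that is unknown or already resolved keeps a duplicate feedback message from updating confidence twice.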

Outcome classes and their signals

The feedback system recognizes four outcome classes: success, partial, failure, and ignored.

Success means the recommendation was acted on and the operational condition resolved as predicted. This maps to an outcome signal of 1.0.

Partial means the recommendation was acted on and the condition either improved without fully resolving or resolved more slowly than expected. This maps to 0.5.

Failure means the recommendation was acted on and the intervention did not help or made things worse. This maps to 0.0.

Ignored means no action was taken on the recommendation, either because the operator made a different judgment or because the condition resolved on its own. Ignored maps to 0.5.

The neutral signal for ignored recommendations is a deliberate design choice. It would be tempting to treat a recommendation that no one acted on as evidence against the pattern. But the absence of action has many explanations: the operator was occupied with a higher-priority issue, the condition resolved spontaneously before intervention was possible, or the recommendation was seen as plausible but low-priority. Treating inaction as failure would cause patterns to lose confidence for reasons unrelated to their actual accuracy. The midpoint signal avoids that penalty while still recording that the recommendation was not used. Because the update is a moving average, an ignored outcome pulls confidence toward 0.5 rather than leaving it unchanged, so it is neutral between success and failure, not a no-op.
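The four classes and their signals can be written down as a small lookup. This is a sketch: the `Outcome` enum and `outcome_signal` helper are hypothetical names, and only the class labels and numeric values come from the text.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"
    IGNORED = "ignored"


# Signal values as stated in the article.
OUTCOME_SIGNAL = {
    Outcome.SUCCESS: 1.0,
    Outcome.PARTIAL: 0.5,
    Outcome.FAILURE: 0.0,
    Outcome.IGNORED: 0.5,  # midpoint: inaction is not treated as failure
}


def outcome_signal(label: str) -> float:
    """Map an outcome label to its signal; unknown labels raise ValueError."""
    return OUTCOME_SIGNAL[Outcome(label)]
```

Rejecting unknown labels, rather than defaulting them to some value, keeps a malformed feedback message from silently moving a pattern's confidence.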

Confidence update by exponential moving average

The confidence update applies a bounded exponential moving average:

$c_{t+1} = \alpha o + (1 - \alpha) c_t$

where $c_t$ is the current pattern confidence, $o$ is the outcome signal, and $\alpha$ is the learning rate, defaulting to 0.15.

The exponential moving average assigns declining weight to older evidence without discarding it. A pattern that performed well for many months and then fails twice does not immediately become unreliable. A pattern that has failed consistently but recently succeeded a few times does not immediately become reliable. With a learning rate of 0.15, each new outcome enters the updated confidence with a weight of 0.15, while the prior confidence, which summarizes all earlier outcomes, keeps a weight of 0.85. A single outcome can therefore move the confidence by at most 0.15.
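To make the weighting concrete, here is the arithmetic for the two-failure case, ignoring clamping. The starting confidence of 0.90 is an illustrative value, not one taken from the system:

$c_1 = 0.15 \cdot 0 + 0.85 \cdot 0.90 = 0.765$

$c_2 = 0.15 \cdot 0 + 0.85 \cdot 0.765 = 0.65025$

Unrolling the recurrence over outcomes $o_1, \dots, o_n$ applied in order to a starting confidence $c_0$ shows the declining weights directly:

$c_n = (1 - \alpha)^n c_0 + \alpha \sum_{k=1}^{n} (1 - \alpha)^{n-k} o_k$

The most recent outcome carries weight $\alpha$, the one before it $\alpha (1 - \alpha)$, and so on, so older evidence fades geometrically but never drops to zero.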

The result is clamped to a minimum of 0.1 and a maximum of 0.99. The minimum prevents patterns from becoming unreachable even after a run of failures; a pattern whose underlying behavioral signature is real may fail to match correctly for environmental reasons unrelated to the pattern's validity. The maximum prevents patterns from becoming unquestionable; no operational pattern should be so entrenched that contradicting evidence cannot move it.
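The update and its bounds fit in a few lines. This is a sketch under the stated defaults (learning rate 0.15, bounds 0.1 and 0.99); the function and constant names are hypothetical.

```python
ALPHA = 0.15           # learning rate from the article
MIN_CONFIDENCE = 0.1   # floor: keeps a pattern reachable after a run of failures
MAX_CONFIDENCE = 0.99  # ceiling: keeps a pattern open to contradicting evidence


def update_confidence(current: float, signal: float, alpha: float = ALPHA) -> float:
    """Apply one exponential-moving-average step, then clamp to the bounds."""
    updated = alpha * signal + (1.0 - alpha) * current
    return max(MIN_CONFIDENCE, min(MAX_CONFIDENCE, updated))


# One failure moves 0.9 to roughly 0.765; an ignored outcome moves it to roughly 0.84.
after_failure = update_confidence(0.9, 0.0)
after_ignored = update_confidence(0.9, 0.5)

# Long runs saturate at the bounds instead of reaching 0 or 1.
floor = 0.5
ceiling = 0.5
for _ in range(100):
    floor = update_confidence(floor, 0.0)
    ceiling = update_confidence(ceiling, 1.0)
```

Clamping after the update, rather than clamping the signal, means a pattern sitting at the floor still responds to its next success.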

This is the same EMA mechanism that the cognitive substrate's reinforcement engine uses for memory and policy learning. The consistency is intentional: whether learning concerns episodic memories, policy weights, or operational pattern confidence, the mathematical structure is the same. Exponential smoothing with bounded output gives a stable signal that respects history without being frozen by it.

Dual-store architecture

Outcome records and confidence updates are written to two stores with different purposes.

The pattern confidence in the serving store reflects the current best estimate, used by the pattern detection worker at query time. When a new outcome arrives, this confidence is updated immediately so that future recommendations reflect the latest evidence.

The analytical record in the column store captures every update: the outcome signal, the confidence before and after, the recommendation identifier, the environment, and the timestamp. This record is immutable. It is the evidence trail that allows later analysis to reconstruct how a pattern's confidence evolved, which environments produced which outcome distributions, and whether a confidence change was driven by a run of genuine successes or by a temporary environmental artifact.

Keeping these stores separate preserves the distinction between the operational serving layer (fast, mutable, reflects current best estimate) and the analytical audit layer (append-only, durable, supports retrospective evaluation). The operational layer answers the question "what confidence should the detector use for this pattern right now?" The analytical layer answers the question "how did we arrive at that confidence, and should we trust it?"
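The split between the two layers can be illustrated with in-memory stand-ins: a dictionary for the serving store and a list for the analytical record. Neither reflects the actual store APIs, and the `ConfidenceUpdate` and `apply_outcome` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an audit row cannot be modified once created
class ConfidenceUpdate:
    recommendation_id: str
    pattern_id: str
    environment: str
    outcome_signal: float
    confidence_before: float
    confidence_after: float
    recorded_at: datetime


def apply_outcome(serving: dict, audit_log: list, recommendation_id: str,
                  pattern_id: str, environment: str, signal: float,
                  alpha: float = 0.15) -> ConfidenceUpdate:
    """Overwrite the serving confidence and append one immutable audit row."""
    before = serving[pattern_id]
    after = max(0.1, min(0.99, alpha * signal + (1.0 - alpha) * before))
    serving[pattern_id] = after  # serving layer: mutable, current best estimate
    row = ConfidenceUpdate(recommendation_id, pattern_id, environment,
                           signal, before, after, datetime.now(timezone.utc))
    audit_log.append(row)        # analytical layer: append-only history
    return row


serving = {"pat-7": 0.8}
audit_log = []
row = apply_outcome(serving, audit_log, "rec-1", "pat-7", "staging", 1.0)
```

Recording the confidence both before and after each update lets the history be replayed or checked without consulting the serving store.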

Evidence before automation

This stage deliberately does not automate remediation. Building trust in a recommendation system requires an evidence period during which the system observes, records, recommends, collects outcomes, and demonstrates that its recommendations are reliable before any automated action is taken on them.

A pattern whose confidence has risen steadily over three hundred incidents across multiple environments, with a success rate consistently above 0.8, represents a different level of trust than a pattern that was seeded with a default confidence and has processed five outcomes. Tying any later automated remediation to pattern confidence lets the architecture mature from advisory diagnostics toward automation without hiding the evidence trail that justifies each new action.

This mirrors the broader developmental philosophy of the cognitive substrate. Higher-capability operations require demonstrated readiness. Open-ended mode requires a mean capability above 0.85. Automated remediation should require demonstrated pattern reliability above an operationally determined threshold. The reinforcement feedback worker is what makes that threshold measurable rather than assumed.
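One way such a readiness gate could look, purely as a sketch: the function name and both thresholds are hypothetical, and the article leaves the real threshold to be determined operationally. The values in the usage lines echo the three-hundred-incident example above and are illustrative only.

```python
def ready_for_automation(confidence: float, outcome_count: int, *,
                         min_confidence: float, min_outcomes: int) -> bool:
    """Require both a confidence above the threshold and enough recorded outcomes."""
    return outcome_count >= min_outcomes and confidence > min_confidence


# Illustrative thresholds only; real values would be set per deployment.
seasoned = ready_for_automation(0.91, 300, min_confidence=0.8, min_outcomes=300)
freshly_seeded = ready_for_automation(0.91, 5, min_confidence=0.8, min_outcomes=300)
```

Checking the outcome count alongside the score separates a pattern that earned its confidence from one that was simply seeded with a high default.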
