Steve Hutchinson · Big Pines
8 min read · Stage 3 · Cognitive Substrate

Consolidation Worker

Consolidation gives memory an offline replay path: it selects replay candidates, builds semantic drafts, and emits update events, in an architecture loosely modeled on the sleep cycle.

Memory that improves while the system is idle

Consolidation pipeline: replay candidates are summarized into semantic memories, clustered, and annotated for contradictions and decay before emitting an update event.

Ingestion captures experience in real time. Retrieval surfaces it on demand. Consolidation does something different: it runs offline, after the fact, to take raw experience events and transform them into richer semantic memories that future retrieval can work with more effectively.

The analogy to sleep-based memory consolidation in neuroscience is intentional but imprecise. What the brain does during slow-wave and REM sleep is not fully understood. What the consolidation worker does is well-defined: it selects candidate experience events, extracts structure from the most important ones, computes composite scores, and writes the result to a semantic memory index. It then emits a notification so downstream consumers know the memory landscape has changed.

The architectural reason for separating consolidation from ingestion is latency. Ingestion must be fast; events arrive in real time and must be captured before context is lost. Consolidation can afford to be slow. It applies richer signals, processes multiple events together, and re-reads the index to check for cross-window contamination. Doing that in the ingestion hot path would introduce unacceptable delays.

Candidate selection and the contamination problem

The first step of consolidation is selecting which experience events to replay. This seems simple (retrieve the most important ones), but the experiments revealed a critical subtlety.

Experience events arrive across multiple incident windows. An outage window may generate fifty high-importance events about authentication failures. A recovery window may generate another fifty. A normal background window generates forty low-importance events. If the candidate selector simply retrieves the top experiences by importance score without any window constraint, outage events will dominate every consolidated memory, regardless of what the system is trying to consolidate.

Experiment 18 demonstrated this concretely. Two hundred synthetic operational signals across four incident windows (normal, degraded, outage, recovery) were indexed and then consolidated. Without window-scoped selection, source counts in the resulting memories drifted from their expected values. With a requiredTags filter that constrained candidate selection to the target window, source counts matched exactly: 40 normal, 60 degraded, 50 outage, and 50 recovery events, with no cross-window contamination.

The requiredTags filter is not optional; it is the mechanism that keeps consolidated memories causally coherent. A memory about an outage should encode only outage evidence. A memory about normal operations should not be contaminated by the dramatic vocabulary of an outage that happened to co-occur in the index.
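
A minimal sketch of window-scoped selection, assuming each experience event carries a list of tags and an importance score. The `ExperienceEvent` shape, the `select_candidates` function, and tag values such as `window:outage` are hypothetical names chosen for illustration; only the idea of a requiredTags filter comes from the experiment, rendered here as `required_tags`.

```python
from dataclasses import dataclass, field


@dataclass
class ExperienceEvent:
    # Hypothetical event shape; the field names are assumptions.
    event_id: str
    summary: str
    importance_score: float
    tags: list[str] = field(default_factory=list)


def select_candidates(events, required_tags, limit=None):
    """Return events that carry every required tag, highest importance first."""
    required = set(required_tags)
    scoped = [e for e in events if required.issubset(e.tags)]
    scoped.sort(key=lambda e: e.importance_score, reverse=True)
    return scoped if limit is None else scoped[:limit]


events = [
    ExperienceEvent("e1", "auth failures spike", 0.92, ["window:outage", "auth"]),
    ExperienceEvent("e2", "routine health check", 0.26, ["window:normal"]),
    ExperienceEvent("e3", "token service down", 0.91, ["window:outage", "auth"]),
    ExperienceEvent("e4", "nightly backup finished", 0.25, ["window:normal"]),
]

# Without a window constraint, outage events crowd out everything else.
unscoped = select_candidates(events, required_tags=[], limit=2)

# With the filter, only normal-window evidence is replayed.
normal_only = select_candidates(events, required_tags=["window:normal"], limit=2)
```

Here `unscoped` contains only the two outage events, while `normal_only` contains only the two normal-window events, which is the behavior the filter is meant to guarantee.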

Extractive consolidation: what the model actually does

The consolidation model is extractive, not abstractive. An abstractive model would generate new language that might be more fluent and more insightful, but it would be harder to audit and more prone to confabulation. An extractive model works only with what the source events contain.

The model operates in three steps:

  1. Select top sources by importance. From the candidate pool, the model takes the three events with the highest importance scores. These represent the most significant experiences in the window.

  2. Concatenate summaries. The summaries of the selected events are joined to form the consolidated memory's summary text. The summary preserves the vocabulary and structure of the original high-importance events, which is exactly what retrieval needs: incident-specific vocabulary that BM25 can match against.

  3. Extract dominant tags. The most common tags across selected sources become the generalization sentence, a short structured description of what the memory is about.

The result is a ConsolidationDraft containing a summary, a generalization, and an embedding (when source events carry embeddings). This draft becomes the semantic memory written to the index.
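
The three steps can be sketched as follows. This is an illustration under stated assumptions, not the worker's actual code: `SourceEvent`, `build_draft`, the `max_tags` parameter, and the wording of the generalization sentence are hypothetical, and averaging the source embeddings is one plausible way to derive the draft embedding that the article does not confirm.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SourceEvent:
    # Hypothetical source shape; the field names are assumptions.
    summary: str
    importance_score: float
    tags: list[str]
    embedding: list[float] = field(default_factory=list)


@dataclass
class ConsolidationDraft:
    summary: str
    generalization: str
    embedding: list[float]


def build_draft(candidates, top_n=3, max_tags=2):
    # Step 1: keep the top_n highest-importance sources.
    sources = sorted(candidates, key=lambda e: e.importance_score, reverse=True)[:top_n]

    # Step 2: join the source summaries verbatim (extractive, no new language).
    summary = " ".join(s.summary for s in sources)

    # Step 3: the most common tags become the generalization sentence.
    counts = Counter(tag for s in sources for tag in s.tags)
    dominant = [tag for tag, _ in counts.most_common(max_tags)]
    generalization = "Dominant tags: " + ", ".join(dominant)

    # Assumption: average the embeddings only when every source carries one;
    # otherwise leave the draft embedding empty.
    vectors = [s.embedding for s in sources if s.embedding]
    if vectors and len(vectors) == len(sources):
        embedding = [sum(column) / len(vectors) for column in zip(*vectors)]
    else:
        embedding = []

    return ConsolidationDraft(summary, generalization, embedding)


candidates = [
    SourceEvent("Login failures exceeded threshold.", 0.93, ["auth", "outage"]),
    SourceEvent("Token service returned 503.", 0.92, ["auth", "outage", "token"]),
    SourceEvent("Cache warmup completed.", 0.30, ["cache"]),
    SourceEvent("Session store unreachable.", 0.91, ["outage", "session"]),
]
draft = build_draft(candidates)
```

The low-importance cache event is dropped in step 1, so neither its text nor its tag reaches the draft.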

How severity propagates through the pipeline

One of the most important properties of the consolidation pipeline is severity preservation: the importance ordering of raw events should survive transformation into semantic memories.

Experiment 18 validated this end-to-end. Raw outage events were generated with importance scores around 0.92, the value that severity "outage" maps to at ingestion time. After consolidation, the outage window produced a semantic memory with importanceScore = 0.920. The degraded window produced 0.680. Normal and recovery windows produced approximately 0.26.

The mechanism is straightforward: the consolidation engine averages the importance scores of the selected candidates. Because the requiredTags filter ensures candidates come only from the target window, and because candidate selection picks the highest-importance events, the averaged score faithfully reflects the severity distribution of that window.
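
A small worked example of the averaging step, assuming the top three candidates are averaged as described. The function name `averaged_importance` and the individual scores are made up for illustration; they are not data from Experiment 18.

```python
from statistics import fmean


def averaged_importance(scores, top_n=3):
    """Average the top_n highest importance scores in the candidate pool."""
    top = sorted(scores, reverse=True)[:top_n]
    return fmean(top)


# Illustrative scores only.
normal_window = [0.27, 0.26, 0.25, 0.24]
outage_window = [0.93, 0.92, 0.91, 0.90]

# Window-scoped pool: the average stays near the normal-window severity.
scoped_normal = averaged_importance(normal_window)

# Unscoped pool: outage events take all three slots and inflate the score.
unscoped = averaged_importance(normal_window + outage_window)
```

With these inputs `scoped_normal` is about 0.26 and `unscoped` is about 0.92, which shows why the averaged score is only faithful when the candidate pool is restricted to the target window.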

This faithful propagation is not guaranteed by the architecture alone; it depends on the combination of window-scoped selection and extractive averaging. An abstractive model could in principle generate a more alarming summary for a normal window, inflating its apparent importance. The extractive approach avoids this by grounding scores in actual source values.

Stability scoring

Beyond importance, the consolidation engine computes a stability score: an estimate of how reliably a memory should be trusted over time. The formula is:

$$\text{stabilityScore} = \text{avg}\left(\frac{\text{importanceScore} + \text{rewardScore}}{2}\right) \text{ across selected candidates}$$

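At ingestion, rewardScore starts at zero, so stability scores for freshly indexed events are determined by importance alone. With a zero reward, the formula gives (0.92 + 0) / 2 = 0.46 for each outage candidate and (0.26 + 0) / 2 = 0.13 for each normal candidate. The window averages reported for these memories are 0.71 and about 0.38, which match the formula only when the reward term is 0.5, since (0.92 + 0.5) / 2 = 0.71 and (0.26 + 0.5) / 2 = 0.38; the reported figures therefore appear to include a nonzero reward baseline.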

The stability score therefore reflects both severity and reward history. As the reinforcement engine accumulates feedback on which memories proved useful, future consolidation passes will produce higher stability scores for memories that have been repeatedly retrieved and validated. This creates a positive feedback loop: important memories get consolidated, consolidated memories can be retrieved and evaluated, evaluations become reward signals, and reward signals raise stability in the next consolidation pass.
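
The formula and the feedback loop can be illustrated with a short sketch. The function name `stability_score` and the reward value of 0.6 are hypothetical; only the formula itself is taken from the text.

```python
from statistics import fmean


def stability_score(candidates):
    """Average (importance + reward) / 2 over (importance, reward) pairs."""
    return fmean((importance + reward) / 2 for importance, reward in candidates)


# Freshly ingested outage candidates: reward is still zero.
fresh = stability_score([(0.92, 0.0), (0.92, 0.0), (0.92, 0.0)])

# The same candidates after accumulating a hypothetical reward of 0.6.
reinforced = stability_score([(0.92, 0.6), (0.92, 0.6), (0.92, 0.6)])
```

Under these assumptions `fresh` is about 0.46 and `reinforced` is about 0.76: importance is unchanged, and the rise in stability comes entirely from reward.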

The empty-embedding guard

A technical detail with practical consequences: when source experience events carry no embeddings (because the embedding step was skipped, the embedding service was unavailable, or the events were generated before embedding infrastructure was set up), the consolidation draft has an empty embedding array.

OpenSearch's k-NN vector field rejects empty arrays. If the consolidation engine writes embedding: [] to a k-NN-mapped field, the document is rejected entirely: it is neither indexed nor retrievable. Early implementations failed silently here. The semantic memory write appeared to succeed, but the document never appeared in the index.

The fix is conditional field exclusion: when the embedding is empty, omit the embedding field from the write payload rather than including it with an empty value. The document then indexes correctly, and future retrieval against that memory uses only BM25 lexical matching rather than vector similarity. This is the correct degraded behavior, because retrieving the memory by keywords is better than losing it entirely.

This pattern generalizes: any field whose absence should be treated differently from its empty value needs a conditional exclusion guard, not a default value.
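
A minimal sketch of the guard, assuming the write payload is built as a plain dictionary before being sent to the index. The function name `build_index_document` and the field names other than `embedding` are hypothetical, and the actual indexing call is deliberately left out.

```python
def build_index_document(memory_id, summary, generalization, embedding):
    """Build the write payload, omitting the embedding field when it is empty."""
    document = {
        "memoryId": memory_id,
        "summary": summary,
        "generalization": generalization,
    }
    # Conditional exclusion: an absent field is accepted, an empty vector is not.
    if embedding:
        document["embedding"] = list(embedding)
    return document


without_vector = build_index_document("m1", "auth outage", "Dominant tags: auth", [])
with_vector = build_index_document("m2", "auth outage", "Dominant tags: auth", [0.1, 0.2])
```

The first payload has no `embedding` key at all, so the memory remains reachable through lexical matching; the second carries the vector unchanged.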

Downstream notification

After writing each consolidated memory to the semantic index, the engine emits memory.semantic.updated. Downstream consumers (the retrieval engine, the policy engine, the reinforcement system) can subscribe to this topic to know when the memory landscape has changed.

This notification matters for cache invalidation. If a retrieval system caches query results for latency reasons, a consolidation event signals that the cached results may now be stale. A policy system that has cached a memory-derived assessment should recompute.

The event contains enough information for consumers to decide whether to re-query: the memory ID, its importance and stability scores, its tags, and the window it represents. Consumers that care only about high-importance memories can filter on the score before re-querying.
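
One way a consumer might use the event for cache invalidation is sketched below. The topic name comes from the article; the `SemanticMemoryUpdated` payload class, the `CachedRetriever` consumer, its per-window cache, and the 0.5 threshold are assumptions made for illustration, and no particular message broker is implied.

```python
from dataclasses import dataclass

TOPIC = "memory.semantic.updated"


@dataclass(frozen=True)
class SemanticMemoryUpdated:
    # Hypothetical payload mirroring the fields the article lists.
    memory_id: str
    importance_score: float
    stability_score: float
    tags: tuple[str, ...]
    window: str


class CachedRetriever:
    """Hypothetical consumer that caches query results per window."""

    def __init__(self, min_importance=0.5):
        self.min_importance = min_importance
        self.cache = {}

    def on_event(self, topic, event):
        """Drop cached results for the event's window; return True if acted on."""
        if topic != TOPIC or event.importance_score < self.min_importance:
            return False
        self.cache.pop(event.window, None)
        return True


retriever = CachedRetriever(min_importance=0.5)
retriever.cache = {"outage": ["stale result"], "normal": ["stale result"]}

handled_outage = retriever.on_event(
    TOPIC, SemanticMemoryUpdated("m1", 0.92, 0.46, ("auth",), "outage")
)
handled_normal = retriever.on_event(
    TOPIC, SemanticMemoryUpdated("m2", 0.26, 0.13, (), "normal")
)
```

The outage update clears the cached outage results, while the low-importance normal update is ignored because it falls below this consumer's threshold.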

What consolidation cannot do

Consolidation makes memories richer and more retrievable. It does not make them correct. If the ingested experience events contained errors (misclassified events, incorrect labels, malformed payloads), consolidation will faithfully propagate those errors into semantic memories.

It also does not resolve contradictions. When two memories conflict, consolidation does not decide which is true. That concern belongs to the reinforcement and forgetting systems, which accumulate evidence about which memories prove reliable over time and eventually suppress or retire the ones that consistently lead to poor outcomes.

The extractive model's conservatism is both its strength and its limitation. It preserves source fidelity, avoids confabulation, and keeps scores auditable. It does not generate insight beyond what the source events contain. The path toward abstractive capability (memories that express principles rather than replays) runs through the abstraction engine described much later in this series.

The next article covers OpenSearch ML inference nodes: how embedding and reranking models can be co-located with the memory index to eliminate the external round-trip and how environment-derived profiles allow the retrieval layer to switch providers without changing application logic.
