From stored experience to active context
The ingestion pipeline described in the previous article stores experience events in two places: a full-fidelity object archive for replay and audit, and a compact OpenSearch index for retrieval. That index is rich with signals (importance scores, decay factors, retrieval counts, semantic embeddings), but none of those signals are useful until something reads them back into an active cognitive context. Retrieval is that something.
The core question retrieval answers is not "what does the system know?" but "which memories should shape the next decision?" These are different questions. A system may have stored evidence about a hundred past incidents, but the one that matters now is the one most similar to the current context, most important, and not yet stale. Getting that selection right is what separates a memory system from a log.
Why flat importance ranking fails
The most naive retrieval strategy is to sort memories by a static importance score and return the top results. It is fast and simple, but it is deeply flawed.
Experiment 1 tested exactly this strategy against a fixed corpus of nine synthetic memories organized into three clusters. Cluster A held high-importance deployment reliability memories (scores 0.70 to 0.85). Cluster B held moderate-importance experimental observations (0.55 to 0.65). Cluster C held low-importance, contradictory memories (0.15 to 0.30). Flat importance ranking produced a 70% hit rate and only 33% cluster coverage. Cluster B was never retrieved; its scores simply never beat cluster A, regardless of what was being asked.
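To make the failure concrete, here is a minimal Python sketch of flat importance ranking over a nine-memory corpus. The individual scores are hypothetical values picked inside the ranges reported for each cluster, and `k=3` is an assumed retrieval depth. Hit rate is not modeled because the experiment's queries are not listed here. Because the ranking ignores the query entirely, every query returns the same three cluster A memories, which is why coverage stalls at one cluster out of three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    mem_id: str
    cluster: str
    importance: float


# Hypothetical per-memory scores chosen inside the ranges reported for Experiment 1.
CORPUS = [
    Memory("A1", "A", 0.85), Memory("A2", "A", 0.78), Memory("A3", "A", 0.70),
    Memory("B1", "B", 0.65), Memory("B2", "B", 0.60), Memory("B3", "B", 0.55),
    Memory("C1", "C", 0.30), Memory("C2", "C", 0.22), Memory("C3", "C", 0.15),
]


def flat_importance_top_k(corpus, k):
    """Rank purely by static importance; the query plays no role at all."""
    return sorted(corpus, key=lambda m: m.importance, reverse=True)[:k]


def cluster_coverage(retrieved, corpus):
    """Fraction of the corpus's distinct clusters present in the retrieved set."""
    all_clusters = {m.cluster for m in corpus}
    return len({m.cluster for m in retrieved}) / len(all_clusters)


top = flat_importance_top_k(CORPUS, k=3)
print([m.mem_id for m in top])                  # ['A1', 'A2', 'A3']
print(round(cluster_coverage(top, CORPUS), 2))  # 0.33
```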
This failure has a name in cognitive science: attentional capture by salience. The most prominent memories crowd out everything else, even when less prominent ones are more relevant to the current task. A retrieval system that cannot surface cluster B cannot learn from its experimental observations, nor notice when a novel situation resembles something it once found unexpected.
Experiment 1 serves as the baseline throughout this series: a simpler configuration of the same retrieval system, without session-relative novelty, against which subsequent improvements are measured. It is not a comparison against an external system; it is the floor from which the architecture climbs.
Session-relative novelty
The fix requires a dynamic score that changes depending on what has already been retrieved in the current session. A memory retrieved recently has been seen; one not retrieved in a while is, relative to the current session, novel. Novelty should add retrieval priority.
Experiment 3 introduced session-relative novelty: a per-memory score driven by a decay term over the time since the memory was last retrieved in the current session.
When a memory has not been seen in a long time, the decay term drives novelty toward its maximum. When it was retrieved just now, novelty is near its minimum.
The combined score blends static importance with novelty, and a weighting parameter controls how strongly novelty competes with importance.
With novelty in the mix, hit rate rose above the flat-ranking baseline and cluster coverage increased. Cluster B was now reachable because its novelty was high after cluster A dominated early turns.
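The exact Experiment 3 formula is not reproduced here, so the following Python sketch assumes a common form: novelty is `1 - exp(-dt / tau)`, where `dt` is the number of turns since a memory was last retrieved in the session (never-retrieved memories get 1.0), and the combined score is `importance + weight * novelty`. The names `weight` and `tau`, the additive combination, and the corpus values are all hypothetical. It only illustrates the mechanism: with `weight=0` the ranking is identical to flat importance, while a positive weight lets cluster B overtake cluster A once A's novelty has been spent.

```python
import math

# Hypothetical corpus: (memory id, cluster, static importance).
CORPUS = [
    ("A1", "A", 0.85), ("A2", "A", 0.80), ("A3", "A", 0.70),
    ("B1", "B", 0.65), ("B2", "B", 0.60), ("B3", "B", 0.55),
    ("C1", "C", 0.30), ("C2", "C", 0.20), ("C3", "C", 0.15),
]


def novelty(last_seen, now, tau):
    """Assumed form 1 - exp(-dt / tau).

    A memory never retrieved this session is maximally novel (1.0);
    one retrieved on the current turn (dt = 0) has novelty 0.0.
    """
    if last_seen is None:
        return 1.0
    return 1.0 - math.exp(-(now - last_seen) / tau)


def run_session(weight, tau, turns, k=3):
    """Retrieve the top-k memories each turn by importance + weight * novelty.

    Returns the sorted list of clusters retrieved on each turn.
    """
    last_seen = {}  # memory id -> turn on which it was last retrieved
    history = []
    for now in range(turns):
        ranked = sorted(
            CORPUS,
            key=lambda m: m[2] + weight * novelty(last_seen.get(m[0]), now, tau),
            reverse=True,
        )
        picked = ranked[:k]
        for mem_id, _, _ in picked:
            last_seen[mem_id] = now
        history.append(sorted({cluster for _, cluster, _ in picked}))
    return history


print(run_session(weight=0.0, tau=2.0, turns=3))  # [['A'], ['A'], ['A']]
print(run_session(weight=0.6, tau=2.0, turns=3))  # [['A'], ['B'], ['A']]
```

With these particular values the sketch alternates between clusters A and B turn by turn, a small-scale version of the oscillating pattern discussed next.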
The novelty weight has a measurable effect on cognitive style. With slow decay, one setting produced what I call an ADHD pattern: the system oscillated between clusters every turn because novelty swung wildly after each retrieval. With fast decay, another setting hyperfocused on cluster A for a long burst before cluster B became competitive. A third setting produced the most balanced behavior, with genuine exploration but without thrash.
Warm-start and session priming
A retrieval system that starts each session with no memory of prior sessions will re-explore the same familiar memories every time. This is fine for short tasks but wasteful for agents that operate across many sessions.
Experiment 4 introduced warm-start priming, which loads prior-session retrieval history into the recency tracker before the first retrieval of a new session. The effect is striking. A session primed with heavy cluster A usage opened with cluster B at the top of the retrieval list, because those memories had become maximally novel during the inter-session gap. The system's first act was a "context pop" out of its prior groove, exposing it to evidence it had neglected.
This behavior has a useful analogy in human cognition: returning to a problem after sleep often surfaces connections that were invisible during focused effort, because the most recently visited ideas have decayed in salience relative to older, neglected ones.
The novelty weight controls the escape velocity from prior context. A small weight keeps the system near its prior focal cluster even after priming. A large weight allows the novel-by-absence cluster to dominate early turns.
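Warm-start priming can be sketched by seeding the recency tracker with the prior session's last-retrieval times before the first query. This Python sketch assumes the novelty form `1 - exp(-dt / tau)` with an additive `weight`, and measures time in a single hypothetical unit that runs across sessions; `PRIOR`, `first_pick`, and all scores are illustrative, not the system's actual tracker. It shows both effects from this section: priming with heavy cluster A usage pops cluster B to the top, and a small weight keeps the system in its prior groove.

```python
import math

# Hypothetical importance scores; the first letter of each id is its cluster.
IMPORTANCE = {"A1": 0.85, "A2": 0.80, "A3": 0.70, "B1": 0.65, "B2": 0.60, "B3": 0.55}


def novelty(last_seen, now, tau):
    """Assumed form 1 - exp(-dt / tau); never-retrieved memories score 1.0."""
    if last_seen is None:
        return 1.0
    return 1.0 - math.exp(-(now - last_seen) / tau)


def first_pick(weight, tau, now, primed_history=None):
    """Top-ranked memory for the first retrieval of a new session.

    primed_history maps memory id -> time of its last retrieval in a prior
    session. Passing None simulates a cold start with an empty recency tracker.
    """
    last_seen = dict(primed_history or {})
    return max(
        IMPORTANCE,
        key=lambda mid: IMPORTANCE[mid] + weight * novelty(last_seen.get(mid), now, tau),
    )


# Hypothetical prior session that leaned on cluster A and ended at t = 9.
PRIOR = {"A1": 9.0, "A2": 9.0, "A3": 9.0}

print(first_pick(weight=0.6, tau=2.0, now=10.0))                        # A1 (cold start)
print(first_pick(weight=0.6, tau=2.0, now=10.0, primed_history=PRIOR))  # B1
print(first_pick(weight=0.1, tau=2.0, now=10.0, primed_history=PRIOR))  # A1
```

Under this assumed form, a gap long enough for cluster A's novelty to recover fully would hand the first slot back to importance; how the real tracker measures time between sessions is not specified here.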
Hybrid retrieval: lexical and semantic lanes
Novelty addresses the temporal dimension of retrieval, covering what has been seen recently. The other dimension is content relevance: given the current context, which memories are semantically related?
The retrieval engine supports four query lanes:
- Quality: deep hybrid recall at full cost, used for high-stakes decisions.
- Efficient: fast approximate recall, used for routine context hydration.
- Hybrid: combines BM25 lexical matching with k-NN vector similarity.
- Legacy vector: pure semantic recall for backward-compatible indexes.
The hybrid lane is the most general. BM25 catches exact-term relevance: if the current context mentions "auth-service outage," memories that contain those exact words score highly. Vector similarity catches semantic proximity, so memories about "authentication failures" or "service degradation" score highly even without the exact words. Together they surface experiences that are both topically and conceptually related to the current context.
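A hybrid lane has to put two incompatible score scales on common ground before combining them. The following Python sketch shows one standard way to do that: score each memory with Okapi BM25 and with cosine similarity, min-max normalize each list, and take a weighted arithmetic mean (OpenSearch's normalization processor offers min-max normalization and arithmetic-mean combination for its `hybrid` query). The memory summaries, the three-dimensional embeddings, and the 0.5/0.5 weights are hypothetical, and nothing here claims to be how the retrieval engine in this series fuses its lanes.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, treating hyphens as separators."""
    return text.lower().replace("-", " ").split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every document for the query (lexical lane)."""
    doc_tokens = [tokenize(d) for d in docs]
    n = len(doc_tokens)
    avgdl = sum(len(t) for t in doc_tokens) / n
    df = Counter(term for tokens in doc_tokens for term in set(tokens))
    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two dense vectors (semantic lane)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query, query_vec, docs, doc_vecs, w_lex=0.5, w_vec=0.5):
    """Indices of docs ordered by a weighted mean of normalized lane scores."""
    lex = min_max(bm25_scores(query, docs))
    vec = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [(w_lex * lx + w_vec * vx) / (w_lex + w_vec) for lx, vx in zip(lex, vec)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)


# Hypothetical memory summaries with toy 3-d embeddings.
DOCS = [
    "auth-service outage after certificate expiry",
    "authentication failures and degraded latency",
    "routine background metrics nominal",
]
DOC_VECS = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1], [0.0, 0.1, 0.95]]
QUERY, QUERY_VEC = "auth-service outage", [0.9, 0.15, 0.0]

print(bm25_scores(QUERY, DOCS)[1:])                   # [0.0, 0.0]
print(hybrid_rank(QUERY, QUERY_VEC, DOCS, DOC_VECS))  # [0, 1, 2]
```

The second memory shares no tokens with the query, so its lexical score is zero, yet the vector lane still ranks it well above the unrelated metrics memory.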
Experiment 17 validated lexical retrieval over operational signals. Three queries (about a postgres outage, service recovery, and normal background metrics) each returned the correct incident window in the top three results with no cross-contamination. The vocabulary in the memory summaries was rich enough that BM25 alone sufficed for incident-specific recall. Vector retrieval becomes essential when the query vocabulary diverges from the stored vocabulary, for example when a new incident resembles a known pattern but uses different service names.
Graph augmentation and its limits
Experience memories can carry associative links to each other. A memory about a database failure might link to one about cascading latency, which links to one about alerting thresholds. These links form a graph that could, in principle, extend retrieval by traversing to related memories not captured by score alone.
Experiment 5 tested one-hop graph expansion from top-k retrieval seeds. The result was negative: the graph did not improve hit rate for cluster B, whose members were already reachable through session novelty dynamics. More concerning, the contradicts links surfaced cluster C memories (the low-quality, contradictory ones) at a 30% higher rate with no corresponding benefit.
The lesson is that graph traversal requires careful edge filtering. A contradicts link is structurally useful for self-reflection, but expanding retrieval through it adds noise to the working context. Experiment 6 confirmed this: a diversity slot that guaranteed the best graph neighbor improved cluster B access but introduced cluster C contamination at 50%.
The practical principle is that graph expansion should be filtered by edge type, not unrestricted. Links that support or relate memories can enrich context; links that record contradiction belong in reflection, not retrieval.
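Edge-type filtering is simple to express. This Python sketch performs one-hop expansion from retrieval seeds but follows only `supports` and `relates` links, leaving `contradicts` links for reflection. The memory ids, the edge list, and the directed-edge representation are hypothetical.

```python
from collections import defaultdict

# Hypothetical directed links between memories: (source, edge type, target).
EDGES = [
    ("db-failure", "relates", "cascading-latency"),
    ("cascading-latency", "supports", "alert-thresholds"),
    ("db-failure", "contradicts", "db-reported-healthy"),
]

# Edge types allowed to pull memories into the working context.
RETRIEVAL_EDGE_TYPES = frozenset({"supports", "relates"})


def one_hop_expand(seeds, edges, allowed_types=RETRIEVAL_EDGE_TYPES):
    """Append direct neighbours of the seeds reached through allowed edge types."""
    adjacency = defaultdict(list)
    for src, edge_type, dst in edges:
        adjacency[src].append((edge_type, dst))
    expanded = list(seeds)
    for seed in seeds:
        for edge_type, dst in adjacency[seed]:
            if edge_type in allowed_types and dst not in expanded:
                expanded.append(dst)
    return expanded


print(one_hop_expand(["db-failure"], EDGES))
# ['db-failure', 'cascading-latency']
print(one_hop_expand(["db-failure"], EDGES, {"supports", "relates", "contradicts"}))
# ['db-failure', 'cascading-latency', 'db-reported-healthy']
```

Note that `alert-thresholds` stays out in both cases: it is two hops from the seed.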
Ranking signals on the index
Retrieval is not just about which memories match a query. It is also about which matching memories deserve priority. The index stores per-document ranking signals alongside each memory:
Importance score is assigned at ingestion and refined by consolidation. Higher-importance memories receive more weight when relevance is otherwise equal. The operational experiments showed this propagates faithfully: outage-window memories (importance near 0.92) consistently outranked normal-window memories (importance near 0.26) for incident-specific queries.
Decay factor is a multiplier that reduces over time without retrieval. Memories that are never accessed drift toward lower effective priority, regardless of their original importance. This prevents a once-important but now-stale memory from permanently dominating retrieval.
Retrieval count tracks how many times a memory has been accessed. Systems can use this to implement usage-based boosting (frequently-accessed memories are likely relevant) or usage-based dampening (frequently-accessed memories are already well-known; shift attention to neglected ones). The session-relative novelty formula uses the latter approach.
These signals are stored at index time and updated by consolidation and reinforcement. Each retrieval implicitly shapes future retrievals by updating retrieval count, which in turn affects novelty calculations in the next session.
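The three signals can be combined into an effective priority in many ways. This Python sketch assumes one simple choice: priority is importance times the decay factor, optionally divided by `1 + dampening * retrieval_count` to implement usage-based dampening. The per-period decay rate, the `dampening` knob, and the rule that a retrieval resets the decay factor to 1.0 are all assumptions made for illustration.

```python
from dataclasses import dataclass

DECAY_PER_IDLE_PERIOD = 0.8  # hypothetical multiplier applied per idle period


@dataclass
class RankingSignals:
    importance: float
    decay_factor: float = 1.0
    retrieval_count: int = 0


def idle(sig, periods=1):
    """Shrink the decay factor for each period that passes without retrieval."""
    sig.decay_factor *= DECAY_PER_IDLE_PERIOD ** periods


def record_retrieval(sig):
    """Count the access; assumed here to also refresh the decay factor."""
    sig.retrieval_count += 1
    sig.decay_factor = 1.0


def effective_priority(sig, dampening=0.0):
    """Importance times decay, optionally dampened by prior retrieval count."""
    return sig.importance * sig.decay_factor / (1.0 + dampening * sig.retrieval_count)


stale = RankingSignals(importance=0.92)
fresh = RankingSignals(importance=0.50)
idle(stale, periods=3)
print(effective_priority(stale) < effective_priority(fresh))  # True
record_retrieval(stale)
print(effective_priority(stale) > effective_priority(fresh))  # True
print(effective_priority(stale, dampening=0.5) < effective_priority(stale))  # True
```

Here a once-important memory left idle for three periods falls below a fresher, moderately important one, and a single retrieval restores it.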
Feedback records and the retrieval loop
Retrieval is not a terminal operation; it initiates a feedback cycle. When memories are retrieved and later evaluated (did they help? did the action succeed?), that evaluation can propagate back to the memory index as reinforcement signals that adjust importance scores, retrieval priority, and decay rates.
This feedback path is what distinguishes a cognitive memory system from a static knowledge base. A static base retrieves the same memories for the same queries forever. A cognitive memory system retrieves the memories that have proved most useful for similar past contexts and gradually de-prioritizes the ones that repeatedly failed to contribute.
The retrieval feedback record captures which memories were returned, which were used in reasoning, and what outcome followed. Consolidation can then adjust the underlying memory records. The loop is: retrieve, then reason, then act, then evaluate, then consolidate, then retrieve differently.
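The feedback record and one consolidation pass can be sketched as follows. `FeedbackRecord`, `consolidate`, the fixed `step`, and the update rule (used memories rise on success and fall on failure, returned-but-unused memories drift down by half a step) are hypothetical. The sketch adjusts only importance, although the text notes that retrieval priority and decay rates can be adjusted too.

```python
from dataclasses import dataclass


@dataclass
class FeedbackRecord:
    returned: list[str]  # memory ids the retrieval returned
    used: set[str]       # subset that actually informed reasoning
    outcome_success: bool


def consolidate(importance, records, step=0.05):
    """Return new importance scores after applying a batch of feedback records.

    Hypothetical rule: a memory that was used moves up by `step` on success and
    down by `step` on failure; one returned but unused drifts down by step / 2.
    Scores are clamped to [0, 1]; the input mapping is left unchanged.
    """
    updated = dict(importance)
    for rec in records:
        for mem_id in rec.returned:
            if mem_id in rec.used:
                delta = step if rec.outcome_success else -step
            else:
                delta = -step / 2
            updated[mem_id] = min(1.0, max(0.0, updated[mem_id] + delta))
    return updated


scores = {"m1": 0.5, "m2": 0.5, "m3": 0.5}
feedback = [
    FeedbackRecord(returned=["m1", "m2"], used={"m1"}, outcome_success=True),
    FeedbackRecord(returned=["m3"], used={"m3"}, outcome_success=False),
]
print(consolidate(scores, feedback))
```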
What retrieval cannot do alone
The retrieval layer is where storage becomes cognition, but it is not sufficient by itself. Retrieval returns the most relevant memories given the current context and prior retrieval history; it does not improve memory quality, form abstractions, or integrate conflicting evidence.
Those concerns belong to consolidation, which runs offline to transform the best candidate experiences into refined semantic memories that future retrieval can surface more effectively. Retrieval and consolidation are complementary: retrieval serves active cognition, consolidation improves the material retrieval has to work with.
The next article describes how the consolidation system works and what the experiments revealed about how severity, importance, and memory structure propagate through the pipeline.