Closing the loop
The first four stages build the material conditions for cognition: experience capture, memory retrieval, consolidation, and policy drift tracking. Each component is useful in isolation, but without a runtime that connects them the system as a whole cannot act, evaluate, or adapt. The cognitive agent loop is that connection.
Each cycle of the loop follows a consistent sequence: the agent receives input, hydrates context from memory and policy, reasons over that context, executes an action, captures the outcome, and emits the outcome for policy evaluation. This is the first stage where the architecture behaves as a closed adaptive system rather than a set of storage and scoring components.
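The sequence above can be sketched as a single function that threads one input through each phase. This is a minimal illustration, not the architecture's actual runtime: the names `run_cycle`, `Context`, and `Outcome`, and the in-memory lists standing in for the experience stream and the evaluation queue, are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Context:
    user_input: str
    memories: list[str]
    policy: dict[str, float]


@dataclass
class Outcome:
    action: str
    result: str


def run_cycle(
    user_input: str,
    hydrate: Callable[[str], Context],
    reason: Callable[[Context], str],
    act: Callable[[str], Outcome],
    capture: Callable[[Outcome], None],
    evaluate: Callable[[Outcome], None],
) -> Outcome:
    """Run one cycle: hydrate, reason, act, capture, emit for evaluation."""
    context = hydrate(user_input)  # memory + policy -> working context
    action = reason(context)       # propose an action from that context
    outcome = act(action)          # execute the proposed action
    capture(outcome)               # write the outcome to the experience stream
    evaluate(outcome)              # emit the outcome for policy evaluation
    return outcome


# Toy wiring: every subsystem is a hypothetical in-memory stand-in.
experience_stream: list[Outcome] = []
evaluation_queue: list[Outcome] = []

outcome = run_cycle(
    "latency spike on db-1",
    hydrate=lambda text: Context(text, ["past db failure"], {"caution": 0.5}),
    reason=lambda ctx: f"inspect:{ctx.user_input}",
    act=lambda action: Outcome(action, "ok"),
    capture=experience_stream.append,
    evaluate=evaluation_queue.append,
)
```

Passing each phase in as a callable keeps the loop itself free of storage or scoring logic, which matches the idea that the loop connects components rather than replacing them.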
Why session state matters
The loop is not stateless. Each cycle belongs to a session: a durable binding of the current input, retrieved memories, active goals, policy state, and identity context under a shared identifier.
This binding is important for an often-overlooked reason. The value of reinforcement depends on being able to reconstruct the causal context of an action. If the system observes that an action produced a bad outcome, it needs to know which memories were retrieved at the time (to identify which evidence led to the decision), which policy version was active (to determine whether the weights produced the bad choice), and which goals were prioritized (to assess whether the outcome was a goal failure or an execution failure).
Without session state, the system can record that something went wrong, but cannot locate where in the decision chain it went wrong. The session is the causal thread that makes evaluation meaningful.
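One way to picture the binding is as an immutable record keyed by a session identifier, plus a lookup that recovers the decision context for any outcome. The sketch below is an assumption about shape only: `Session`, `OutcomeRecord`, `causal_context`, and all field names are hypothetical, and a real system would persist sessions durably rather than in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import uuid4


@dataclass(frozen=True)
class Session:
    """Everything a cycle saw when it decided, bound under one identifier."""
    current_input: str
    retrieved_memory_ids: tuple[str, ...]
    active_goals: tuple[str, ...]
    policy_version: str
    identity_context: Optional[str] = None
    session_id: str = field(default_factory=lambda: uuid4().hex)


@dataclass(frozen=True)
class OutcomeRecord:
    session_id: str
    action: str
    reward: float


def causal_context(outcome: OutcomeRecord, sessions: dict[str, Session]) -> dict:
    """Look up what the agent knew and prioritized when it chose the action."""
    session = sessions[outcome.session_id]
    return {
        "memories": session.retrieved_memory_ids,   # which evidence was in play
        "policy_version": session.policy_version,   # which weights were active
        "goals": session.active_goals,              # what was being prioritized
    }


session = Session(
    current_input="latency spike on db-1",
    retrieved_memory_ids=("mem-17", "mem-42"),
    active_goals=("restore service",),
    policy_version="v3",
)
sessions = {session.session_id: session}

# A bad outcome carries only the session id; the rest is recovered on demand.
bad_outcome = OutcomeRecord(session.session_id, "restart db-1", reward=-1.0)
context = causal_context(bad_outcome, sessions)
```

Freezing the record is a deliberate choice in this sketch: if the session could be mutated after the action, the reconstructed context would no longer reflect what the agent actually saw.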
Context hydration: where long-term memory becomes short-term cognition
Before reasoning, the loop assembles the agent's working context. This hydration step retrieves memories from the semantic index using the current input as a query, loads the active policy vector, attaches goal state, and includes identity context where available.
Hydration is the point where stored long-term knowledge becomes active short-term evidence. Retrieved memories are no longer inert records in an index. They become the specific evidence that conditions the next reasoning step. A memory about a database failure pattern retrieved because the current input mentions latency spikes will directly influence what the agent proposes to do.
The quality of hydration determines the quality of reasoning. If retrieval brings in the wrong memories (too stale, too broad, too similar to each other), the reasoning step starts from impoverished context. The session-relative novelty and hybrid retrieval mechanisms described in the memory retrieval article exist precisely to make hydration as relevant as possible.
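A hydration step along these lines might look like the following sketch. The word-overlap score is a toy stand-in for the semantic index, chosen only so the example runs without dependencies; it does not implement session-relative novelty or hybrid retrieval. `WorkingContext`, `overlap`, `hydrate`, and the parameter `k` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkingContext:
    user_input: str
    memories: list[str]
    policy: dict[str, float]
    goals: list[str]
    identity: Optional[str] = None


def overlap(a: str, b: str) -> float:
    """Jaccard word overlap; a toy stand-in for semantic similarity."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def hydrate(
    user_input: str,
    memory_index: list[str],
    policy: dict[str, float],
    goals: list[str],
    identity: Optional[str] = None,
    k: int = 2,
) -> WorkingContext:
    """Assemble working context: top-k memories, policy, goals, identity."""
    ranked = sorted(memory_index, key=lambda m: overlap(user_input, m), reverse=True)
    # Keep at most k memories, and drop any with no overlap at all.
    relevant = [m for m in ranked[:k] if overlap(user_input, m) > 0]
    return WorkingContext(user_input, relevant, dict(policy), list(goals), identity)


memory_index = [
    "database latency spikes preceded connection pool exhaustion",
    "user prefers short summaries",
    "latency spikes on cache nodes resolved by restart",
]
ctx = hydrate(
    "latency spikes on the database",
    memory_index,
    policy={"caution": 0.7},
    goals=["restore service"],
)
```

In this toy run the unrelated preference memory is left out of the working context, so the reasoning step only sees the two latency-related records.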
Reasoning and action
The reasoning call takes the hydrated context and produces a proposed action with a reasoning trace. The trace records why the action was proposed: which evidence supported it, what alternatives were considered, and what the predicted outcome was.
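As a rough illustration, a trace with those three parts could be a small immutable record that serializes cleanly for later review. The class name `ReasoningTrace`, its field names, and the example values are assumptions for this sketch, not a schema taken from the architecture.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReasoningTrace:
    proposed_action: str
    supporting_evidence: tuple[str, ...]      # which evidence supported the action
    alternatives_considered: tuple[str, ...]  # what else was on the table
    predicted_outcome: str                    # what the agent expected to happen


trace = ReasoningTrace(
    proposed_action="increase connection pool size",
    supporting_evidence=("mem-17: pool exhaustion preceded latency spikes",),
    alternatives_considered=("restart database", "do nothing"),
    predicted_outcome="latency returns to baseline",
)

# Serialize so the trace can be stored alongside the outcome record.
serialized = json.dumps(asdict(trace), sort_keys=True)
restored = json.loads(serialized)
```

Recording the predicted outcome next to the action is what makes later comparison possible: a reviewer can check the prediction against what actually happened.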
The action stub in early stages is deliberately narrow: the system can call only a limited set of tools with constrained effects. This is not a limitation to overcome immediately; it is a design choice that keeps the loop auditable while the architecture matures. A narrow action space means failures are contained and diagnosable. Expansion happens gradually as confidence in the retrieval, policy, and critique subsystems builds.
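A narrow action space can be approximated with an explicit allowlist: anything not registered is refused rather than attempted. The tools below (`echo`, `word_count`) and the names `ALLOWED_TOOLS`, `ToolNotAllowed`, and `execute` are hypothetical placeholders chosen to be side-effect free.

```python
from typing import Callable

# The complete action space: two harmless, side-effect-free tools.
ALLOWED_TOOLS: dict[str, Callable[[str], str]] = {
    "echo": lambda argument: argument,
    "word_count": lambda argument: str(len(argument.split())),
}


class ToolNotAllowed(Exception):
    """Raised when the agent proposes a tool outside the allowlist."""


def execute(tool: str, argument: str) -> str:
    """Run an allowlisted tool, or refuse loudly."""
    if tool not in ALLOWED_TOOLS:
        raise ToolNotAllowed(tool)
    return ALLOWED_TOOLS[tool](argument)


result = execute("word_count", "latency spike on db-1")

# An out-of-scope proposal fails at the boundary instead of running.
blocked = False
try:
    execute("delete_database", "prod")
except ToolNotAllowed:
    blocked = True
```

Expanding the action space in this sketch means adding an entry to the dictionary, which keeps every expansion visible and reviewable.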
The trace is not decorative. Later stages depend on it. Self-reflection reviews traces to identify calibration failures. Meta-cognition uses traces to attribute outcomes to specific reasoning steps. The debate and arbitration system in multi-agent stages uses competing traces to select the best action from multiple candidates. Capturing the trace is an investment in future debuggability.
Outcome capture and the recursive property
Every action produces an outcome record. The outcome is written into the experience stream and evaluated for policy learning. The next cycle can retrieve this experience, and the cycle after that can retrieve the evaluation.
This recursive property is the core of the architecture: cognition produces experience, experience updates memory and policy, and memory and policy condition future cognition. The loop does not just process inputs; it generates the training signal that shapes its own future processing.
The practical consequence is that early behavior strongly influences later behavior through memory. Poor reasoning in early sessions produces poorly evaluated memories. Those memories, if retrieved in later sessions, can propagate the poor reasoning forward. The importance and decay mechanisms in the index provide some protection (poorly evaluated memories tend to have lower reward scores and will eventually decay in priority), but they do not eliminate the effect entirely.
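To make the protection and its limits concrete, here is one possible priority formula. It is an assumption for illustration only: the article does not specify how the index combines similarity, reward, and age, and the function `retrieval_priority`, the multiplicative form, and the `half_life` parameter are all hypothetical.

```python
def retrieval_priority(
    similarity: float,
    reward: float,
    age_in_cycles: float,
    half_life: float = 50.0,
) -> float:
    """Hypothetical priority: similarity scaled by reward and an age decay."""
    decay = 0.5 ** (age_in_cycles / half_life)  # halves every `half_life` cycles
    return similarity * reward * decay


# Two equally similar memories, one well evaluated and one poorly evaluated.
fresh_good = retrieval_priority(similarity=0.8, reward=0.9, age_in_cycles=0)
fresh_poor = retrieval_priority(similarity=0.8, reward=0.2, age_in_cycles=0)

# The poorly evaluated memory fades further with age but never reaches zero.
aged_poor = retrieval_priority(similarity=0.8, reward=0.2, age_in_cycles=200)
```

Under this formula a poorly evaluated memory ranks lower and keeps sinking, yet its priority stays positive, so it can still surface when little else matches the query.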
This is why the quality of the evaluation step matters as much as the quality of the reasoning step. A reasoning step that produced a genuinely good response but received a poor evaluation will cause the memory of that response to be ranked lower in future retrieval, suppressing a good strategy. An evaluator that consistently over-rewards routine success while under-rewarding creative solutions will gradually shift the policy toward conservative, unimaginative behavior.
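The drift described here can be reproduced in a toy simulation. Everything in it is an assumption made for illustration: two strategies with identical true quality, an evaluator with a fixed bias of plus or minus 0.2, and a simple update rule that moves each weight a fraction of the way toward the observed reward. None of these are the architecture's actual evaluator or policy update.

```python
def biased_evaluator(strategy: str, true_quality: float) -> float:
    """Over-reward routine success and under-reward creative solutions."""
    bias = {"routine": 0.2, "creative": -0.2}
    return min(1.0, max(0.0, true_quality + bias[strategy]))


def update_policy(
    weights: dict[str, float],
    strategy: str,
    reward: float,
    learning_rate: float = 0.1,
) -> dict[str, float]:
    """Move the chosen strategy's weight toward the observed reward."""
    updated = dict(weights)
    updated[strategy] += learning_rate * (reward - updated[strategy])
    return updated


# Both strategies start equal and have the same true quality.
weights = {"routine": 0.5, "creative": 0.5}
for _ in range(50):
    for strategy in ("routine", "creative"):
        reward = biased_evaluator(strategy, true_quality=0.6)
        weights = update_policy(weights, strategy, reward)
```

After fifty rounds the routine weight has climbed toward 0.8 and the creative weight has fallen toward 0.4, even though the two strategies performed identically. The policy has learned the evaluator's bias, not the strategies' quality.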
The loop as a developmental scaffold
The cognitive loop is described before multi-agent decomposition, identity formation, and the full self-regulation stack, but it is not a simple stage that the architecture graduates beyond. The loop persists as the central runtime throughout the entire architecture. Every later stage adds capabilities that plug into the loop: agents contribute specialized reasoning, attention shapes hydration, goals condition action selection, and constitutional checks gate evaluation.
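One way to read "plug into the loop" is as hooks registered against fixed phases, so later capabilities extend the runtime without changing its order of operations. This is a speculative sketch: the `Loop` class, the phase names in `PHASES`, and the dictionary-based state are hypothetical, and the three hooks are toy stand-ins for attention, an agent's reasoning, and a constitutional check.

```python
from collections import defaultdict
from typing import Callable

PHASES = ("hydrate", "reason", "act", "evaluate")
Hook = Callable[[dict], dict]


class Loop:
    """A fixed phase order with pluggable hooks per phase."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[Hook]] = defaultdict(list)

    def register(self, phase: str, hook: Hook) -> None:
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self._hooks[phase].append(hook)

    def run(self, state: dict) -> dict:
        # Phases always run in the same order; hooks run in registration order.
        for phase in PHASES:
            for hook in self._hooks[phase]:
                state = hook(state)
        return state


loop = Loop()
# Attention shapes hydration: keep only the top memory.
loop.register("hydrate", lambda s: {**s, "memories": s["memories"][:1]})
# An agent contributes reasoning: propose an action from what was kept.
loop.register("reason", lambda s: {**s, "action": f"review {s['memories'][0]}"})
# A constitutional check gates evaluation: refuse destructive actions.
loop.register("evaluate", lambda s: {**s, "approved": "delete" not in s["action"]})

final_state = loop.run({"memories": ["incident report", "lunch menu"]})
```

The phase order never changes in this sketch; only the set of registered hooks grows, which mirrors the claim that the loop persists while later stages add to it.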
Understanding the loop structure is therefore essential for understanding every later stage. When an article says "the system retrieved relevant memories" or "the agent evaluated its action," those are descriptions of specific phases in the loop. The loop is the context in which everything else happens.
The next article introduces identity formation: how the system builds a stable self-model from its behavioral history, and why a slowly updating identity vector, kept separate from the fast-moving policy weights, is essential for coherent long-horizon behavior.