Steve Hutchinson · Big Pines
·5 min read·Stage 28·Cognitive Substrate

Recursive Abstraction

This article describes the abstraction engine that forms hierarchical concepts from experiences, patterns, principles, and world models.

From memory to concepts

Abstraction ladder: raw events compress upward through patterns, concepts, principles, and worldview; symbolic handles allow concepts to transfer to new contexts and refine back down.

A memory system stores specific events. A cognitive system must compress events into concepts that transfer across contexts. An agent that can remember every individual incident but cannot form the concept "database connection pool exhaustion" cannot recognize the same pattern when it recurs in a different service with different metric names.

Recursive abstraction is the process by which specific experiences become general patterns, patterns become named concepts, and concepts combine into higher-level principles. Each level of abstraction reduces detail while increasing generality and transfer.

The compression ladder

The abstraction engine organizes representations into five levels:

  1. Experience: the raw events and memories, specific and detailed.
  2. Pattern: recurring features extracted from multiple experiences. A pattern represents what a set of experiences have in common without preserving their individual details.
  3. Concept: a stable, named abstraction from multiple patterns. "Database connection pool exhaustion" is a concept; specific latency spikes are experiences.
  4. Principle: a rule or relationship that holds across multiple concepts. "Resource exhaustion under load is a common cause of cascading failures" is a principle.
  5. Worldview: the highest-level abstractions that organize how the system understands its domain. Worldviews are the most compressed and the most transferable.

Experiment 22 validated the ladder structure with compression ratios exactly as designed: 0.2 / 0.4 / 0.6 / 0.8 / 1.0 at the five levels. The label at each level was the most frequent token in the source corpus for that level.

The symbolic label ceiling

Experiment 22 also documented a structural limitation. When all five levels draw on the same full source corpus for their labels, all five produce the same dominant token. Configuration A (200 events all saying "service = ...") produced "service" as the label for all five levels. Configuration B (4 memories whose generalization mentions "latency") produced "latency" for all five levels.

This is not a bug; it is the natural consequence of using a fixed source corpus at every level without any per-level clustering. The engine acknowledges this explicitly: "Future revisions are expected to build the ladder incrementally by clustering at each level."
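The ceiling is easy to reproduce in miniature. The sketch below is illustrative only and is not the engine's code: the function names, the whitespace tokenizer, and the toy corpus are assumptions chosen to mirror Configuration A. It labels every level with the most frequent token of the same full corpus, so all five labels are necessarily identical.

```python
from collections import Counter

LEVELS = ["experience", "pattern", "concept", "principle", "worldview"]

# Hypothetical toy corpus: 200 events that all mention "service".
NAMES = ["checkout", "search", "billing", "auth"]
STATES = ["ok", "slow", "degraded"]
EVENTS = [f"service {NAMES[i % 4]} {STATES[i % 3]}" for i in range(200)]


def dominant_token(texts):
    """Return the most frequent whitespace-delimited token across all texts."""
    counts = Counter(token for text in texts for token in text.lower().split())
    return counts.most_common(1)[0][0]


def label_ladder_fixed_corpus(texts):
    """Label every level from the same full corpus (no per-level clustering)."""
    return {level: dominant_token(texts) for level in LEVELS}


labels = label_ladder_fixed_corpus(EVENTS)
# Every level receives the label "service", because the input never changes.
```

Nothing in the labeling step depends on the level, which is why differentiation has to come from narrowing the source set per level rather than from the labeling function itself.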

This acknowledgment is important for understanding what the symbolic abstraction engine actually provides in its current form. The ladder structure (five levels, correct compression ratios, confidence scaling with source count) is correct and validated. The per-level differentiation of labels requires embeddings, which are addressed in Experiment 23.

Embeddings change everything

Experiment 23 introduced real embeddings (768-dimensional vectors from nomic-embed-text via ollama) and used cosine-centroid clustering to build the ladder incrementally: at each level, the source set is the most representative subset of the level below, determined by proximity to the centroid.
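A minimal sketch of centroid-based narrowing, under stated assumptions: the function names, the 2-dimensional toy vectors, and the per-level sizes are hypothetical, and the real experiment used 768-dimensional embeddings. Each level keeps the items of the level below that lie closest, by cosine similarity, to that level's centroid.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def most_central(items, keep):
    """Keep the `keep` (text, vector) items closest to the centroid of `items`."""
    center = centroid([vector for _, vector in items])
    ranked = sorted(items, key=lambda item: cosine(item[1], center), reverse=True)
    return ranked[:keep]


def build_ladder(items, sizes):
    """Build levels bottom-up; `sizes` maps level name to source count."""
    ladder, current = {}, items
    for level, keep in sizes.items():
        current = most_central(current, keep)  # subset of the level below
        ladder[level] = current
    return ladder


# Toy data: six tightly grouped "degraded" events and two outlying "normal" ones.
ITEMS = [(f"degraded-{k}", [1.0, 0.05 * k]) for k in range(6)]
ITEMS += [("normal-0", [0.0, 1.0]), ("normal-1", [0.2, 1.0])]
SIZES = {"experience": 8, "pattern": 6, "concept": 4, "principle": 2, "worldview": 1}

ladder = build_ladder(ITEMS, SIZES)
```

In this toy run the outlying "normal" events drop out at the pattern level, and every higher level contains only "degraded" events, which is the narrowing behavior the incremental ladder relies on.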

The results were qualitatively different from the symbolic-only approach:

Level       Sources  Label
experience  200      experience:service
pattern     100      pattern:service
concept     25       concept:degraded
principle   4        principle:degraded
worldview   2        worldview:degraded

Source counts shrink at every level: 200, 100, 25, 4, 2. The sequence is strictly decreasing, confirming that per-level clustering is active. At the concept level, the label shifted from "service" (dominant across all 200 events) to "degraded", because the concept's sources are drawn from the 100 most central events, which cluster around the degraded-window events.

This label shift is the key property that the symbolic approach could not produce. The worldview node represents the 2 most semantically central events in the entire corpus, which happen to be from the degraded window. The compression ladder has converged from a broad vocabulary of events to a focused representation of the domain's most representative knowledge.

kNN retrieval with real embeddings

Experiment 23 also validated end-to-end kNN retrieval with the nomic embeddings. Two queries were tested against 200 embedded operational signals:

  • "critical infrastructure failure high latency severe incident outage": top-5 results were 5/5 outage-window events.
  • "steady state background metrics no anomalies quiet": top-5 results were 5/5 normal-window events.

The semantic separation between incident and non-incident windows was strong enough that all top-5 results came from the expected window for both queries. This indicates that the embedding model preserves the distinction between incident and normal operation in vector space, making it recoverable by similarity search.
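The "5/5" figures are a purity measure over the top-k neighbors. As an illustrative sketch, a brute-force version of that check looks like the following; the document shape, the `window` field, and the 2-dimensional toy vectors are assumptions, and the experiment itself ran kNN inside OpenSearch rather than in application code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query_vector, docs, k):
    """Return the k documents most similar to the query, best first."""
    ranked = sorted(docs, key=lambda d: cosine(query_vector, d["vector"]), reverse=True)
    return ranked[:k]


def window_purity(results, window):
    """Fraction of results that come from the expected window."""
    return sum(1 for d in results if d["window"] == window) / len(results)


# Hypothetical embedded signals: three outage-window and three normal-window events.
DOCS = [
    {"id": "outage-1", "window": "outage", "vector": [0.90, 0.10]},
    {"id": "outage-2", "window": "outage", "vector": [0.80, 0.20]},
    {"id": "outage-3", "window": "outage", "vector": [0.95, 0.05]},
    {"id": "normal-1", "window": "normal", "vector": [0.10, 0.90]},
    {"id": "normal-2", "window": "normal", "vector": [0.20, 0.80]},
    {"id": "normal-3", "window": "normal", "vector": [0.05, 0.95]},
]

incident_hits = knn([1.0, 0.0], DOCS, k=3)  # stands in for the incident query
quiet_hits = knn([0.0, 1.0], DOCS, k=3)     # stands in for the steady-state query
```

A purity of 1.0 for both result sets corresponds to the 5/5 outcome reported above, here with k=3 on toy data.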

An important infrastructure finding: OpenSearch 3.0 requires that all documents in a kNN shard have the knn_vector field. A mixed shard containing documents both with and without the field produces ConjunctionDISI errors at query time. The solution is to use a dedicated index for embedded documents, ensuring all documents in the index have the field.
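One way to apply that fix is to create a separate kNN-enabled index and route only embedded documents to it. The sketch below builds the index body as a plain Python dict and splits a batch before indexing. The field name `embedding` and the helper names are hypothetical; the 768 dimension comes from the nomic-embed-text vectors mentioned earlier, and the `index.knn` setting and `knn_vector` field type are standard OpenSearch k-NN mapping options.

```python
EMBEDDING_DIM = 768  # dimensionality of the nomic-embed-text vectors


def knn_index_body(field="embedding", dim=EMBEDDING_DIM):
    """Index settings and mapping for a dedicated kNN index (field name assumed)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {field: {"type": "knn_vector", "dimension": dim}}
        },
    }


def has_vector(doc, field="embedding"):
    """True if the document carries a non-empty list under the vector field."""
    value = doc.get(field)
    return isinstance(value, list) and len(value) > 0


def split_by_embedding(docs, field="embedding"):
    """Separate documents with a vector from those without one."""
    embedded = [d for d in docs if has_vector(d, field)]
    plain = [d for d in docs if not has_vector(d, field)]
    return embedded, plain


batch = [
    {"id": "a", "embedding": [0.1] * EMBEDDING_DIM},
    {"id": "b"},  # no vector: must not land in the kNN index
]
embedded, plain = split_by_embedding(batch)
```

Only `embedded` would be written to the index created from `knn_index_body()`; `plain` goes to an ordinary index, so no shard mixes the two kinds of document.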

Transfer through invariance

The value of abstraction is transfer: a concept formed from one set of experiences should apply to new experiences that share the underlying structure but differ in surface details.

The operational primitive system (described in a later article in this series) demonstrates transfer through abstraction. Vendor-specific telemetry metrics (Kafka consumer lag, OpenSearch queue rejection rates, database wait times) are mapped to invariant behavioral categories (BACKPRESSURE_ACCUMULATION, QUEUE_GROWTH, RETRY_AMPLIFICATION). The abstraction loses the vendor-specific vocabulary and retains the behavioral structure. The same pattern detector can then match the same behavioral signature across different technology stacks, because it operates on abstractions rather than vendor names.
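As a rough illustration of the mapping idea, the sketch below reduces vendor-specific signals to a set of primitives and compares stacks by that set. The three category names come from the text; the metric keys and, in particular, which metric maps to which category are assumptions made for the example, not the series' actual mapping.

```python
from enum import Enum


class Primitive(Enum):
    BACKPRESSURE_ACCUMULATION = "BACKPRESSURE_ACCUMULATION"
    QUEUE_GROWTH = "QUEUE_GROWTH"
    RETRY_AMPLIFICATION = "RETRY_AMPLIFICATION"


# Hypothetical pairing of (vendor, metric) keys to behavioral categories.
VENDOR_TO_PRIMITIVE = {
    ("kafka", "consumer_lag"): Primitive.QUEUE_GROWTH,
    ("opensearch", "queue_rejection_rate"): Primitive.BACKPRESSURE_ACCUMULATION,
    ("database", "wait_time"): Primitive.BACKPRESSURE_ACCUMULATION,
}


def to_primitive(vendor, metric):
    """Map a vendor-specific metric to its primitive, or None if unmapped."""
    return VENDOR_TO_PRIMITIVE.get((vendor, metric))


def behavioral_signature(signals):
    """Reduce (vendor, metric) pairs to the set of primitives they express."""
    mapped = (to_primitive(vendor, metric) for vendor, metric in signals)
    return frozenset(p for p in mapped if p is not None)


search_stack = behavioral_signature([("opensearch", "queue_rejection_rate")])
database_stack = behavioral_signature([("database", "wait_time")])
same_behavior = search_stack == database_stack  # True under the assumed mapping
```

Because the comparison happens on primitives, a detector written against `behavioral_signature` never sees vendor names, which is what lets it match across stacks.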

This is the mature form of the transfer property that the abstraction engine establishes. The embedding-based clustering in Experiment 23 is the technical mechanism; the operational primitive vocabulary is the domain application. Together they show that abstraction is not just a theoretical capability but one with concrete, measurable benefits for cross-system pattern recognition.
