Steve Hutchinson · Big Pines
5 min read · Stage 25 · Cognitive Substrate

Causal Intelligence

This article describes the causal engine that builds structural causal models, evaluates interventions, and simulates counterfactuals from experience.

Beyond correlation

Causal intelligence: experience history and grounded observations build a structural causal model; proposed interventions are simulated to produce counterfactual outcomes and causal abstractions.

Memory retrieval finds similarity: this situation resembles past situations where X worked. Pattern detection finds recurrence: this combination of signals has appeared before. Neither capability answers the question that matters most for action selection: if I intervene here, what will change?

Causal intelligence introduces explicit representation of dependencies, interventions, and counterfactual outcomes. It is the difference between observing that outages are associated with high latency, and being able to say that reducing latency would prevent outages (or not, if the latency is a symptom rather than a cause).

Structural causal models from experience

The causal engine constructs structural models from experience history, world-model predictions, and grounded observations. The model represents variables as nodes and hypothesized causal influence as directed edges with strength values.
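As a rough illustration of that representation, the sketch below stores variables as a set of names and edges as directed, weighted records. The class and method names (`Edge`, `CausalModel`, `add_edge`, `effects_of`) are hypothetical and are not taken from the engine itself. Storing a bidirectional "<->" relationship as two directed edges is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str      # hypothesized cause
    target: str      # hypothesized effect
    strength: float  # influence weight, assumed to lie in [0, 1]


@dataclass
class CausalModel:
    variables: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, source: str, target: str, strength: float) -> None:
        """Register a directed edge and both of its endpoint variables."""
        if not 0.0 <= strength <= 1.0:
            raise ValueError("strength must be in [0, 1]")
        self.variables.update((source, target))
        self.edges.append(Edge(source, target, strength))

    def effects_of(self, variable: str) -> dict:
        """Return {target: strength} for edges leaving `variable`."""
        return {e.target: e.strength for e in self.edges if e.source == variable}


# Assumption: a "<->" relationship is kept as one edge per direction.
model = CausalModel()
model.add_edge("outage", "latency", 0.455)
model.add_edge("latency", "outage", 0.455)
```

Keeping the two directions as separate records leaves room for later evidence to weaken or remove one of them without touching the other.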

Experiment 20 demonstrated model inference from operational signals. The engine received 200 experience events from four incident windows (normal, degraded, outage, recovery) and inferred a model by co-occurrence: how often do pairs of terms appear together in event text?

The resulting model had six variables and sixteen edges. The strongest edges were:

  • latency <-> degraded, strength 1.000 (they co-occur in 100% of degraded events)
  • normal <-> metrics, strength 0.600
  • recovery <-> normal, strength 0.556 (recovery events share vocabulary with normal)
  • outage <-> latency, strength 0.455 (50 outage events co-occur with latency across 110 total latency-mentioning events)
  • outage <-> degraded, strength 0.455

The model correctly found normal -> outage strength = 0 because normal-window text contains neither "outage" nor incident vocabulary. Zero joint mentions produce zero edge weight.
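The co-occurrence inference described above can be sketched as follows. The exact normalization the engine uses is not stated, so this version assumes the strength of an ordered pair is the joint count divided by the number of events mentioning the first term. The corpus is a toy one built only to reproduce the 50-of-110 ratio and the zero weight for terms that never share an event; it is not the experiment's data, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenization; an assumption, not the engine's tokenizer."""
    return set(text.lower().split())


def cooccurrence_strengths(texts, vocabulary):
    """Map (given, other) -> share of events mentioning `given` that also mention `other`."""
    term_counts = Counter()
    joint_counts = Counter()
    for text in texts:
        present = sorted(tokenize(text) & vocabulary)
        term_counts.update(present)
        joint_counts.update(combinations(present, 2))
    strengths = {}
    for (a, b), joint in joint_counts.items():
        strengths[(a, b)] = joint / term_counts[a]
        strengths[(b, a)] = joint / term_counts[b]
    return strengths


# Toy corpus (illustrative only): 110 events mention "latency", 50 of them also "outage".
texts = (
    ["outage latency critical"] * 50
    + ["degraded latency"] * 60
    + ["normal metrics"] * 50
)
vocabulary = {"outage", "latency", "degraded", "normal", "metrics"}
strengths = cooccurrence_strengths(texts, vocabulary)

latency_outage = strengths[("latency", "outage")]         # 50 / 110
normal_outage = strengths.get(("normal", "outage"), 0.0)  # never co-occur
```

Pairs that never appear together are simply absent from the result, which is why the lookup for normal and outage falls back to zero.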

After abstracting at a minimum strength of 0.35, the model shrank from 16 to 14 edges: recovery <-> metrics (0.333) fell below the threshold, and removing it in both directions accounts for the two dropped edges.

The key distinction: observation versus intervention

The causal engine distinguishes two different questions that look similar but are not.

Observation: "When I see latency above threshold, how often does an outage follow?" This is a conditional probability: P(outage | high latency). It says what tends to co-occur with what.

Intervention: "If I reduce latency, how likely is an outage?" This is a do-probability: P(outage | do(latency = low)). It asks what happens when you actively change a variable, breaking whatever caused the latency and potentially changing all downstream effects.

These two quantities differ whenever the observed variable is itself caused by something else. If high latency is caused by database connection pool exhaustion, suppressing the latency signal (say, by raising the metric threshold) does nothing about the underlying pool exhaustion, so the outage risk remains unchanged. Knowing that latency correlates with outages (observation) without knowing that the database pool is the actual cause leads to ineffective interventions.
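The gap between the two quantities can be made concrete with a small enumerated model. The sketch below assumes a three-variable structure in which pool exhaustion causes both high latency and outages, and latency has no direct effect on outages. All probabilities are invented for illustration and the names are hypothetical.

```python
from itertools import product

# Illustrative parameters only; none of these come from the experiments.
P_POOL = 0.3                                  # P(pool exhausted)
P_LAT_GIVEN_POOL = {True: 0.9, False: 0.1}    # P(high latency | pool state)
P_OUT_GIVEN_POOL = {True: 0.8, False: 0.05}   # P(outage | pool state)


def worlds(do_latency=None):
    """Yield (pool, latency, outage, probability) for every possible world."""
    for pool, latency, outage in product([True, False], repeat=3):
        p = P_POOL if pool else 1 - P_POOL
        if do_latency is None:
            p_lat = P_LAT_GIVEN_POOL[pool]
            p *= p_lat if latency else 1 - p_lat
        else:
            # Intervention: latency is set by fiat, so its link to the pool is cut.
            p *= 1.0 if latency == do_latency else 0.0
        p_out = P_OUT_GIVEN_POOL[pool]
        p *= p_out if outage else 1 - p_out
        yield pool, latency, outage, p


def prob_outage(latency_value, do=False):
    """P(outage | latency) when do=False, P(outage | do(latency)) when do=True."""
    rows = list(worlds(do_latency=latency_value if do else None))
    num = sum(p for _, lat, out, p in rows if lat == latency_value and out)
    den = sum(p for _, lat, _, p in rows if lat == latency_value)
    return num / den


observed_high = prob_outage(True)            # conditioning on high latency
observed_low = prob_outage(False)            # conditioning on low latency
forced_low = prob_outage(False, do=True)     # forcing latency low
```

Under these assumptions, observing low latency makes an outage look unlikely because it is evidence that the pool is healthy, while forcing latency low leaves the outage probability at its marginal value because the pool is untouched.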

The structural causal model makes this distinction explicit. An edge from outage -> latency (outage causes latency) is different from an edge latency -> outage (latency causes outages). The first implies that fixing latency will not fix the outage; the second implies it will.

Counterfactual simulation

Experiment 20 demonstrated counterfactual reasoning with a specific query: do(outage = 1.0). The engine calculated:

  • Baseline latency (without intervention): 0.5
  • Counterfactual latency (given outage = 1.0): 0.955
  • Causal effect: 0.455

The causal effect (0.955 - 0.5 = 0.455) matches the outage <-> latency edge strength in the structural model. The counterfactual simulation is essentially asking: what would latency be if we forced outage to be maximally true? The answer is computed by propagating the forced value along the model's edges, not by looking up how often the two were observed together.
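The engine's actual propagation rule is not given here. One simple rule that reproduces the reported numbers is additive: the counterfactual value of a direct effect is its baseline plus edge strength times the forced value, capped at 1.0. The sketch below implements only that assumed rule, for one hop, with a hypothetical function name.

```python
def simulate_do(edges, baselines, variable, value):
    """Force `variable` to `value` and update its direct effects.

    Assumed rule: counterfactual = min(1.0, baseline + strength * value).
    Only one hop is propagated; longer chains are out of scope for this sketch.
    """
    outcome = dict(baselines)
    outcome[variable] = value
    for (source, target), strength in edges.items():
        if source == variable:
            outcome[target] = min(1.0, baselines.get(target, 0.0) + strength * value)
    return outcome


edges = {("outage", "latency"): 0.455}   # edge strength reported for Experiment 20
baselines = {"latency": 0.5}             # baseline latency without intervention

result = simulate_do(edges, baselines, "outage", 1.0)
causal_effect = result["latency"] - baselines["latency"]
```

With the values above, the counterfactual latency comes out at 0.955 and the effect at 0.455, up to floating-point rounding.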

Counterfactuals support failure analysis ("what would have happened if the incident had been detected thirty minutes earlier?"), policy refinement ("if we add this circuit breaker, what effect does the model predict?"), and better world-model training ("the world model predicted X; the actual outcome was Y; what causal structure would produce this divergence?").

Implementation note: the vocabulary dependency

The causal model depends on the vocabulary in the experience events' text fields. The operational event generator used in the experiments populated input.text with a generic message ("Operational signal from ${service}") that had no window-specific vocabulary. This was inadequate for causal inference.

The experimental solution was to enrich each signal's text with window-specific narrative before passing it to the engine: outage events got descriptions mentioning "outage," "latency," and "critical"; normal events got descriptions mentioning "normal," "steady state," and "metrics."
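A minimal sketch of that enrichment step is shown below. The two vocabularies are the ones named above. The function name, the output format, and the example service name are hypothetical, and windows whose vocabulary is not listed here fall back to the generic message rather than guessing terms.

```python
# Window vocabularies taken from the description above; other windows are not specified.
WINDOW_VOCABULARY = {
    "outage": ["outage", "latency", "critical"],
    "normal": ["normal", "steady state", "metrics"],
}


def enrich_text(service, window):
    """Append window-specific terms to the generic signal text (hypothetical format)."""
    base = f"Operational signal from {service}"
    terms = WINDOW_VOCABULARY.get(window)
    if not terms:
        return base  # unknown window: leave the generic message unchanged
    return f"{base}. Observed: {', '.join(terms)}."


outage_text = enrich_text("checkout", "outage")
# "Operational signal from checkout. Observed: outage, latency, critical."
fallback_text = enrich_text("checkout", "degraded")
# "Operational signal from checkout"
```

Keeping the vocabulary in one table makes it easy to audit which terms a given window can contribute to the inferred model.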

This dependency is not a limitation specific to this implementation; it is a general requirement for any text-based causal inference system. The quality of the causal model is bounded by the quality of the vocabulary in the source events. Generators must embed semantically meaningful, window-specific text to enable causal discovery. A system that logs "signal from service-x" for all events will produce a causal model with no structure.

Causal abstraction

Causal relationships can exist at multiple levels. Individual events co-occur at the lowest level. Patterns of events co-occur at an intermediate level. Principles about categories of events apply at the highest level.

The causal engine supports abstraction of the model: the operation abstract(minStrength = 0.35) removes edges below the strength threshold, producing a sparser model that captures only the strongest dependencies. At the principle level, the model might express "incident conditions cause latency degradation" without naming specific services or metrics.
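The thresholding step can be sketched in a few lines. The edge strengths below are the ones reported for Experiment 20, with each pair written once for brevity. Treating the threshold as inclusive and representing edges as a dictionary keyed by variable pairs are both assumptions.

```python
def abstract(edges, min_strength=0.35):
    """Keep only edges whose strength is at least `min_strength` (inclusive by assumption)."""
    return {pair: s for pair, s in edges.items() if s >= min_strength}


edges = {
    ("latency", "degraded"): 1.000,
    ("normal", "metrics"): 0.600,
    ("recovery", "normal"): 0.556,
    ("outage", "latency"): 0.455,
    ("outage", "degraded"): 0.455,
    ("recovery", "metrics"): 0.333,  # below the 0.35 threshold
}

sparse = abstract(edges, min_strength=0.35)
```

Only the recovery and metrics pair is dropped at this threshold; raising `min_strength` further would prune progressively more of the model.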

This multilevel causal representation connects to the abstraction engine. A causal model at the principle level can transfer across service boundaries (the principle applies to all services) in ways that a model at the event level cannot (the specific co-occurrences are service-specific). Transfer through causal abstraction is one of the mechanisms through which the system generalizes from experience to policy.
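One way to picture the move from event level to principle level is to relabel variables by category and merge the edges that collapse together. The sketch below is an assumption about how such lifting could work, not a description of the abstraction engine. The category map, the edge directions, and the choice to keep the maximum strength are all hypothetical.

```python
def lift(edges, category_of):
    """Relabel variables by category and merge parallel edges, keeping the strongest."""
    lifted = {}
    for (source, target), strength in edges.items():
        pair = (category_of.get(source, source), category_of.get(target, target))
        if pair[0] == pair[1]:
            continue  # skip edges that collapse into a self-loop
        lifted[pair] = max(strength, lifted.get(pair, 0.0))
    return lifted


# Event-level edges (directions chosen for illustration).
event_edges = {
    ("outage", "latency"): 0.455,
    ("degraded", "latency"): 1.000,
}

# Hypothetical mapping from event-level terms to principle-level categories.
category_of = {
    "outage": "incident conditions",
    "degraded": "incident conditions",
    "latency": "latency degradation",
}

principle_edges = lift(event_edges, category_of)
# {("incident conditions", "latency degradation"): 1.0}
```

Because the lifted edge names no service or metric, the same structure could be applied to a service that contributed none of the original events.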
