When building the memory layer for the Cognitive Substrate, I needed something that could reliably handle a constant stream of events from many different sources - telemetry, logs, tickets, Slack messages, and more - without any of those sources needing to coordinate directly with each other.
I chose Kafka for three main reasons.
Why Kafka
High-throughput event streaming - Kafka is built for this. It handles thousands of events per second comfortably without special tuning, and scales well beyond that when needed.
Producer-consumer decoupling - the telemetry system publishes events without needing to know which workers will process them or when. The ingestion worker processes events without needing to know about the telemetry system. They communicate only through the Kafka topic. This decoupling makes the architecture easy to extend: adding a new event source means writing a producer, not changing any existing consumers.
Durability and replay - once data is written to Kafka, it is retained for a configurable period and can be replayed from any offset. This is valuable for the Cognitive Substrate specifically: when I add a new processing worker, I can replay historical events to populate its state rather than starting from scratch.
Kafka as the Central Nervous System
Kafka acts as the central event bus for the entire Cognitive Substrate. Every experience - whether it originated as a telemetry spike, a support ticket, or a Slack thread - first lands in a Kafka topic. Workers consume from those topics at their own pace, process the events, and emit outputs to other topics.
This design gives resilience (workers can crash and restart without losing events), scalability (add workers to increase throughput), and auditability (the full event history is available for replay and debugging).
Here is the CognitiveProducer.publish method. Every message is automatically mirrored to the immutable audit topic - a pattern that would be impossible to enforce consistently with ad-hoc producers:
// packages/kafka-bus/src/producer.ts
export class CognitiveProducer {
async publish<T>(topic: TopicName, payload: T, options: PublishOptions = {}): Promise<void> {
const headers = options.traceContext ? injectTraceContext(options.traceContext) : {}
const messages: Array<{ topic: string; messages: Message[] }> = [
{
topic,
messages: [
{
key: options.key ? Buffer.from(options.key) : null,
value: Buffer.from(JSON.stringify(payload)),
headers,
},
],
},
]
// Every message is automatically mirrored to the immutable audit topic,
// except for audit messages themselves (which would cause infinite recursion).
if (this.enableAuditMirror && topic !== Topics.AUDIT_EVENTS) {
const auditPayload = { originalTopic: topic, payload, timestamp: new Date().toISOString() }
messages.push({
topic: Topics.AUDIT_EVENTS,
messages: [
{
key: options.key ? Buffer.from(options.key) : null,
value: Buffer.from(JSON.stringify(auditPayload)),
headers,
},
],
})
}
await this.producer.sendBatch({ topicMessages: messages, compression: CompressionTypes.GZIP })
}
}The audit mirror is opt-in per producer instance but enabled by default. Any message published to any Cognitive Substrate topic creates a corresponding audit record with the original topic name, payload, and timestamp - providing an immutable event history without any per-call boilerplate.
The Problem I Did Not See Coming
Using Kafka this way works well until you have more than one producer writing events. Then you discover the problem: without a formal contract about what the events look like, one innocent change to a message structure can silently break every downstream consumer.
That is the problem the Schema Registry solves. The next article covers how I hit it and what it cost.