Building a Cognitive Telemetry Pipeline

A blog is not just a document. It is a surface on which attention plays out in time.

When a reader arrives, scrolls, pauses, copies a snippet, or follows a link, each act carries information. That information is not just about what was read but about how it was read: how long the eye rested on a diagram, which code block was copied and from which article, whether the article was abandoned at the first section or read to completion.

This post describes how to capture that signal as a typed, schema-validated event stream and route it through a pipeline that preserves it for later analysis.

The Event Schema

The foundation is a discriminated union. Every event shares a common base, and each variant adds a literal type field that determines the rest of its shape:

interface TelemetryEventBase {
  timestamp: string // ISO 8601
  sessionId: string // uuid, persisted in sessionStorage
  articleSlug?: string
  referrer?: string
  semanticTopicTags?: string[]
}
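The sessionId comment implies a small helper that mints the id once per tab and reuses it. A minimal sketch, assuming a hypothetical storage key (telemetry.sessionId) and injecting both the storage and the id generator so it runs outside a browser; in the browser you would pass window.sessionStorage and () => crypto.randomUUID(). The names getSessionId and makeBase are illustrative, not the post's code.

```typescript
// The two Web Storage methods the helper uses; window.sessionStorage provides them.
interface StorageLike {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

// Hypothetical storage key; the post does not say which key it uses.
const SESSION_KEY = "telemetry.sessionId"

// Returns this tab's session id, creating and persisting one on first use.
// sessionStorage is scoped to a single tab and survives reloads, so every
// event from one tab shares an id until the tab is closed.
function getSessionId(storage: StorageLike, newId: () => string): string {
  const existing = storage.getItem(SESSION_KEY)
  if (existing !== null) return existing
  const id = newId()
  storage.setItem(SESSION_KEY, id)
  return id
}

interface BaseFields {
  timestamp: string
  sessionId: string
  articleSlug?: string
}

// Stamps the shared base fields onto an event at creation time.
function makeBase(storage: StorageLike, newId: () => string, articleSlug?: string): BaseFields {
  return {
    timestamp: new Date().toISOString(), // ISO 8601 in UTC, ending in "Z"
    sessionId: getSessionId(storage, newId),
    articleSlug,
  }
}
```

Injecting the generator keeps the helper testable and lets a server-side renderer supply a no-op storage without touching browser globals.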

type TelemetryEvent =
  | PageViewEvent
  | ArticleCompleteEvent
  | ScrollDepthEvent
  | SnippetCopyEvent
  | SearchQueryEvent
// ...
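The variant interfaces are not shown above. The sketch below invents plausible fields for three of them to show how the type discriminant lets TypeScript narrow each branch; the discriminant strings (page_view, scroll_depth, snippet_copy), the payload fields, and the summarize function are all assumptions.

```typescript
interface TelemetryEventBase {
  timestamp: string // ISO 8601
  sessionId: string
  articleSlug?: string
  referrer?: string
  semanticTopicTags?: string[]
}

// Hypothetical variants: discriminant values and payload fields are illustrative.
interface PageViewEvent extends TelemetryEventBase {
  type: "page_view"
  path: string
}
interface ScrollDepthEvent extends TelemetryEventBase {
  type: "scroll_depth"
  depthPercent: number // 0 to 100
}
interface SnippetCopyEvent extends TelemetryEventBase {
  type: "snippet_copy"
  snippetIndex: number // which code block on the page was copied
}

type TelemetryEvent = PageViewEvent | ScrollDepthEvent | SnippetCopyEvent

// Switching on `type` narrows `event` to exactly one variant per branch.
function summarize(event: TelemetryEvent): string {
  switch (event.type) {
    case "page_view":
      return `viewed ${event.path}`
    case "scroll_depth":
      return `scrolled to ${event.depthPercent}%`
    case "snippet_copy":
      return `copied snippet #${event.snippetIndex}`
    default: {
      // Exhaustiveness check: a new variant without a case fails to compile here.
      const unreachable: never = event
      return unreachable
    }
  }
}
```

The never assignment in the default branch is what turns "I forgot to handle the new event type" into a compile error rather than a silent gap in the analytics.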

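The TelemetryEvent union is validated at the API boundary using Zod's discriminatedUnion. Because the schema is keyed on the type field, Zod looks up the matching variant directly instead of trying every member in turn. That keeps validation cheap and makes error messages precise: an unknown type is reported as an invalid discriminator rather than as a failure against every variant.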
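To make the dispatch idea concrete without pulling in Zod, here is a dependency-free sketch of the same technique: a table of per-variant checks keyed by the discriminant value. This is not Zod's implementation or API; validateEvent, variantChecks, and the two variants' fields are hypothetical.

```typescript
type FieldCheck = (event: Record<string, unknown>) => string[]

// Hypothetical per-variant checks, keyed by the value of the `type` discriminant.
const variantChecks: Record<string, FieldCheck> = {
  page_view: (e) => (typeof e.path === "string" ? [] : ["path: expected a string"]),
  scroll_depth: (e) => {
    const d = e.depthPercent
    return typeof d === "number" && d >= 0 && d <= 100 ? [] : ["depthPercent: expected a number from 0 to 100"]
  },
}

type ValidationResult =
  | { ok: true; event: Record<string, unknown> }
  | { ok: false; errors: string[] }

function validateEvent(input: unknown): ValidationResult {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { ok: false, errors: ["event: expected an object"] }
  }
  const event = input as Record<string, unknown>
  const errors: string[] = []

  // Fields shared by every variant (TelemetryEventBase); Date.parse is a loose ISO check.
  const ts = event.timestamp
  if (typeof ts !== "string" || Number.isNaN(Date.parse(ts))) {
    errors.push("timestamp: expected an ISO 8601 string")
  }
  if (typeof event.sessionId !== "string") errors.push("sessionId: expected a string")

  // Dispatch on the discriminant: one lookup selects the variant's checks, so an
  // unknown type yields a single clear error instead of one failure per variant.
  const type = event.type
  const check =
    typeof type === "string" && Object.prototype.hasOwnProperty.call(variantChecks, type)
      ? variantChecks[type]
      : undefined
  if (check === undefined) {
    errors.push(`type: unknown event type ${JSON.stringify(type)}`)
  } else {
    errors.push(...check(event))
  }

  return errors.length === 0 ? { ok: true, event } : { ok: false, errors }
}
```

Collecting every error instead of stopping at the first mirrors how schema libraries report issues, which makes rejected events much easier to debug from server logs.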

The Pipeline Architecture

Events flow from the browser through a batch queue to a Kafka topic, then fan out to ClickHouse (columnar analytics) and OpenSearch (semantic indexing):

Browser
  → BatchQueue (client)
  → POST /api/telemetry
  → KafkaTransport
  → Kafka topic
      → ClickHouse (time-series queries)
      → OpenSearch (semantic retrieval)

The pipeline implementation that wires these stages together is in examples/telemetry/pipeline.ts.
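A rough sketch of one stage in that wiring, the server-side POST /api/telemetry hop, might look like the following. It assumes the client's BatchQueue posts each batch as a JSON array; handleTelemetryPost, the Transport interface, and the 202/400 status choices are hypothetical, with isValid standing in for the schema check and transport standing in for the KafkaTransport.

```typescript
interface Transport {
  send(events: unknown[]): Promise<void>
}

interface TelemetryResponse {
  status: number // HTTP status to return to the browser
  accepted: number
  rejected: number
}

// Hypothetical handler body for POST /api/telemetry.
async function handleTelemetryPost(
  body: unknown,
  isValid: (event: unknown) => boolean,
  transport: Transport,
): Promise<TelemetryResponse> {
  // Assumes one JSON array of events per request.
  if (!Array.isArray(body)) return { status: 400, accepted: 0, rejected: 0 }

  // Drop invalid events individually so one bad event doesn't sink the batch.
  const valid = body.filter((event) => isValid(event))
  if (valid.length > 0) await transport.send(valid)

  // 202 Accepted: the events reached the transport, not yet ClickHouse or OpenSearch.
  return { status: 202, accepted: valid.length, rejected: body.length - valid.length }
}
```

Filtering per event rather than rejecting the whole request is a judgment call: it keeps good data flowing when a single client sends one malformed event, at the cost of partial batches.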

Batching and Reliability

Events are never sent individually. The BatchQueue accumulates events in memory and flushes on a timer or when it reaches a size threshold. The RetryQueue wraps any transport with exponential backoff:

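// brokers, topic and clientId are defined elsewhere.
const transport = new RetryQueue(new KafkaTransport({ brokers, topic, clientId }), {
  maxRetries: 3,     // give up after three failed retries
  baseDelayMs: 1000, // first retry after 1 s; the delay grows exponentially from there
})

const queue = new BatchQueue(transport, {
  batchSize: 50,         // flush as soon as 50 events are buffered...
  flushInterval: 10_000, // ...or after 10 s, whichever comes first
})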
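For readers who want to see the mechanics, here is a self-contained sketch of both wrappers with the same constructor shapes as the configuration above. It is a reimplementation under assumptions, not the post's RetryQueue or BatchQueue: it assumes the delay doubles on each retry and that flushInterval is in milliseconds, and the Transport interface is hypothetical.

```typescript
interface Transport {
  send(events: unknown[]): Promise<void>
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Wraps a transport and retries failed sends with exponential backoff:
// waits baseDelayMs, then 2x, 4x, ... until maxRetries retries have failed.
class RetryQueue implements Transport {
  private readonly inner: Transport
  private readonly opts: { maxRetries: number; baseDelayMs: number }

  constructor(inner: Transport, opts: { maxRetries: number; baseDelayMs: number }) {
    this.inner = inner
    this.opts = opts
  }

  async send(events: unknown[]): Promise<void> {
    for (let attempt = 0; ; attempt++) {
      try {
        await this.inner.send(events)
        return
      } catch (err) {
        if (attempt >= this.opts.maxRetries) throw err // retries exhausted
        await sleep(this.opts.baseDelayMs * 2 ** attempt)
      }
    }
  }
}

// Buffers events and sends them as one batch when batchSize events have
// accumulated or flushInterval ms have passed since the first buffered event.
class BatchQueue {
  private buffer: unknown[] = []
  private timer: ReturnType<typeof setTimeout> | undefined
  private readonly transport: Transport
  private readonly opts: { batchSize: number; flushInterval: number }

  constructor(transport: Transport, opts: { batchSize: number; flushInterval: number }) {
    this.transport = transport
    this.opts = opts
  }

  enqueue(event: unknown): void {
    this.buffer.push(event)
    if (this.buffer.length >= this.opts.batchSize) {
      this.flushInBackground()
    } else if (this.timer === undefined) {
      this.timer = setTimeout(() => this.flushInBackground(), this.opts.flushInterval)
    }
  }

  async flush(): Promise<void> {
    if (this.timer !== undefined) {
      clearTimeout(this.timer)
      this.timer = undefined
    }
    if (this.buffer.length === 0) return
    const batch = this.buffer
    this.buffer = [] // new events go into a fresh buffer while this batch is in flight
    await this.transport.send(batch)
  }

  private flushInBackground(): void {
    // In this sketch, a batch that still fails after all retries is logged and dropped.
    this.flush().catch((err) => console.error("telemetry batch dropped", err))
  }
}
```

With maxRetries: 3 and baseDelayMs: 1000, a persistently failing send is attempted four times, waiting 1 s, 2 s and 4 s between attempts. Swapping the buffer before awaiting the send means events arriving during a slow retry start a new batch instead of being mixed into the one in flight.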

On the browser side, the HTTP transport that posts batches to /api/telemetry passes keepalive: true to fetch. That flag lets the request outlive the page, so focus_loss events, which fire as the page unloads, are not dropped.
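A minimal sketch of such a transport, assuming it wraps fetch. HttpTransport and FetchLike are hypothetical names, and fetch is injected so the sketch runs outside a browser. In the browser, pass a wrapper such as (url, init) => fetch(url, init) rather than fetch itself, because calling fetch as a method of another object throws "Illegal invocation".

```typescript
// The slice of fetch this sketch needs; keepalive is a standard RequestInit option.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; keepalive: boolean },
) => Promise<{ ok: boolean; status: number }>

class HttpTransport {
  private readonly url: string
  private readonly fetchImpl: FetchLike

  constructor(url: string, fetchImpl: FetchLike) {
    this.url = url
    this.fetchImpl = fetchImpl
  }

  async send(events: unknown[]): Promise<void> {
    const res = await this.fetchImpl(this.url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(events),
      // Lets the browser finish the request after the page unloads. The Fetch
      // standard caps the combined body size of in-flight keepalive requests at
      // 64 KiB, so very large batches should be flushed before unload.
      keepalive: true,
    })
    if (!res.ok) throw new Error(`telemetry POST failed with status ${res.status}`)
  }
}
```

Pairing this with a flush on the page's visibilitychange or pagehide event gives the queue a chance to send the final batch while keepalive carries it past the unload.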

The Kafka Producer

The Kafka producer, in examples/kafka/producer.ts, wraps KafkaJS with a simple batch-send interface.
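A KafkaJS producer, created with new Kafka({ clientId, brokers }).producer(), exposes connect(), send({ topic, messages }) and disconnect(). The sketch below types against just that subset so it runs without a broker; KafkaBatchProducer, ProducerLike, and the choice to key messages by sessionId are assumptions, not the contents of producer.ts.

```typescript
// The subset of KafkaJS's Producer this wrapper calls. A producer from
// new Kafka({ clientId, brokers }).producer() should satisfy it structurally.
interface ProducerLike {
  connect(): Promise<void>
  send(record: { topic: string; messages: { key?: string; value: string }[] }): Promise<unknown>
  disconnect(): Promise<void>
}

// Hypothetical batch-send wrapper around a Kafka producer.
class KafkaBatchProducer {
  private connecting: Promise<void> | undefined
  private readonly producer: ProducerLike
  private readonly topic: string

  constructor(producer: ProducerLike, topic: string) {
    this.producer = producer
    this.topic = topic
  }

  // Connects once, on first use; a failed attempt is cleared so the next send retries it.
  private ensureConnected(): Promise<void> {
    if (this.connecting === undefined) {
      this.connecting = this.producer.connect().catch((err) => {
        this.connecting = undefined
        throw err
      })
    }
    return this.connecting
  }

  // Publishes a whole batch in a single produce request.
  async send(events: { sessionId: string }[]): Promise<void> {
    if (events.length === 0) return
    await this.ensureConnected()
    await this.producer.send({
      topic: this.topic,
      // Keying by sessionId routes a session's events to one partition, and
      // Kafka keeps messages in order within a partition.
      messages: events.map((event) => ({ key: event.sessionId, value: JSON.stringify(event) })),
    })
  }

  async close(): Promise<void> {
    if (this.connecting !== undefined) await this.producer.disconnect()
  }
}
```

Keying by session trades even partition load for ordered per-session streams, which matters if downstream consumers reconstruct reading sessions from the event sequence.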

What Comes Next

This pipeline is the instrumentation layer. The next article covers what happens on the receiving end: consuming these events from Kafka, materializing them into ClickHouse, and making them retrievable via OpenSearch's k-NN index.

💡 Running locally

Set TELEMETRY_TRANSPORT=console to log events to the terminal without needing Kafka or ClickHouse running. The blog works fully offline with this setting.
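One way such a switch could be wired is a small factory that reads the variable and returns either a logging transport or the real one. This is a sketch under assumptions: only the console value comes from the post, while ConsoleTransport, createTransport, and the makeKafka callback (standing in for however the real Kafka transport is built) are hypothetical. In Node you would call it as createTransport(process.env, () => ...).

```typescript
interface Transport {
  send(events: unknown[]): Promise<void>
}

// Logs each event instead of sending it anywhere; needs no Kafka or ClickHouse.
class ConsoleTransport implements Transport {
  async send(events: unknown[]): Promise<void> {
    for (const event of events) console.log("[telemetry]", JSON.stringify(event))
  }
}

// Picks a transport from TELEMETRY_TRANSPORT; anything other than "console"
// falls back to the factory for the real transport.
function createTransport(
  env: Record<string, string | undefined>,
  makeKafka: () => Transport,
): Transport {
  switch (env.TELEMETRY_TRANSPORT) {
    case "console":
      return new ConsoleTransport()
    default:
      return makeKafka()
  }
}
```

Passing the Kafka constructor as a callback means the Kafka client is never instantiated in console mode, so no broker connection is attempted offline.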