There is a certain irony in writing a blog about cognitive systems without instrumenting the blog itself. If the cognitive-substrate is supposed to model what is meaningful from a stream of experience, it helps to have actual signals about what readers find meaningful, not just guesses.
This post describes exactly how this blog observes reader behavior, how it keeps that observation anonymous, and what the cognitive pipeline does with the data it receives.
What Gets Tracked
The blog emits structured events whenever something behaviorally meaningful happens. There are two categories: passive signals that fire automatically, and active signals that fire when you do something.
Passive signals
- page_view: fires when you land on a page, with path and title
- scroll_depth: fires at 25%, 50%, 75%, and 90% scroll milestones
- time_on_page: fires when you navigate away, with elapsed seconds
- article_complete: fires when you reach the bottom of an article, with reading time
- focus_loss / focus_gain: fires when you switch tabs and return, with the duration you were away
Active signals
- nav_click: fires when you click a navigation link (header, logo)
- tag_click: fires when you click a tag badge, including whether you clicked it from an article, the index, or the tags page
- series_nav_click: fires when you navigate prev/next within a series, with the direction and which articles were involved
- outbound_link: fires when you click a link that leaves the site
- copy_code: fires when you copy a code snippet, with the language
- search_query: fires when you submit a search, with the query and result count
Each event carries a session ID (a random UUID generated per browser tab visit) and a timestamp. That is all that is required. No IP address, no user agent stored with the event, no cookies set without consent.
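As a rough illustration (not the blog's actual code), the event vocabulary above could be typed as a discriminated union. The event names come from the lists above; every field name (`elapsedSeconds`, `resultCount`, and so on), the ISO timestamp format, and the `makeEvent` helper are assumptions made for this sketch.

```typescript
// Hypothetical payload shapes. Only the event names are taken from the post;
// the field names are assumptions for illustration.
type PassiveEvent =
  | { name: "page_view"; path: string; title: string }
  | { name: "scroll_depth"; percent: 25 | 50 | 75 | 90 }
  | { name: "time_on_page"; elapsedSeconds: number }
  | { name: "article_complete"; readingTimeSeconds: number }
  | { name: "focus_loss"; dwellSeconds: number }
  | { name: "focus_gain"; awaySeconds: number };

type ActiveEvent =
  | { name: "nav_click"; target: string }
  | { name: "tag_click"; tag: string; context: "article" | "index" | "tags" }
  | { name: "series_nav_click"; direction: "prev" | "next"; fromPath: string; toPath: string }
  | { name: "outbound_link"; href: string }
  | { name: "copy_code"; language: string }
  | { name: "search_query"; query: string; resultCount: number };

// Every event carries a session ID and a timestamp, and nothing about identity.
type TelemetryEvent = (PassiveEvent | ActiveEvent) & {
  sessionId: string;
  timestamp: string; // ISO 8601 is an assumption; the post only says "a timestamp"
};

// Hypothetical helper that stamps a payload with the session ID and time.
function makeEvent(
  payload: PassiveEvent | ActiveEvent,
  sessionId: string,
  now: Date,
): TelemetryEvent {
  return { ...payload, sessionId, timestamp: now.toISOString() };
}
```

Typing the payload per event name means a `copy_code` event cannot accidentally carry a search query, which keeps the "only what you did" promise checkable at compile time.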
How Anonymity Works
The session ID is generated with crypto.randomUUID() and stored in sessionStorage, which clears when the tab closes. It is not linked to any persistent identity unless you explicitly opt in.
If you grant consent (there is a consent prompt at the bottom of the page), the provider also stores a readerId in localStorage - another random UUID - along with a session count and first-seen timestamp. This enables the pipeline to distinguish new readers from returning readers without knowing who either of them is.
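The two identifiers can be sketched as follows. This is a minimal sketch under stated assumptions, not the provider's real code: the storage keys, the `ReaderRecord` shape, and the function names are hypothetical, and the stores and ID generator are passed in so the logic does not depend on browser globals.

```typescript
// Minimal storage interface that both sessionStorage and localStorage satisfy.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical storage keys; the post does not name them.
const SESSION_KEY = "telemetry.sessionId";
const READER_KEY = "telemetry.reader";

interface ReaderRecord {
  readerId: string;
  sessionCount: number;
  firstSeen: string;
}

// Returns the per-tab session ID, creating one on first use.
function getSessionId(tabStore: KeyValueStore, newId: () => string): string {
  const existing = tabStore.getItem(SESSION_KEY);
  if (existing !== null) return existing;
  const id = newId();
  tabStore.setItem(SESSION_KEY, id);
  return id;
}

// Creates or updates the persistent reader record, but only with consent.
// Intended to be called once per session.
function touchReader(
  persistentStore: KeyValueStore,
  newId: () => string,
  hasConsent: boolean,
  now: Date,
): ReaderRecord | null {
  if (!hasConsent) return null; // nothing persistent is written without consent
  const raw = persistentStore.getItem(READER_KEY);
  const previous: ReaderRecord | null = raw === null ? null : JSON.parse(raw);
  const record: ReaderRecord =
    previous === null
      ? { readerId: newId(), sessionCount: 1, firstSeen: now.toISOString() }
      : { ...previous, sessionCount: previous.sessionCount + 1 };
  persistentStore.setItem(READER_KEY, JSON.stringify(record));
  return record;
}

// In the browser this might be wired up as:
//   getSessionId(sessionStorage, () => crypto.randomUUID());
//   touchReader(localStorage, () => crypto.randomUUID(), consentGiven, new Date());
```

A session count above one is enough to tell a returning reader from a new one, and neither ID is derived from anything about the person.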
There is no server-side join between session IDs and IP addresses. The telemetry route receives only what the client sends. What the client sends contains no information about who you are, only what you did.
The Pipeline
Events are batched client-side in a queue that flushes every ten seconds or on tab close, whichever comes first. The batch is posted to /api/telemetry on the blog's own domain - no third-party trackers, and no data sent to an external analytics service.
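A client-side queue of this kind might look roughly like the sketch below. The `TelemetryQueue` class and its `send` callback are hypothetical; the timer and tab-close wiring is shown only in comments, and using `navigator.sendBeacon` for the POST is one plausible option rather than something the post confirms.

```typescript
// Hypothetical batching queue. `send` stands in for the POST to /api/telemetry.
class TelemetryQueue<E> {
  private pending: E[] = [];
  private readonly send: (batch: E[]) => void;

  constructor(send: (batch: E[]) => void) {
    this.send = send;
  }

  // Adds an event to the current batch without sending anything yet.
  enqueue(event: E): void {
    this.pending.push(event);
  }

  // Sends everything queued so far as one batch; does nothing when empty.
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }
}

// Possible browser wiring (assumed, not taken from the blog's source):
//   const queue = new TelemetryQueue<object>((batch) => {
//     navigator.sendBeacon("/api/telemetry", JSON.stringify(batch));
//   });
//   setInterval(() => queue.flush(), 10000);                   // every ten seconds
//   window.addEventListener("pagehide", () => queue.flush());  // on tab close
```

Swapping the pending array before calling `send` means events enqueued during a send land in the next batch instead of being lost or sent twice.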
The route handler does two things with each batch:
Custom transport (Kafka -> ingest-worker)
Events are forwarded to a Kafka topic (telemetry.logs.raw). The cognitive-substrate's ingest-worker consumes this topic and maps each event to an ExperienceEvent - the shared vocabulary the cognitive pipeline uses to reason about experience.
OTEL metrics (OTEL collector)
Simultaneously, the route handler increments OpenTelemetry counters and histograms for each event type. These flow through the OTEL collector to ClickHouse for time-series aggregation and dashboards.
The two pipelines serve different purposes. OTEL metrics answer aggregate questions: how many page views today, what is the median scroll depth across articles. The cognitive pipeline asks a different question: what does this particular session tell us about which content is landing, and how should that shape what gets written next?
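The fan-out in the route handler can be pictured with a small sketch. The topic name comes from the description above; the `Sinks` interface, `handleBatch`, and the synchronous signatures are simplifying assumptions. A real handler would wrap an actual Kafka producer and OpenTelemetry instruments, both of which are typically asynchronous.

```typescript
// Minimal view of an incoming event; only the name matters for the metrics side.
interface IncomingEvent {
  name: string;
  sessionId: string;
  timestamp: string;
}

// Hypothetical sinks standing in for a Kafka producer and an OTEL counter.
interface Sinks {
  publish(topic: string, message: string): void;
  countEvent(eventName: string): void;
}

const RAW_TOPIC = "telemetry.logs.raw";

// Sends each event down both paths and returns how many were handled.
function handleBatch(batch: IncomingEvent[], sinks: Sinks): number {
  for (const event of batch) {
    sinks.publish(RAW_TOPIC, JSON.stringify(event)); // cognitive pipeline path
    sinks.countEvent(event.name); // aggregate metrics path
  }
  return batch.length;
}
```

The full event goes to the topic, while the metrics side only sees the event name, which matches the split between per-session reasoning and aggregate counting.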
The Importance Scoring
The ingest-worker assigns each event an importance score between 0 and 1 before storing it. The scoring reflects signal strength:
| Event | Score | Reasoning |
|---|---|---|
| article_complete | 0.85 - 1.0 | Deliberate full read; scales with reading time |
| scroll_depth 90% | 0.75 | Near-complete read |
| scroll_depth 75% | 0.55 | Meaningful engagement |
| copy_code | 0.65 | Intent to use the material |
| outbound_link | 0.55 | Content drove action |
| search_query (0 results) | 0.70 | Niche curiosity signal |
| search_query (results) | 0.50 | Active information seeking |
| series_nav_click | 0.45 | Deep engagement - progressing through a series |
| scroll_depth 50% | 0.30 | Moderate engagement |
| tag_click (article context) | 0.20 | Topic interest signal |
| focus_loss | 0.10 - 0.30 | Scales with dwell time before leaving |
| page_view / focus_gain | 0.10 | Baseline presence |
| nav_click | 0.05 | Navigation intent, minimal signal |
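The table translates fairly directly into a scoring function. The sketch below is an illustration, not the ingest-worker's code. The fixed scores are taken from the table; the linear scaling for the two ranged rows and the saturation constants `FULL_READ_SECONDS` and `FULL_DWELL_SECONDS` are assumptions, since the table gives only the ranges. Cases the table does not list (the 25% scroll milestone, tag clicks outside an article) return `null` rather than a guessed score.

```typescript
type ScorableEvent =
  | { name: "article_complete"; readingTimeSeconds: number }
  | { name: "scroll_depth"; percent: number }
  | { name: "copy_code" }
  | { name: "outbound_link" }
  | { name: "search_query"; resultCount: number }
  | { name: "series_nav_click" }
  | { name: "tag_click"; context: string }
  | { name: "focus_loss"; dwellSeconds: number }
  | { name: "page_view" }
  | { name: "focus_gain" }
  | { name: "nav_click" };

// Assumed saturation points: the table gives ranges but not how they scale.
const FULL_READ_SECONDS = 600;
const FULL_DWELL_SECONDS = 300;

// Linear interpolation between low and high, with the fraction clamped to [0, 1].
function lerp(low: number, high: number, fraction: number): number {
  const t = Math.min(1, Math.max(0, fraction));
  return low * (1 - t) + high * t;
}

// Returns a score in [0, 1], or null when the table has no row for the event.
function scoreEvent(event: ScorableEvent): number | null {
  switch (event.name) {
    case "article_complete":
      return lerp(0.85, 1.0, event.readingTimeSeconds / FULL_READ_SECONDS);
    case "scroll_depth":
      if (event.percent === 90) return 0.75;
      if (event.percent === 75) return 0.55;
      if (event.percent === 50) return 0.3;
      return null; // the 25% milestone has no row in the table
    case "copy_code":
      return 0.65;
    case "outbound_link":
      return 0.55;
    case "search_query":
      return event.resultCount === 0 ? 0.7 : 0.5;
    case "series_nav_click":
      return 0.45;
    case "tag_click":
      return event.context === "article" ? 0.2 : null; // only article context is listed
    case "focus_loss":
      return lerp(0.1, 0.3, event.dwellSeconds / FULL_DWELL_SECONDS);
    case "page_view":
    case "focus_gain":
      return 0.1;
    case "nav_click":
      return 0.05;
  }
}
```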
The scored events accumulate in OpenSearch as ExperienceEvent records. The cognitive pipeline can query this store to answer questions like: which articles have the highest mean importance score across sessions? Which topics generate the most copy_code and outbound_link events?
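The first of those questions amounts to a grouped mean. In practice it would likely be expressed as an aggregation query against the store; the sketch below shows the same computation in memory, assuming a hypothetical `ScoredRecord` shape with an `articlePath` and an `importance` field (the actual ExperienceEvent schema is not described here).

```typescript
// Hypothetical projection of a stored record; field names are assumptions.
interface ScoredRecord {
  articlePath: string;
  importance: number;
}

// Computes the mean importance score per article path.
function meanImportanceByArticle(records: ScoredRecord[]): Record<string, number> {
  const sums: Record<string, { total: number; count: number }> = {};
  for (const record of records) {
    let entry = sums[record.articlePath];
    if (entry === undefined) {
      entry = { total: 0, count: 0 };
      sums[record.articlePath] = entry;
    }
    entry.total += record.importance;
    entry.count += 1;
  }
  const means: Record<string, number> = {};
  for (const path of Object.keys(sums)) {
    means[path] = sums[path].total / sums[path].count;
  }
  return means;
}
```

A mean rewards articles whose typical event is strong, so an article with a few complete reads can outrank one with many bare page views.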
What I Expect to Learn
The honest answer is I am not sure yet. The instrumentation is new.
The hypothesis I am most interested in is whether series_nav_click progression is a better proxy than page views for whether an article series is working. Page views measure reach. A reader who follows all five parts of a series is telling me something different from a reader who lands on part one and leaves.
A second hypothesis: copy_code and outbound_link events are leading indicators of practical value. An article that generates a lot of copy events is probably teaching something usable. An article with high scroll depth but no copy or outbound events might be interesting to read but not useful to act on. That distinction matters for what to write next.
The search_query data with zero results is particularly interesting. Every zero-result query is a reader telling me about something they wanted to find and could not. That is a content gap signal.
The longer-term goal is to close a loop: the cognitive pipeline not only observes what content lands, but eventually surfaces those observations back to the writing process. Not by automating anything, but by making the pattern visible. What topics keep generating engagement months after publication? Which series parts lose readers? Which code snippets do people actually copy?
None of that requires knowing who you are. It only requires knowing what happened.