What This Blog Knows About You (And What It Does With It)

There is a certain irony in writing a blog about cognitive systems without instrumenting the blog itself. If the cognitive-substrate is supposed to model what is meaningful in a stream of experience, it helps to have real signals about what readers find meaningful, not just guesses.

Accepting the consent banner saves your reading progress, enables continue-where-you-left-off, and (when ads are enabled) allows a single end-of-post Google AdSense unit that helps cover VPS hosting costs. AdSense does not load if you choose Decline. The homepage and About page show live aggregates from reader memory, with no reader IDs. Ad impression and click events stay internal to the substrate and are not shown on the public memory panels.

This post describes exactly how this blog observes reader behavior, how it keeps that observation anonymous, and what the cognitive pipeline does with the data it receives.

What Gets Tracked

The blog emits structured events whenever something behaviorally meaningful happens. There are two categories: passive signals that fire automatically, and active signals that fire when you do something.

Passive signals

page_view - fires when you land on a page, with path and title
scroll_depth - fires at 25%, 50%, 75%, and 90% scroll milestones
time_on_page - fires when you navigate away, with elapsed seconds
article_complete - fires when you reach the bottom of an article, with reading time
focus_loss / focus_gain - fires when you switch tabs and return, with the duration you were away
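The scroll milestones need a small filter so that each one fires only once per page view. Below is a minimal TypeScript sketch. It assumes depth is measured as the fraction of the scrollable distance covered. The `ScrollDepthTracker` class and its `update` method are hypothetical names, not the blog's actual code.

```typescript
// Milestones named in the post.
const SCROLL_MILESTONES = [25, 50, 75, 90];

// Hypothetical helper: remembers which milestones already fired for one page view.
class ScrollDepthTracker {
  private fired = new Set<number>();

  // Returns the milestones crossed for the first time at this scroll position.
  // Assumption: depth = scrollTop / (documentHeight - viewportHeight), in percent.
  update(scrollTop: number, viewportHeight: number, documentHeight: number): number[] {
    const scrollable = documentHeight - viewportHeight;
    // A page that fits entirely in the viewport is treated as fully scrolled.
    const percent = scrollable <= 0 ? 100 : (scrollTop / scrollable) * 100;
    const crossed: number[] = [];
    for (const m of SCROLL_MILESTONES) {
      if (percent >= m && !this.fired.has(m)) {
        this.fired.add(m);
        crossed.push(m);
      }
    }
    return crossed;
  }
}
```

In a browser, you would call `update(window.scrollY, window.innerHeight, document.documentElement.scrollHeight)` from a scroll listener and emit one `scroll_depth` event per returned milestone.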

Active signals

nav_click - fires when you click a navigation link (header, logo)
tag_click - fires when you click a tag badge, including whether you clicked it from an article, the index, or the tags page
series_nav_click - fires when you navigate prev/next within a series, with the direction and which articles were involved
outbound_link - fires when you click a link that leaves the site
copy_code - fires when you copy a code snippet, with the language
search_query - fires when you submit a search, with the query and result count
ad_impression / ad_click - fire for the end-of-post hosting-support AdSense slot (only after Accept, when ads are enabled)

Alongside its event-specific fields, each event carries only a session ID (a random UUID generated per browser tab visit) and a timestamp. Nothing else that could identify you is included: no IP address or user agent is stored with the event, and no cookies are set without consent.
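The envelope can be expressed as a TypeScript type. The event names below come from the lists above. The field names (`type`, `sessionId`, `timestamp`, `data`) and the `makeEvent` helper are assumptions made for illustration.

```typescript
// Event names listed in the post.
type TelemetryEventType =
  | "page_view" | "scroll_depth" | "time_on_page" | "article_complete"
  | "focus_loss" | "focus_gain" | "nav_click" | "tag_click"
  | "series_nav_click" | "outbound_link" | "copy_code" | "search_query"
  | "ad_impression" | "ad_click";

// Hypothetical envelope: session ID, timestamp, and event-specific data only.
interface TelemetryEvent {
  type: TelemetryEventType;
  sessionId: string;
  timestamp: string; // ISO 8601
  data: Record<string, string | number | boolean>;
}

function makeEvent(
  type: TelemetryEventType,
  sessionId: string,
  data: Record<string, string | number | boolean> = {},
  now: Date = new Date(),
): TelemetryEvent {
  // Deliberately no IP address, user agent, or reader identity in the envelope.
  return { type, sessionId, timestamp: now.toISOString(), data };
}
```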

How Anonymity Works

The session ID is generated with crypto.randomUUID() and stored in sessionStorage, which clears when the tab closes. It is not linked to any persistent identity unless you explicitly opt in.
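A get-or-create helper for the per-tab session ID might look like the following sketch. The storage key `telemetry.sessionId` and the function name are hypothetical. The storage and the ID generator are passed in so that the code also runs outside a browser. In the page, you would pass `sessionStorage` and `() => crypto.randomUUID()`.

```typescript
// Minimal subset of the Web Storage API; window.sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const SESSION_KEY = "telemetry.sessionId"; // hypothetical key name

// Returns this tab's session ID, generating and storing one on first use.
function getSessionId(store: KeyValueStore, newId: () => string): string {
  const existing = store.getItem(SESSION_KEY);
  if (existing !== null) return existing;
  const id = newId();
  store.setItem(SESSION_KEY, id);
  return id;
}

// In-memory stand-in for sessionStorage, handy for tests.
class MemoryStore implements KeyValueStore {
  private m = new Map<string, string>();
  getItem(key: string): string | null { return this.m.get(key) ?? null; }
  setItem(key: string, value: string): void { this.m.set(key, value); }
}
```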

If you grant consent (via the consent prompt at the bottom of the page), the client-side telemetry provider also stores a readerId in localStorage (another random UUID), along with a session count and a first-seen timestamp. This lets the pipeline distinguish new readers from returning readers without knowing who either of them is.
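The consent-gated returning-reader record could work along the lines of the sketch below. The `ReaderRecord` shape mirrors the three values the post mentions. The field names, the `telemetry.reader` key, and the `recordSession` function are hypothetical. The function is meant to be called once at the start of each tab session, with `localStorage` as the store.

```typescript
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical shape for the three values described in the post.
interface ReaderRecord {
  readerId: string;     // random UUID, not tied to any identity
  sessionCount: number;
  firstSeen: string;    // ISO 8601
}

const READER_KEY = "telemetry.reader"; // hypothetical key name

// Without consent, nothing is read or written. With consent, the record is created or bumped.
function recordSession(
  consented: boolean,
  store: KeyValueStore,
  newId: () => string,
  now: Date = new Date(),
): { reader: ReaderRecord; returning: boolean } | null {
  if (!consented) return null;
  const raw = store.getItem(READER_KEY);
  const prior: ReaderRecord | null = raw ? (JSON.parse(raw) as ReaderRecord) : null;
  const reader: ReaderRecord = prior
    ? { ...prior, sessionCount: prior.sessionCount + 1 }
    : { readerId: newId(), sessionCount: 1, firstSeen: now.toISOString() };
  store.setItem(READER_KEY, JSON.stringify(reader));
  return { reader, returning: prior !== null };
}
```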

There is no server-side join between session IDs and IP addresses. The telemetry route receives only what the client sends. What the client sends contains no information about who you are, only what you did.

The Pipeline

Events are batched client-side in a queue that flushes every ten seconds or on tab close, whichever comes first. The batch is posted to /api/telemetry on the blog's own domain. Reader behavioral telemetry does not go to a third-party analytics vendor. The only third-party script that may load after Accept is Google AdSense for the single end-of-post hosting-support unit.
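The batching behaviour can be sketched as a small queue. This is an illustration, not the blog's code. `TelemetryQueue`, `tick`, and the injected `send` callback are hypothetical, and timestamps are passed in explicitly so the logic is testable.

```typescript
// Buffers events and hands them to `send` in batches.
class TelemetryQueue<E> {
  private buffer: E[] = [];
  private lastFlush: number;
  private readonly send: (batch: E[]) => void;
  private readonly intervalMs: number;

  constructor(send: (batch: E[]) => void, intervalMs = 10_000, now = Date.now()) {
    this.send = send;
    this.intervalMs = intervalMs;
    this.lastFlush = now;
  }

  push(event: E): void {
    this.buffer.push(event);
  }

  // Call periodically; flushes once the interval has elapsed since the last flush.
  tick(now = Date.now()): void {
    if (now - this.lastFlush >= this.intervalMs) this.flush(now);
  }

  // Sends everything buffered so far (also call this when the tab closes).
  flush(now = Date.now()): void {
    this.lastFlush = now;
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch);
  }
}
```

One plausible browser wiring is sketched here as an assumption, since the post does not say how the close-time flush is sent:
- Call `tick()` from a one-second `setInterval`.
- Call `flush()` from a `pagehide` listener.
- Implement `send` with `navigator.sendBeacon("/api/telemetry", JSON.stringify(batch))`, which is designed to deliver data while the page unloads.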

The route handler does two things with each batch:

Custom transport (Kafka -> ingest-worker)

Events are forwarded to a Kafka topic (telemetry.logs.raw). The cognitive-substrate's ingest-worker consumes this topic and maps each event to an ExperienceEvent - the shared vocabulary the cognitive pipeline uses to reason about experience.
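On the ingest side, each Kafka message has to be parsed and mapped into that shared vocabulary. The post does not show the `ExperienceEvent` schema, so the fields below are hypothetical. The sketch also assumes that each message value holds one JSON-encoded event. Malformed messages return `null` rather than throwing, so a single bad event cannot stall the consumer.

```typescript
// Hypothetical raw event shape as published by the route handler.
interface RawTelemetryEvent {
  type: string;
  sessionId: string;
  timestamp: string;
  data?: Record<string, unknown>;
}

// Hypothetical ExperienceEvent shape; the real schema is not shown in the post.
interface ExperienceEvent {
  source: "blog";
  kind: string;        // original telemetry event type
  sessionId: string;
  occurredAt: string;  // ISO 8601
  attributes: Record<string, unknown>;
}

// Maps one message value from telemetry.logs.raw; returns null if it is malformed.
function toExperienceEvent(messageValue: string): ExperienceEvent | null {
  let raw: unknown;
  try {
    raw = JSON.parse(messageValue);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const { type, sessionId, timestamp, data } = raw as Partial<RawTelemetryEvent>;
  if (typeof type !== "string" || typeof sessionId !== "string" || typeof timestamp !== "string") {
    return null;
  }
  const attributes = typeof data === "object" && data !== null ? data : {};
  return { source: "blog", kind: type, sessionId, occurredAt: timestamp, attributes };
}
```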

OTEL metrics (OTEL collector)

Simultaneously, the route handler increments OpenTelemetry counters and histograms for each event type. These flow through the OTEL collector to ClickHouse for time-series aggregation and dashboards.
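The metrics side might look like the sketch below. The interfaces mirror the method names of the OpenTelemetry JS Meter API (`createCounter`, `createHistogram`, `add`, `record`), so a meter obtained from `metrics.getMeter(...)` in `@opentelemetry/api` should fit them. They are declared locally only to keep the sketch self-contained. The metric names and the `readingTimeSec` and `depth` data fields are hypothetical.

```typescript
// Structural subset of the OpenTelemetry JS metrics API.
type Attributes = Record<string, string | number>;
interface CounterLike { add(value: number, attributes?: Attributes): void; }
interface HistogramLike { record(value: number, attributes?: Attributes): void; }
interface MeterLike {
  createCounter(name: string, options?: { description?: string; unit?: string }): CounterLike;
  createHistogram(name: string, options?: { description?: string; unit?: string }): HistogramLike;
}

interface BatchEvent { type: string; data?: Record<string, unknown>; }

// Creates the instruments once and returns a function that records one batch.
function createBatchRecorder(meter: MeterLike) {
  const events = meter.createCounter("blog.telemetry.events", { description: "Events received, by type" });
  const scrollDepth = meter.createHistogram("blog.telemetry.scroll_depth", { unit: "%" });
  const readingTime = meter.createHistogram("blog.telemetry.reading_time", { unit: "s" });
  return (batch: BatchEvent[]): void => {
    for (const e of batch) {
      events.add(1, { "event.type": e.type });
      const d: Record<string, unknown> = e.data ?? {};
      const depth = d.depth;
      if (e.type === "scroll_depth" && typeof depth === "number") scrollDepth.record(depth);
      const seconds = d.readingTimeSec;
      if (e.type === "article_complete" && typeof seconds === "number") readingTime.record(seconds);
    }
  };
}
```

A histogram on scroll depth is what makes an aggregate question like "median scroll depth across articles" answerable downstream.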

The two pipelines serve different purposes. OTEL metrics answer aggregate questions: how many page views today, what is the median scroll depth across articles. The cognitive pipeline asks a different question: what does this particular session tell us about which content is landing, and how should that shape what gets written next?

The Importance Scoring

The ingest-worker assigns each event an importance score between 0 and 1 before storing it. The scoring reflects signal strength:

| Event | Score | Reasoning |
|---|---|---|
| `article_complete` | 0.85 - 1.0 | Deliberate full read; scales with reading time |
| `scroll_depth` 90% | 0.75 | Near-complete read |
| `scroll_depth` 75% | 0.55 | Meaningful engagement |
| `copy_code` | 0.65 | Intent to use the material |
| `outbound_link` | 0.55 | Content drove action |
| `search_query` (0 results) | 0.70 | Niche curiosity signal |
| `search_query` (results) | 0.50 | Active information seeking |
| `series_nav_click` | 0.45 | Deep engagement: progressing through a series |
| `scroll_depth` 50% | 0.30 | Moderate engagement |
| `tag_click` (article context) | 0.20 | Topic interest signal |
| `focus_loss` | 0.10 - 0.30 | Scales with dwell time before leaving |
| `page_view` / `focus_gain` | 0.10 | Baseline presence |
| `nav_click` | 0.05 | Navigation intent, minimal signal |
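The table translates directly into a scoring function. This sketch follows the listed values. Everything the table leaves open is an assumption and is marked as one in the comments:
- the linear ramps for `article_complete` and `focus_loss`, and their caps of 600 s and 300 s;
- the data field names;
- the 0.10 baseline for cases the table does not list (25% scroll, `tag_click` outside articles, `time_on_page`, and the ad events).

```typescript
// Hypothetical field names for event-specific data.
interface ScoringData {
  depth?: number;          // scroll_depth milestone, percent
  readingTimeSec?: number; // article_complete
  resultCount?: number;    // search_query
  context?: string;        // tag_click: "article" | "index" | "tags"
  dwellSec?: number;       // focus_loss: time on page before leaving
}
interface ScoringInput { type: string; data?: ScoringData; }

const BASELINE = 0.1; // page_view / focus_gain; assumed for unlisted cases too

// Linear ramp from lo to hi as value goes from 0 to cap (ramp shape and caps are assumptions).
function ramp(lo: number, hi: number, value: number, cap: number): number {
  const t = Math.min(1, Math.max(0, value / cap));
  return lo + (hi - lo) * t;
}

function importance(e: ScoringInput): number {
  const d: ScoringData = e.data ?? {};
  switch (e.type) {
    case "article_complete": return ramp(0.85, 1.0, d.readingTimeSec ?? 0, 600);
    case "scroll_depth": {
      const depth = d.depth ?? 0;
      if (depth >= 90) return 0.75;
      if (depth >= 75) return 0.55;
      if (depth >= 50) return 0.3;
      return BASELINE; // 25% is not in the table
    }
    case "copy_code": return 0.65;
    case "outbound_link": return 0.55;
    case "search_query": return d.resultCount === 0 ? 0.7 : 0.5;
    case "series_nav_click": return 0.45;
    case "tag_click": return d.context === "article" ? 0.2 : BASELINE;
    case "focus_loss": return ramp(0.1, 0.3, d.dwellSec ?? 0, 300);
    case "nav_click": return 0.05;
    default: return BASELINE; // page_view, focus_gain, and anything unlisted
  }
}
```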

The scored events accumulate in OpenSearch as ExperienceEvent records. The cognitive pipeline can query this store to answer questions like: which articles have the highest mean importance score across sessions? Which topics generate the most copy_code and outbound_link events?
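Both questions map onto standard OpenSearch aggregations. The first is a `terms` bucket per article, ordered by an `avg` sub-aggregation. The second is a filtered `terms` aggregation. The builders below return request bodies for `POST /<index>/_search`. The field names (`path`, `importance`, `kind`, `tags`) are hypothetical and assume keyword-mapped fields.

```typescript
// Mean importance per article, highest first.
function meanImportanceByArticleQuery(topN = 10) {
  return {
    size: 0, // aggregations only, no individual hits
    aggs: {
      by_article: {
        terms: { field: "path", size: topN, order: { mean_importance: "desc" } },
        aggs: { mean_importance: { avg: { field: "importance" } } },
      },
    },
  };
}

// Topics ranked by how many copy_code and outbound_link events they produce.
function practicalEngagementByTopicQuery(topN = 10) {
  return {
    size: 0,
    query: { terms: { kind: ["copy_code", "outbound_link"] } },
    // terms buckets are ordered by document count, descending, by default
    aggs: { by_topic: { terms: { field: "tags", size: topN } } },
  };
}
```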

What I Expect to Learn

The honest answer is that I am not sure yet. The instrumentation is new.

The hypothesis I am most interested in is whether series_nav_click progression is a better proxy than page views for whether an article series is working. Page views measure reach. A reader who follows all five parts of a series is telling me something different from a reader who lands on part one and leaves.
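One way to test that hypothesis is a reach curve per series: for each part, count how many sessions got at least that far. The sketch below assumes each `series_nav_click` event records the series and the part number navigated to; the names `series` and `toPart` are hypothetical. Sessions that never navigate within a series do not appear in the curve, so the part-one audience would have to come from `page_view` counts instead.

```typescript
// Hypothetical fields for a series_nav_click event.
interface SeriesNav {
  sessionId: string;
  series: string;
  toPart: number; // 1-based part the reader navigated to
}

// reach[k - 1] = number of sessions that reached at least part k of the series.
function seriesReach(events: SeriesNav[], series: string, totalParts: number): number[] {
  const furthest = new Map<string, number>();
  for (const e of events) {
    if (e.series !== series) continue;
    // A "prev" click lowers toPart, so keep the maximum seen per session.
    furthest.set(e.sessionId, Math.max(furthest.get(e.sessionId) ?? 1, e.toPart));
  }
  const reach: number[] = new Array(totalParts).fill(0);
  furthest.forEach((part) => {
    for (let k = 1; k <= Math.min(part, totalParts); k++) reach[k - 1] += 1;
  });
  return reach;
}
```

A curve that stays flat across parts suggests the series is holding readers, while a steep drop after one part points at where it loses them.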

A second hypothesis: copy_code and outbound_link events are leading indicators of practical value. An article that generates a lot of copy events is probably teaching something usable. An article with high scroll depth but no copy or outbound events might be interesting to read but not useful to act on. That distinction matters for what to write next.
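That distinction can be made explicit with per-article rates. The thresholds below (half of sessions scrolling to 75%, and 0.1 copy or outbound events per session) are placeholders, not values from the pipeline. `ArticleStats` assumes those counts have already been aggregated.

```typescript
// Hypothetical per-article aggregates.
interface ArticleStats {
  path: string;
  sessions: number;           // sessions that viewed the article
  deepScrollSessions: number; // sessions reaching the 75% milestone or beyond
  actionEvents: number;       // copy_code + outbound_link events
}

type Verdict = "practical" | "read, not acted on" | "undetermined";

// Placeholder thresholds; tune against real data.
function classify(s: ArticleStats, deepScrollRate = 0.5, actionRate = 0.1): Verdict {
  if (s.sessions === 0) return "undetermined";
  const deep = s.deepScrollSessions / s.sessions;
  const actions = s.actionEvents / s.sessions;
  if (actions >= actionRate) return "practical";
  if (deep >= deepScrollRate) return "read, not acted on";
  return "undetermined";
}
```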

Zero-result search_query events are particularly interesting. Every zero-result query is a reader telling me about something they wanted to find and could not. That is a content gap signal.
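Turning zero-result queries into a ranked list of gaps is a small grouping step. In this sketch, queries are normalized by trimming, lowercasing, and collapsing whitespace; that normalization is an assumption. The `SearchEvent` field names are hypothetical.

```typescript
// Hypothetical fields for a search_query event.
interface SearchEvent {
  query: string;
  resultCount: number;
}

// Groups zero-result queries after normalization and ranks them by frequency.
function contentGaps(events: SearchEvent[]): Array<{ query: string; count: number }> {
  const counts = new Map<string, number>();
  for (const e of events) {
    if (e.resultCount !== 0) continue;
    const q = e.query.trim().toLowerCase().replace(/\s+/g, " ");
    if (q === "") continue;
    counts.set(q, (counts.get(q) ?? 0) + 1);
  }
  const out: Array<{ query: string; count: number }> = [];
  counts.forEach((count, query) => { out.push({ query, count }); });
  // Most frequent first; ties broken alphabetically for stable output.
  return out.sort((a, b) => b.count - a.count || a.query.localeCompare(b.query));
}
```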

The longer-term goal is to close a loop: the cognitive pipeline not only observes what content lands, but eventually surfaces those observations back to the writing process. Not by automating anything, but by making the pattern visible. What topics keep generating engagement months after publication? Which series parts lose readers? Which code snippets do people actually copy?

None of that requires knowing who you are. It only requires knowing what happened.