Steve Hutchinson · Big Pines
6 min read · BigPines.net

What This Blog Knows About You (And What It Does With It)

How anonymous behavioral signals from this blog feed the cognitive-substrate pipeline, what gets tracked, and what I expect to learn from it.

There is a certain irony in writing a blog about cognitive systems without instrumenting the blog itself. If the cognitive-substrate is supposed to model what is meaningful from a stream of experience, it helps to have actual signals about what readers find meaningful, not just guesses.

This post describes exactly how this blog observes reader behavior, how it keeps that observation anonymous, and what the cognitive pipeline does with the data it receives.

What Gets Tracked

The blog emits structured events whenever something behaviorally meaningful happens. There are two categories: passive signals that fire automatically, and active signals that fire when you do something.

Passive signals

  • page_view - fires when you land on a page, with path and title
  • scroll_depth - fires at 25%, 50%, 75%, and 90% scroll milestones
  • time_on_page - fires when you navigate away, with elapsed seconds
  • article_complete - fires when you reach the bottom of an article, with reading time
  • focus_loss / focus_gain - fire when you switch tabs and return, with the duration you were away
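As a rough illustration of the scroll_depth milestones listed above, the sketch below reports each threshold at most once per page view. The function name and the Set-based bookkeeping are assumptions made for this example, not the blog's actual client code.

```typescript
// Milestones taken from the list above; everything else here is illustrative.
const SCROLL_MILESTONES: readonly number[] = [25, 50, 75, 90];

// Returns the milestones newly reached at `scrollPercent` and records them in
// `alreadyFired`, so each milestone is reported at most once per page view.
function newlyCrossedMilestones(
  scrollPercent: number,
  alreadyFired: Set<number>,
): number[] {
  const crossed = SCROLL_MILESTONES.filter(
    (m) => scrollPercent >= m && !alreadyFired.has(m),
  );
  for (const m of crossed) alreadyFired.add(m);
  return crossed;
}
```

A fast scroll straight to 60% would report 25 and 50 together, and a later call at the same position would report nothing new.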

Active signals

  • nav_click - fires when you click a navigation link (header, logo)
  • tag_click - fires when you click a tag badge, including whether you clicked it from an article, the index, or the tags page
  • series_nav_click - fires when you navigate prev/next within a series, with the direction and which articles were involved
  • outbound_link - fires when you click a link that leaves the site
  • copy_code - fires when you copy a code snippet, with the language
  • search_query - fires when you submit a search, with the query and result count

Each event carries a session ID (a random UUID generated per browser tab visit) and a timestamp. That is all that is required. No IP address, no user agent stored with the event, no cookies set without consent.
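To make the shape of an event concrete, here is a minimal sketch of what the envelope could look like. The event names come from the lists above; the field names (name, sessionId, timestamp, props) and the makeEvent helper are hypothetical and may not match the real schema.

```typescript
// Event names from the lists above; field names below are assumptions.
type TelemetryEventName =
  | "page_view" | "scroll_depth" | "time_on_page" | "article_complete"
  | "focus_loss" | "focus_gain" | "nav_click" | "tag_click"
  | "series_nav_click" | "outbound_link" | "copy_code" | "search_query";

type EventProps = Record<string, string | number | boolean>;

interface TelemetryEvent {
  name: TelemetryEventName;
  sessionId: string; // random per-tab UUID, no link to identity
  timestamp: string; // ISO 8601, UTC
  props: EventProps; // event-specific payload, e.g. { percent: 50 }
}

// Builds an event with only the fields described above: no IP address,
// no user agent, nothing that identifies the reader.
function makeEvent(
  name: TelemetryEventName,
  sessionId: string,
  props: EventProps = {},
  now: Date = new Date(),
): TelemetryEvent {
  return { name, sessionId, timestamp: now.toISOString(), props };
}
```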

How Anonymity Works

The session ID is generated with crypto.randomUUID() and stored in sessionStorage, which clears when the tab closes. It is not linked to any persistent identity unless you explicitly opt in.
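A minimal sketch of that lookup-or-create step follows. The storage and the ID generator are passed in so the example also runs outside a browser; the storage key name is a hypothetical choice for this example.

```typescript
// Minimal subset of the Web Storage API (sessionStorage satisfies it).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const SESSION_KEY = "telemetry.sessionId"; // hypothetical key name

// Returns the session ID for this tab, creating and storing one on first use.
function getSessionId(store: KeyValueStore, newId: () => string): string {
  const existing = store.getItem(SESSION_KEY);
  if (existing !== null) return existing;
  const id = newId();
  store.setItem(SESSION_KEY, id);
  return id;
}

// In the browser this would be called roughly as:
//   getSessionId(sessionStorage, () => crypto.randomUUID());
```

Because sessionStorage is scoped to the tab, a new tab starts with an empty store and therefore gets a fresh ID.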

If you grant consent (there is a consent prompt at the bottom of the page), the provider also stores a readerId in localStorage - another random UUID - along with a session count and first-seen timestamp. This enables the pipeline to distinguish new readers from returning readers without knowing who either of them is.
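The consent gate could be sketched like this. The profile fields mirror the description above (readerId, session count, first-seen timestamp), but the key name, field names, and the recordVisit helper are assumptions for illustration.

```typescript
// Minimal subset of the Web Storage API (localStorage satisfies it).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface ReaderProfile {
  readerId: string; // random UUID, not derived from anything personal
  sessionCount: number;
  firstSeen: string; // ISO 8601
}

const READER_KEY = "telemetry.reader"; // hypothetical key name

// Without consent nothing is read or written. With consent, the stored
// profile is created on the first visit and its session count bumped after.
function recordVisit(
  store: KeyValueStore,
  hasConsent: boolean,
  newId: () => string,
  now: Date = new Date(),
): ReaderProfile | null {
  if (!hasConsent) return null;
  const raw = store.getItem(READER_KEY);
  const previous = raw !== null ? (JSON.parse(raw) as ReaderProfile) : null;
  const profile: ReaderProfile = previous
    ? { ...previous, sessionCount: previous.sessionCount + 1 }
    : { readerId: newId(), sessionCount: 1, firstSeen: now.toISOString() };
  store.setItem(READER_KEY, JSON.stringify(profile));
  return profile;
}
```

A returning reader is then simply a profile whose sessionCount is greater than one.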

There is no server-side join between session IDs and IP addresses. The telemetry route receives only what the client sends. What the client sends contains no information about who you are, only what you did.

The Pipeline

Events are batched client-side in a queue that flushes every ten seconds or on tab close, whichever comes first. The batch is posted to /api/telemetry on the blog's own domain - no third-party trackers, no data leaving to an external analytics service.
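The batching behaviour can be sketched as a small queue with an injected send function. The class name and the wiring shown in the trailing comments are assumptions; in particular, using navigator.sendBeacon for the tab-close flush is one plausible choice, not something this post confirms.

```typescript
// Collects events and hands them to `send` as a single batch on flush.
class TelemetryQueue<T> {
  private pending: T[] = [];
  private readonly send: (batch: T[]) => void;

  constructor(send: (batch: T[]) => void) {
    this.send = send;
  }

  enqueue(event: T): void {
    this.pending.push(event);
  }

  // Sends everything queued so far; does nothing when the queue is empty.
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }
}

// Possible browser wiring (illustrative only):
//   const queue = new TelemetryQueue<object>((batch) =>
//     navigator.sendBeacon("/api/telemetry", JSON.stringify(batch)));
//   setInterval(() => queue.flush(), 10_000);           // every ten seconds
//   addEventListener("pagehide", () => queue.flush());  // on tab close
```

Swapping the pending array before calling send means events enqueued during a flush land in the next batch rather than being lost.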

The route handler does two things with each batch:

Custom transport (Kafka -> ingest-worker)

Events are forwarded to a Kafka topic (telemetry.logs.raw). The cognitive-substrate's ingest-worker consumes this topic and maps each event to an ExperienceEvent - the shared vocabulary the cognitive pipeline uses to reason about experience.

OTEL metrics (OTEL collector)

Simultaneously, the route handler increments OpenTelemetry counters and histograms for each event type. These flow through the OTEL collector to ClickHouse for time-series aggregation and dashboards.

The two pipelines serve different purposes. OTEL metrics answer aggregate questions: how many page views today, what is the median scroll depth across articles. The cognitive pipeline asks a different question: what does this particular session tell us about which content is landing, and how should that shape what gets written next?

The Importance Scoring

The ingest-worker assigns each event an importance score between 0 and 1 before storing it. The scoring reflects signal strength:

| Event | Score | Reasoning |
| --- | --- | --- |
| article_complete | 0.85 - 1.0 | Deliberate full read; scales with reading time |
| scroll_depth 90% | 0.75 | Near-complete read |
| scroll_depth 75% | 0.55 | Meaningful engagement |
| copy_code | 0.65 | Intent to use the material |
| outbound_link | 0.55 | Content drove action |
| search_query (0 results) | 0.70 | Niche curiosity signal |
| search_query (results) | 0.50 | Active information seeking |
| series_nav_click | 0.45 | Deep engagement - progressing through a series |
| scroll_depth 50% | 0.30 | Moderate engagement |
| tag_click (article context) | 0.20 | Topic interest signal |
| focus_loss | 0.10 - 0.30 | Scales with dwell time before leaving |
| page_view / focus_gain | 0.10 | Baseline presence |
| nav_click | 0.05 | Navigation intent, minimal signal |
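The table translates fairly directly into a scoring function. The sketch below uses the scores as listed; everything the table leaves open is an assumption and is marked as such in the comments: the property names, the linear ramps with their caps (600 s of reading, 300 s of dwell), and the baseline fallback for events the table does not list, such as the 25% scroll milestone and time_on_page.

```typescript
type ScoreProps = Record<string, string | number | boolean>;

// Linear interpolation from lo to hi as value goes from 0 to cap, clamped.
function ramp(lo: number, hi: number, value: number, cap: number): number {
  const t = Math.min(Math.max(value / cap, 0), 1);
  return lo + (hi - lo) * t;
}

// Reads a numeric property, treating missing or non-numeric values as 0.
function num(props: ScoreProps, key: string): number {
  const v = props[key];
  return typeof v === "number" ? v : 0;
}

function scoreEvent(name: string, props: ScoreProps = {}): number {
  switch (name) {
    case "article_complete": // assumed: linear in reading time, capped at 600 s
      return ramp(0.85, 1.0, num(props, "readingSeconds"), 600);
    case "scroll_depth": {
      const percent = num(props, "percent");
      if (percent >= 90) return 0.75;
      if (percent >= 75) return 0.55;
      if (percent >= 50) return 0.30;
      return 0.10; // 25% is not in the table; baseline is an assumption
    }
    case "copy_code":
      return 0.65;
    case "outbound_link":
      return 0.55;
    case "search_query":
      return props["resultCount"] === 0 ? 0.70 : 0.50;
    case "series_nav_click":
      return 0.45;
    case "tag_click": // table covers article context only; reused for all here
      return 0.20;
    case "focus_loss": // assumed: linear in dwell time, capped at 300 s
      return ramp(0.10, 0.30, num(props, "dwellSeconds"), 300);
    case "nav_click":
      return 0.05;
    default: // page_view, focus_gain, and anything unlisted (assumption)
      return 0.10;
  }
}
```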

The scored events accumulate in OpenSearch as ExperienceEvent records. The cognitive pipeline can query this store to answer questions like: which articles have the highest mean importance score across sessions? Which topics generate the most copy_code and outbound_link events?
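To show the kind of question this enables, here is an in-memory sketch of the "mean importance per article" aggregation. The real query would run against OpenSearch; the record shape (path, sessionId, importance) is a hypothetical simplification of ExperienceEvent, and the mean here is taken over events rather than per session.

```typescript
// Hypothetical, simplified view of a stored record.
interface ScoredEvent {
  path: string; // article the event belongs to
  sessionId: string;
  importance: number; // 0..1, as assigned by the ingest-worker
}

// Groups events by article path and returns the mean importance for each.
function meanImportanceByPath(events: ScoredEvent[]): Map<string, number> {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const e of events) {
    const t = totals.get(e.path) ?? { sum: 0, count: 0 };
    t.sum += e.importance;
    t.count += 1;
    totals.set(e.path, t);
  }
  const means = new Map<string, number>();
  totals.forEach((t, path) => means.set(path, t.sum / t.count));
  return means;
}
```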

What I Expect to Learn

The honest answer is I am not sure yet. The instrumentation is new.

The hypothesis I am most interested in is whether series_nav_click progression is a better proxy than page views for whether an article series is working. Page views measure reach. A reader who follows all five parts of a series is telling me something different from a reader who lands on part one and leaves.

A second hypothesis: copy_code and outbound_link events are leading indicators of practical value. An article that generates a lot of copy events is probably teaching something usable. An article with high scroll depth but no copy or outbound events might be interesting to read but not useful to act on. That distinction matters for what to write next.

The search_query data with zero results is particularly interesting. Every zero-result query is a reader telling me something they wanted to find and could not. That is a content gap signal.
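One simple way to turn those queries into a content-gap list is to count zero-result searches by normalized query text. This is a sketch under assumptions: the record shape and the trim-and-lowercase normalization are illustrative choices, not the pipeline's actual logic.

```typescript
// Hypothetical, simplified view of a search_query event.
interface SearchEvent {
  query: string;
  resultCount: number;
}

// Counts zero-result queries by normalized text, most frequent first.
function contentGaps(events: SearchEvent[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const e of events) {
    if (e.resultCount !== 0) continue; // only searches that found nothing
    const key = e.query.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```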

The longer-term goal is to close a loop: the cognitive pipeline not only observes what content lands, but eventually surfaces those observations back to the writing process. Not by automating anything, but by making the pattern visible. What topics keep generating engagement months after publication? Which series parts lose readers? Which code snippets do people actually copy?

None of that requires knowing who you are. It only requires knowing what happened.
