End of the Ride

Small independent builders are being priced and powered out of AI frontier research. That is the short version. The longer version involves token limits, GPU scarcity, datacenter water rights, and what sovereign compute actually needs to mean.

The personal compute wall

My AI usage at home spans free tiers, paid accounts, and direct API access across several model providers. I have an older CPU-only server for development, but I do not benchmark it against hosted inference - the gap is too wide. I have looked into building a proper home inference server, and the payback period lands somewhere between 6 and 18 months depending on hardware. Amortized over that period, the hardware alone works out to hundreds of dollars a month in real compute costs, before you factor in power.
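As a rough illustration of that break-even arithmetic, the sketch below amortizes a hardware purchase over a payback window and, going the other way, computes the payback period from monthly savings. The $3,600 build price, the monthly figures, and the function names are hypothetical, chosen only to show how a 6 to 18 month window maps to hundreds of dollars a month; they are not quotes for any specific hardware.

```python
def amortized_monthly_cost(hardware_cost: float, payback_months: float) -> float:
    """Hardware cost spread evenly over the payback window, in dollars per month."""
    if payback_months <= 0:
        raise ValueError("payback_months must be positive")
    return hardware_cost / payback_months


def payback_months(hardware_cost: float, hosted_monthly: float,
                   power_monthly: float = 0.0) -> float:
    """Months until avoided hosted spend (net of power) covers the hardware."""
    net_saving = hosted_monthly - power_monthly
    if net_saving <= 0:
        raise ValueError("hosted spend must exceed running costs to ever pay back")
    return hardware_cost / net_saving


# Hypothetical $3,600 build, at both ends of the 6 to 18 month range.
print(amortized_monthly_cost(3600, 6))   # 600.0
print(amortized_monthly_cost(3600, 18))  # 200.0

# Power pushes the break-even point out: $300/month hosted, $100/month power.
print(payback_months(3600, hosted_monthly=300, power_monthly=100))  # 18.0
```

The second function is the one that matters in practice: once power is subtracted from the savings, the payback window stretches, which is why the hardware-only figure understates the real cost.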

Even if I solved the hardware problem, some workloads cannot wait for a slow local machine. The cognitive loop at the core of the Cognitive Substrate project needs responses that are both economical and fast. A CPU-only box running quantized models does not meet that bar for anything resembling real-time reasoning.
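One back-of-the-envelope way to see why: generating tokens with a dense model is usually limited by how fast the weights can be streamed from memory, so a rough upper bound on decode speed is memory bandwidth divided by model size. The sketch below applies that rule of thumb. The 50 GB/s figure is an assumed ballpark for a dual-channel desktop-class CPU, not a measurement of my server, and the bound ignores compute limits, cache effects, and prompt processing.

```python
def max_decode_tokens_per_sec(params_billion: float, bits_per_weight: float,
                              bandwidth_gb_s: float) -> float:
    """Rough upper bound on tokens/second for a dense model.

    Assumes every weight is read from memory once per generated token,
    so throughput cannot exceed bandwidth divided by model size.
    """
    if params_billion <= 0 or bits_per_weight <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("all arguments must be positive")
    model_gb = params_billion * bits_per_weight / 8  # 1e9 params * bytes each = GB
    return bandwidth_gb_s / model_gb


CPU_BANDWIDTH_GB_S = 50.0  # assumption: ballpark dual-channel DDR4 system

# 4-bit quantized models of two sizes on the assumed CPU box.
print(round(max_decode_tokens_per_sec(7, 4, CPU_BANDWIDTH_GB_S), 1))   # 14.3
print(round(max_decode_tokens_per_sec(70, 4, CPU_BANDWIDTH_GB_S), 1))  # 1.4
```

Even as a best case, the larger model lands at roughly one token per second, which is far too slow for a loop that has to reason interactively.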

Meanwhile, the access picture for hosted tools is getting less predictable. Anthropic cut off Windsurf's direct access to Claude models with less than five days' notice while OpenAI's reported acquisition of Windsurf was pending, leaving Windsurf users scrambling.¹ Microsoft restructured GitHub Copilot onto token-based billing and tightened rate limits for some paid plans. The tooling ecosystem is more fragile than it looks when you depend on it for real work.

The infrastructure crunch

The demand for AI compute is growing faster than the infrastructure to support it, and the infrastructure boom itself is running into hard limits.

GPU supply is constrained at multiple points. TSMC's Taiwan fabs produce virtually all of the world's most advanced logic dies - there is no credible alternative at leading-edge nodes until at least 2030,² and a conflict or blockade would be catastrophic for global AI infrastructure. Beyond the die itself, the real production bottlenecks in 2025-2026 are not raw materials but two downstream components: CoWoS packaging and High Bandwidth Memory (HBM). CoWoS (Chip-on-Wafer-on-Substrate) is TSMC's advanced packaging technology: it mounts GPU dies and HBM stacks side by side on a silicon interposer, which enables the extreme memory bandwidth modern AI accelerators require. CoWoS capacity is sold out through 2026.³ SK Hynix has sold out its HBM supply to Nvidia through 2026, and Micron's 2026 HBM production is already committed.⁴

Datacenters are also being opposed by local residents and governments, and not without reason. The ones being built are not always ecologically sensitive: in desert states, water is diverted and evaporated for cooling instead of going to drinking water or aquatic habitat; electricity demand is climbing; and the thermal output of dense compute clusters heats the surrounding environment. The list of externalities is long and mostly unpriced.

xAI's Colossus 2 - 555,000 GPUs, 2 gigawatts of capacity, $18 billion in hardware - is the largest single AI compute installation in the world.⁵ Most of us do not have access to anything like it, and the gap is widening.

Who gets left out

Independent researchers and small builders have three options right now: find investors, shift focus to optimization work (quantization, efficient inference, smaller models), or accept a slower pace and a more constrained scope. None of those is a great option if the work requires frontier-scale reasoning.

There is a real possibility that the constraint breeds something useful. Quantization, distillation, and cascade inference are already producing results - techniques developed under compute pressure that may prove more valuable when applied at scale rather than as substitutes for it. Whether combining those efficiency techniques with frontier-scale compute produces qualitatively new capabilities is an open research question, not a certainty. But none of it helps anyone working on problems today that need fast, capable models.
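Cascade inference is the easiest of these to show in a few lines: send each request to the cheapest model first and escalate only when that model is not confident in its answer. The sketch below uses stub functions in place of real models. The `Tier` type, the confidence scores, and the thresholds are all hypothetical, and a real system would need a calibrated confidence signal rather than the prompt-length heuristic used here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# A "model" is any callable returning (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Tier:
    name: str
    model: Model
    threshold: float  # accept the answer when confidence >= threshold


def cascade(prompt: str, tiers: Sequence[Tier]) -> Tuple[str, str]:
    """Try tiers cheapest-first; return (tier_name, answer) from the first
    tier whose confidence clears its threshold."""
    if not tiers:
        raise ValueError("at least one tier is required")
    answer = ""
    for tier in tiers:
        answer, confidence = tier.model(prompt)
        if confidence >= tier.threshold:
            return tier.name, answer
    # No tier was confident: fall back to the last (most capable) tier's answer.
    return tiers[-1].name, answer


def small_model(prompt: str) -> Tuple[str, float]:
    # Stub: pretends to be confident only on short prompts.
    return "small:" + prompt, 0.9 if len(prompt) < 20 else 0.3


def large_model(prompt: str) -> Tuple[str, float]:
    # Stub: stands in for an expensive hosted model.
    return "large:" + prompt, 0.95


TIERS = [Tier("small", small_model, 0.8), Tier("large", large_model, 0.0)]

print(cascade("2+2?", TIERS))
print(cascade("Plan a multi-step migration of the database", TIERS))
```

The short prompt is answered by the small tier and the long one escalates to the large tier. The saving comes from how many requests never reach the expensive model, which is also why this technique complements frontier-scale compute rather than replacing it.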

Sovereign compute means access, not just geography

Governments, including Canada's, are investing in sovereign compute infrastructure. Canada has committed $2 billion over five years through the Canadian Sovereign AI Compute Strategy, with an active procurement underway for a publicly owned supercomputer.⁶ That investment is necessary. But sovereignty defined only as geography - data that stays within our borders, compute that runs on our soil - misses the more important question: who can use it?

Public investment in compute infrastructure should come with public access. Educational institutions, independent researchers, and small builders should be considered in the access model, not just the large commercial operators and shareholders listed in the incorporation documents. If the public is underwriting this infrastructure, the benefit needs to flow back to the public - not just as a policy talking point, but as guaranteed, affordable access.

For projects like mine, that access does not exist yet. Until it does, the frontier stays out of reach.

Footnotes

¹ Anthropic co-founder on cutting access to Windsurf: "It would be odd for us to sell Claude to OpenAI." The cutoff came with less than five days' notice, mid-acquisition.
² The U.S. government is targeting 20% of leading-edge chip production onshore by 2030. TSMC's third Arizona fab, which targets leading-edge nodes, is not expected until that timeframe.
³ Nvidia alone has TSMC's advanced packaging lines booked for several years ahead, crowding out other customers. CoWoS connects GPU dies to HBM stacks via a silicon interposer - without it, there is no H100, H200, or GB200.
⁴ Micron's sold-out 2026 HBM and SK Hynix's position as Nvidia's primary supplier make HBM the tightest single component in the AI hardware stack. HBM3E delivers up to 1.2 TB/s of memory bandwidth per stack - an order of magnitude more than a single GDDR6 chip can provide.
⁵ The SemiAnalysis deep dive on Colossus 2 covers the full scale of the installation and what it took to build it. For context: a well-funded AI startup might provision a few hundred GPUs.
⁶ The AI Sovereign Compute Infrastructure Program (SCIP) received applications through June 1, 2026. A separate AI Compute Access Fund of $300M is also underway.
