Chapter Three · The Capability Bubble

What is inside the bubble is what AI can do.

Everything outside it is what it cannot — yet. The bubble expands daily. We work on its surface, and the surface keeps growing and shifting.

In March 2023, GPT-4 could work autonomously for about five minutes. In December 2025, Claude Opus 4.5 extended that to nearly five hours. The task-horizon doubling time, according to METR, has collapsed from seven months to under three. The next generation of models — trained on Nvidia's Blackwell and Rubin clusters now under construction — should push autonomous task length past the work-week by late 2027.
[Chart: METR task horizon · autonomous work at 50% reliability (log scale), 2019–2027. Caption: The bubble inflating. Plotted points: GPT-4 (Mar 2023): 5 min; Claude 3.7 (Feb 2025): 59 min; Claude Opus 4.5 (Dec 2025): 4 h 49 m; forecast (end 2027): ~2 weeks.]
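
The forecast is plain exponential arithmetic on a log scale. The Python sketch below uses only the three data points from the chart; it is a back-of-the-envelope reconstruction, not METR's fitting methodology, and the function names are illustrative. The average doubling time between the GPT-4 and Opus 4.5 endpoints comes out to roughly five and a half months, slower than the recent sub-three-month figure but fast enough that extrapolating from December 2025 lands near the chart's end-2027 forecast.

from math import log2

# Task-horizon data points from the chart above (50%-reliability horizon, in hours).
# Dates and values are the chapter's figures, not METR's full dataset.
points = [
    ("GPT-4",           2023 + 2 / 12,  5 / 60),       # Mar 2023, ~5 minutes
    ("Claude 3.7",      2025 + 1 / 12,  59 / 60),      # Feb 2025, ~59 minutes
    ("Claude Opus 4.5", 2025 + 11 / 12, 4 + 49 / 60),  # Dec 2025, 4 h 49 m
]

def doubling_time_months(t0, h0, t1, h1):
    """Doubling time implied by two (year, horizon) points on a log scale."""
    return 12 * (t1 - t0) / log2(h1 / h0)

def extrapolate(h, doubling_months, months_ahead):
    """Horizon after months_ahead, assuming a constant doubling time."""
    return h * 2 ** (months_ahead / doubling_months)

_, t0, h0 = points[0]
name, t1, h1 = points[-1]
d = doubling_time_months(t0, h0, t1, h1)
print(f"Average doubling time, GPT-4 to {name}: {d:.1f} months")

# Twenty-four more months at that average rate lands near the chart's
# end-2027 forecast of roughly two working weeks.
h_end_2027 = extrapolate(h1, d, 24)
print(f"End-2027 horizon: {h_end_2027:.0f} hours (~{h_end_2027 / 8:.0f} workdays)")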

§3.1 The next generation, on the record.

“Vera Rubin NVL144 delivers 10 times more performance per watt than Grace Blackwell.”
Jensen Huang · Nvidia GTC 2026, Mar 16 2026

The first frontier model trained end-to-end on Rubin silicon will most plausibly ship in late 2026 or 2027. No lab has publicly committed to a date. SemiAnalysis independently estimates Rubin at roughly 50× tokens per watt versus the Hopper H200, corroborating the direction of Nvidia's claim, if not the precise multiple.
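
The two multiples are at least mutually consistent. A quick check, assuming both figures refer to comparable workloads, which neither source guarantees:

# Efficiency multiples quoted above; both are vendor or analyst figures, not measurements here.
rubin_vs_blackwell = 10  # Nvidia GTC 2026: performance per watt, Rubin NVL144 vs Grace Blackwell
rubin_vs_hopper = 50     # SemiAnalysis: tokens per watt, Rubin vs Hopper H200

# If both held on the same workload, Grace Blackwell would sit near 5x Hopper.
print(f"Implied Grace Blackwell vs Hopper: {rubin_vs_hopper / rubin_vs_blackwell:.0f}x tokens per watt")

Nothing in the check validates either number; it only shows that the two claims do not contradict each other.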

Primary sources: METR, “Measuring AI Ability to Complete Long Tasks” (Kwa et al., March 2025) and Time Horizon 1.1 update (Jan 29 2026). Stanford HAI AI Index 2026. Nvidia GTC 2026 keynote (Mar 16 2026). SemiAnalysis coverage of Colossus 2 (Jan 2026) and GB200 vs Hopper benchmarks (Aug 2025). Epoch AI training-compute trend dashboard. ARC-AGI-2 paper (Chollet et al., 2025, arXiv:2505.11831).