please read jimfund.com
I wrote a story about the future: jimfund.com#fiction
My predictions for METR's developer uplift survey came in right on target, but I noticed that the question resolves not to this study but to the last such study whose results are released in 2026... oops.
Today's jimfund blog post is a rebuttal to the Citrini piece.
jimfund.com#2026-iv
Some thoughts on AI and math, inspired by “First Proof”: www.daniellitt.com/blog/2026/2/...
Later this year compute is going to be very scarce. The expected post-ASI economy is so large relative to the present-day economy that labs will direct almost all their compute toward takeoff in an attempt to secure a slice of it.
If you're surprised by today's Opus 4.6 results, you haven't been paying attention.
Manifold users no longer expect AI progress to plateau in 2026.
(Ironic): I won't read writing that hasn't been looked over by an LLM. If the writer hasn't taken the time to run his work by an LLM, then why should I take the time to read it?
New post up:
jimfund.com#2026
The fact that, at any given 50% time-horizon, a model's 2% time-horizon is much further out has implications for autonomous research.
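To make that concrete, here is a minimal sketch of the relationship, assuming a METR-style logistic fit of success probability against log task length; the slope beta is a placeholder, not a fitted value:

```python
import math

def horizon(h50_hours: float, p: float, beta: float) -> float:
    """Task length at which success probability equals p, under a
    logistic fit in log task length centered on the 50% horizon h50.
    beta is a placeholder slope, not a value fitted to real data."""
    logit = math.log(p / (1 - p))
    return h50_hours * math.exp(-logit / beta)

# With a 50% horizon of 6 h and an illustrative slope beta = 1,
# the 2% horizon lands roughly 49x further out (~294 h).
print(horizon(6, 0.02, 1.0))
```

The exact multiple depends entirely on the fitted slope; the point is only that low-reliability horizons sit far beyond the 50% horizon.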
AI forecasters should stop making deliberately conservative assumptions.
We will likely have 50% time-horizons of over 50 hours by October this year.
We'll have a time-horizon of about 26 hours. We might even have labs using a serious amount of their compute to do autonomous research by that point. This might not actually show up as improved models until September-ish, though (because doing the research and implementing it are not instant).
Assuming the doubling-time is currently around 3 months, things should be pretty crazy by June–July.
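As a back-of-the-envelope check, here is a minimal sketch of that extrapolation; the starting horizon of 13 hours is a placeholder chosen only to line up with the figures above, not a measured value:

```python
def projected_horizon(h0_hours: float, months_elapsed: float,
                      doubling_months: float = 3.0) -> float:
    # Exponential extrapolation: the horizon doubles every `doubling_months`.
    return h0_hours * 2 ** (months_elapsed / doubling_months)

# Placeholder start of 13 h with a 3-month doubling time:
print(projected_horizon(13, 3))  # ~26 h after one doubling
print(projected_horizon(13, 6))  # ~52 h after two, i.e. past 50 h
```

If doubling-times themselves shrink rather than stay fixed, these numbers arrive earlier.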
The impact of each time-horizon doubling exceeds that of the last. Going from 3 -> 6 hours was big. From 6 -> 12 will be massive. This pattern will hold. And doubling-times will decrease.
Over the subsequent months (particularly August–November) we could see Google steadily separate itself from the other labs in terms of capabilities.
But Google isn't very far behind. Its compute advantage could allow it to break away from the pack with Gemini 3.5 (July-ish), which could be a crucial moment (METR-50 time-horizons of ~50 hours, continual learning, novel insight, autonomous ML researchers).
Google seems best-positioned but is still playing catch-up. OpenAI is still in the lead. Anthropic is close behind. xAI is also playing catch-up.
Google has the best trajectory, the most compute, TPUs, and the most money. OpenAI currently has the most powerful models. Anthropic has very good coding models and TPUs. xAI has Colossus II and Elon Musk.
It's been a while since I thought seriously about which AI lab is in the lead and which has the best near-term outlook.
1.4x is a lower bound on expectations: AI will be more helpful by year-end than at the time the study is conducted.
I'm working on getting my head around the model of takeoff used here. I'm starting by reading this post:
www.forethought.org/research/wil...
nice
I'm agnostic on whether models can get more performance from latent-space reasoning. If they can, I expect we'll see that transition rather than a transition to thinking in tokens of a nonhuman language, but it will still take place at the point where models reach a near-human understanding of the world.
I think many of the reasons that people give for discounting OpenAI in the race to AGI fall into this category. OpenAI's models have defined the curve of AI model capabilities across time so far. I expect this to continue at least in the short term.
This kind of truth is important to keep in mind when trying to divine, from a lab's statements or actions, how it expects its future models to perform.
"The other labs that are dependent on fundraising have downplayed such talk exactly because it is counterproductive for raising funds..."
From Zvi's 'AI #153: Living Documents'
www.lesswrong.com/posts/bSQagZ...
"[someone suggested] that Dario talks about extremely short timelines and existential risk in order to raise funds. It’s very much the opposite."
METR's graph has been updated with revised values calculated using an expanded task set.
metr.org/blog/2026-1-...