Must read on Chinese open source from Kevin Xu with the very similarly named substack (story for another time)
interconnect.substack.com/p/chinese-op...
We talk about open models as political insurance, the widening frontier gap, and the ever weirder futures of AI. This is a very important time for open models: the weight of it is obvious, but the economic challenges are extreme.
New conversation on Interconnects w Dean Ball on why the Anthropic v DoW moment could strengthen the long-run case for open models - even if the next few years get rough for open.
www.interconnects.ai/p/how-anthro...
Waiting for deepseek v4
I've written up a blog post that explains why this matters and why hybrid models didn't work a few years ago when Mamba was super popular. Plus, this paper is a great entry point for modern deep learning / language modeling scaling theory. Enjoy and send feedback!
www.interconnects.ai/p/olmo-hybri...
In particular, the OSS tools for these new architectures are really limited. New architectures are much slower than standard transformers or popular models like DeepSeek MoEs. This is work that we can do together to keep pushing the frontier of efficient, open models.
It's incredible timing to release a fully open model so people can study how these architecture changes impact the full stack.
Personally, I learned a lot in making the post-training work. Even with the data being identical for pretraining, post-training is very different!
Excited to share the latest Olmo model: Olmo Hybrid. This is a model with gated delta net (GDN) layers in a 3:1 ratio with full attention. It follows lots of other developments like Qwen 3.5 and Kimi Linear.
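A minimal sketch of what a 3:1 hybrid layer schedule looks like, purely for illustration: the helper name, the layer labels, and the choice to place the full-attention layer last in each block of four are my assumptions, not the actual Olmo Hybrid code.

```python
# Hypothetical sketch of a 3:1 GDN/attention hybrid stack: three gated
# delta net (GDN) layers for every full-attention layer. Names and the
# attention-last placement are illustrative assumptions.

def hybrid_schedule(n_layers: int, gdn_per_attn: int = 3) -> list[str]:
    """Return the per-layer mixer type for a GDN/attention hybrid.

    Every (gdn_per_attn + 1)-th layer is full attention; the rest use GDN.
    """
    block = gdn_per_attn + 1  # e.g. [gdn, gdn, gdn, attention]
    return [
        "attention" if (i % block) == block - 1 else "gdn"
        for i in range(n_layers)
    ]

if __name__ == "__main__":
    sched = hybrid_schedule(8)
    print(sched)
    # At a 3:1 ratio, full attention is 1/4 of the layers.
    assert sched.count("attention") == 2
```

The point of the ratio is that the quadratic-cost attention layers become a small fraction of the stack while the linear-time GDN layers do most of the sequence mixing.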
I'm doing my part to save Qwen.
Yes, they DM me regularly.
Lots of core team members of Alibaba Qwen are resigning publicly on X.
The gaping hole that Qwen imploding would leave in the open research ecosystem will be hard to fill. The small models are irreplaceable.
I'll do my best to keep carrying that torch. Every bit matters.
Latest open artifacts (#19): Qwen 3.5, GLM 5, MiniMax 2.5 - Chinese labs' latest push of the frontier
Welcome to the year of the horse! I always learn something new doing these w Florian.
www.interconnects.ai/p/latest-ope...
All of this happening with Anthropic/DoW etc. will push a lot more investment in open models, so there's transparency in the tools that are being used across high-stakes domains.
At the same time, these models won't be received well if they're built in an overly prescriptive way by any government.
It gives me a glimmer of hope in challenging times to see such a deeply respectable, principled stance being held in the face of unjust pressure.
Doubly so to see so many I respect and admire standing in support of it.
Stay the course and stand with Anthropic.
If people are working on open research for scaling RL in LLMs, I'd love to talk to you.
How much does distillation really matter for Chinese LLMs?
DeepSeek's usage was a rounding error. MiniMax's was substantial. But distillation is getting less important as RL takes over; it's easier to access "banned" APIs than to smuggle GPUs.
www.interconnects.ai/p/how-much-d...
Made a language model RL cheatsheet for the extra page on the inside back cover of the physical edition RLHF Book.
The RLHF Book should be sent off for printing in the next month or two.
Working on final edits and reviews :D.
Thanks all for your patience.
now that exceeded even my expectations, props Claude
Open models are in a perpetual race to stay relevant at the frontier. While they're doing better than I, and many experts, would expect given the cost of models, I don't see evidence that open models are accelerating and surpassing the best closed models.
www.interconnects.ai/p/open-model...
Using the Claude app more due to the personality of the latest Opus, and it's under the radar how much better Claude's search has gotten. The top end isn't as good as GPT Thinking/Pro for research, but the speed is a big upside.
Here are my slides from my recent CMU talk, as I'm transitioning from the Olmo 3 era of just building a reasoning model to thinking about how to do impactful research for agentic systems.
docs.google.com/presentation...
First time at CMU
Fun to set up real analytics and learn that my RLHF Book PDF is downloaded 50-100 times a day from my site (doesn't include arXiv downloads/views).
Thanks for reading!
Codex app is nice.
I'm just a few minutes in and think it'll make some of the crazy things I was doing way easier to monitor.
Poll. Do you see the famous METR plot holding true on Jan. 1st of 2027 (~20 hours) or 2028 (~50 hours)?
What would be the right way to measure tasks of that scope?
Beautiful RL scaling plot from Cursor.
cursor.com/blog/compose...
TLDR: Codex is a very useful coding tool; Claude is the first agent.
I spent a long time testing the new Opus 4.6 and Codex 5.3 models, but the most striking thing was how many people are reacting to model releases in ways that don't match how we now use models. In my post-benchmark era.
Claude is still king, but Codex is closer than ever
www.interconnects.ai/p/opus-46-vs...