We're releasing base, SFT, & DPO models plus a detailed report. Try them out and let us know what you find.
💻 Models: huggingface.co/collections/...
📊 Data: huggingface.co/collections/...
📄 Technical report: allenai.org/papers/olmo-...
✏️ Blog: allenai.org/blog/olmohyb...
05.03.2026 16:34
👍 4
🔁 1
💬 0
📌 0
Overall, our results suggest compelling advantages for hybrid models over transformers—both theoretically, in terms of expressive power and scaling efficiency, and practically, in terms of benchmark performance and long-context abilities.
05.03.2026 16:34
👍 5
🔁 0
💬 1
📌 0
Moreover, we give a theoretical argument tying this greater expressive power to the improved scaling of hybrid models we see in practice. 🧑‍🏫
05.03.2026 16:34
👍 3
🔁 0
💬 1
📌 0
What explains the success of Olmo Hybrid? We prove hybrid models are more expressive than transformers or RNNs alone: they can represent a broader class of functions related to code evaluation.
05.03.2026 16:34
👍 4
🔁 0
💬 1
📌 0
Olmo Hybrid’s gains hold across pretraining evals.
After pretraining and mid-training, Olmo Hybrid outperforms Olmo 3 in every primary evaluation domain. It wins big on long-context too—on RULER 64k, performance jumps from 70.9% to 85.0%. 🚀
05.03.2026 16:34
👍 4
🔁 0
💬 1
📌 0
Key finding: hybrid models are substantially more data-efficient than transformers.
We show this through rigorous theory + controlled experiments. On MMLU, Olmo Hybrid matches Olmo 3’s accuracy using 49% fewer tokens—roughly 2× efficiency.
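The token savings above translate into the 2× figure by simple arithmetic. A quick back-of-envelope check (the function name is ours, and 49% is the number quoted in this post):

```python
# Matching a baseline's accuracy with a fraction f fewer tokens implies
# a data-efficiency ratio of 1 / (1 - f) relative to that baseline.
def relative_data_efficiency(token_fraction_saved: float) -> float:
    """Tokens the baseline needs divided by tokens the new model needs."""
    return 1.0 / (1.0 - token_fraction_saved)

ratio = relative_data_efficiency(0.49)
print(f"{ratio:.2f}x")  # ~1.96x, i.e. roughly 2x
```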
05.03.2026 16:34
👍 3
🔁 0
💬 1
📌 0
Olmo Hybrid uses a 3:1 pattern—three Gated DeltaNet layers followed by one attention layer.
This replaces 75% of attention with linear recurrence while keeping attention frequent enough to recover details the recurrent state compresses away.
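The 3:1 schedule described above can be sketched in a few lines. This is an illustrative layout only, with our own placeholder layer names rather than Olmo Hybrid's actual classes:

```python
# Sketch of a 3:1 hybrid layer schedule: three linear-recurrent layers,
# then one attention layer, repeated. Attention ends up in 25% of layers,
# i.e. 75% of attention is replaced by linear recurrence.
def hybrid_schedule(n_layers: int, ratio: int = 3) -> list[str]:
    """Every (ratio + 1)-th layer is attention; the rest are recurrent."""
    return [
        "attention" if (i + 1) % (ratio + 1) == 0 else "gated_deltanet"
        for i in range(n_layers)
    ]

layers = hybrid_schedule(8)
# ['gated_deltanet', 'gated_deltanet', 'gated_deltanet', 'attention',
#  'gated_deltanet', 'gated_deltanet', 'gated_deltanet', 'attention']
```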
05.03.2026 16:34
👍 5
🔁 0
💬 1
📌 0
Introducing Olmo Hybrid, a 7B fully open model combining transformer and linear RNN layers. It decisively outperforms Olmo 3 7B across evals, w/ new theory & scaling experiments explaining why. 🧵
05.03.2026 16:34
👍 26
🔁 4
💬 1
📌 2
GitHub - allenai/molmo2: Code for the Molmo2 Vision-Language Model
Everything you need to get started – from custom training to deployment – is in the repository. Try it today.
🔗 Code: buff.ly/mH5l9fr
📝 Learn more about Molmo 2: buff.ly/EUFhLRB
03.03.2026 18:12
👍 2
🔁 1
💬 0
📌 0
We’re also releasing deployment tooling:
◼️ Checkpoint conversion to Hugging Face-compatible format
◼️ Inference examples for transformers + vLLM
◼️ Lightweight vision processing utility for offline inference
◼️ Gradio demo, Docker image, & local setup instructions
03.03.2026 18:12
👍 2
🔁 0
💬 1
📌 0
What's in the release:
🔹 Pretraining & fine-tuning scripts (SFT + long-context SFT)
🔹 Multi-node distributed training
🔹 Data download, preprocessing, & visualization utilities
🔹 Single-task & multi-eval scripts with caching
Built for reproducibility & new experiments.
03.03.2026 18:12
👍 1
🔁 0
💬 1
📌 0
📢 Update: the Molmo 2 codebase is now open source.
We're releasing the code behind Molmo 2—our open model family for video & image understanding, pointing, tracking, & more. Now you can easily train Molmo 2 on your own data. 🧵
03.03.2026 18:12
👍 22
🔁 3
💬 1
📌 0
AstaLabs AutoDiscovery
We believe open-ended, surprise-driven exploration is a transformational new capability for researchers.
Try AutoDiscovery in AstaLabs and let us know what you find. 🧑‍🔬
🔗 buff.ly/ljL4Nym
02.03.2026 20:47
👍 2
🔁 0
💬 0
📌 0
All accounts now receive 500 Hypothesis Credits. ⬆️
If your balance was below 500, we've topped you up. If you had more, you keep it. And if you used your original allocation, you're reactivated with a full 500.
Each credit lets AutoDiscovery generate & test one hypothesis. 🚀
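The top-up rule above reduces to a single `max`: everyone ends up with at least 500 credits, and larger balances are untouched. A minimal sketch (the function name is ours):

```python
# Credit refresh as described in the announcement: balances below 500
# are topped up to 500; balances at or above 500 are kept as-is.
def refreshed_credits(current_balance: int, floor: int = 500) -> int:
    return max(current_balance, floor)

print(refreshed_credits(120))  # topped up to 500
print(refreshed_credits(800))  # kept at 800
print(refreshed_credits(0))    # reactivated with a full 500
```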
02.03.2026 20:47
👍 2
🔁 0
💬 1
📌 0
AutoDiscovery autonomously explores your dataset, generates hypotheses, tests them, & iterates—surfacing surprising findings you might not think to look for.
Every result is fully auditable: you can inspect the hypothesis, statistical analysis, & underlying code. 🧪
02.03.2026 20:47
👍 2
🔁 0
💬 1
📌 0
In just a few weeks, researchers used AutoDiscovery to generate 20K+ hypotheses across oncology, climate science, marine ecology, entomology, cybersecurity, music cognition, social sciences, & more.
Now we're extending access for three more months—and refreshing credits. 👇
02.03.2026 20:47
👍 12
🔁 3
💬 1
📌 0
A note on privacy: the dataset draws exclusively from users who opted in to share de-identified interactions. We use hashed identifiers with no user IDs, and remove any queries flagged as containing PII.
27.02.2026 17:56
👍 2
🔁 0
💬 1
📌 0
Many users revisit previous reports hours or days later—treating AI-generated outputs as persistent reference artifacts.
And different fields bring different styles: CS researchers are most likely to ask for problem-solving and ideation; history researchers, the least.
27.02.2026 17:56
👍 1
🔁 0
💬 1
📌 0
We found that users treat tools like Asta as collaborative research partners—not merely search engines.
They paste LaTeX drafts asking for citations, submit structured templates, & use prompt engineering techniques picked up from general-purpose chatbots.
27.02.2026 17:56
👍 1
🔁 0
💬 1
📌 0
The Asta Interaction Dataset (AID) captures six months of real researcher use of two AI-powered tools:
🔎 PaperFinder—powers the "Find papers" feature in Asta
👩‍🔬 ScholarQA—powers "Generate a report"
To our knowledge, it’s one of the largest open datasets of its kind.
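A dataset like this lends itself to simple per-tool tallies. A stdlib sketch with invented records, since AID's actual schema isn't specified in this thread:

```python
from collections import Counter

# Illustrative records only: each query is tagged with the tool that
# served it (PaperFinder or ScholarQA).
records = [
    {"tool": "PaperFinder", "query": "retrieval-augmented generation surveys"},
    {"tool": "ScholarQA",   "query": "summarize work on hybrid attention"},
    {"tool": "PaperFinder", "query": "long-context eval benchmarks"},
]

# Tally queries per tool.
queries_per_tool = Counter(r["tool"] for r in records)
```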
27.02.2026 17:56
👍 1
🔁 0
💬 1
📌 0
We analyzed 250K+ queries & 430K+ clickstream interactions from Asta, our AI-powered research assistant—and today we're releasing the full dataset. How do researchers actually use AI science tools? Here's what we found. 🧵
27.02.2026 17:56
👍 23
🔁 6
💬 1
📌 1
PreScience: Forecasting the future of science end-to-end | Ai2
PreScience is a new benchmark that evaluates whether AI can forecast how science unfolds end-to-end, from team formation through eventual impact.
If we want AI that supports real discovery, we need evaluations grounded in how science actually happens.
📄 Learn more: buff.ly/czp8HQ1
📝 Tech report: buff.ly/PtcZTZH
🤗 Dataset: buff.ly/uVEo9Cu
💻 Code: buff.ly/M5XikzS
25.02.2026 16:59
👍 1
🔁 0
💬 0
📌 0
We simulated a full year of AI research by chaining PreScience's four tasks together month by month.
The result: a synthetic corpus that's less diverse and less novel than what real scientists produced—models given diverse inputs still converge on a narrower range of ideas.
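"Less diverse" can be made concrete with a simple lexical measure. The metric below (average pairwise Jaccard distance over word sets) is our illustration, not the paper's actual diversity measure, and the example corpora are invented:

```python
# Jaccard distance between two texts' word sets: 0 = identical, 1 = disjoint.
def jaccard_distance(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(wa & wb) / len(wa | wb)

# Corpus diversity: mean pairwise distance across all text pairs.
def corpus_diversity(texts: list[str]) -> float:
    pairs = [(i, j) for i in range(len(texts)) for j in range(i + 1, len(texts))]
    return sum(jaccard_distance(texts[i], texts[j]) for i, j in pairs) / len(pairs)

real = ["sparse mixture routing", "retrieval for code agents", "protein folding priors"]
synthetic = ["scaling laws for agents", "scaling laws for tools", "scaling laws for evals"]

# The synthetic set converges on a narrower range of ideas, so its
# diversity score is lower than the real set's.
```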
25.02.2026 16:59
👍 1
🔁 0
💬 1
📌 0
Our results show even strong baselines fall short ⚠️: GPT-5 averages just 5.6/10 on LACERScore, simple heuristics outperform complex ML for collaborator prediction, and the highest-impact papers are systematically the hardest to forecast.
25.02.2026 16:59
👍 0
🔁 0
💬 1
📌 0
We also introduce LACERScore, a calibrated LLM-as-judge metric for evaluating generated abstracts against real contributions. 🧪
Standard text-similarity metrics can't tell whether two abstracts describe the same scientific finding—LACERScore can.
25.02.2026 16:59
👍 0
🔁 0
💬 1
📌 0
PreScience is grounded in ~100K real papers 📚. It decomposes a scientific advance into four tasks:
✅ Collaborator prediction (who will team up?)
✅ Prior work selection (which papers will they cite?)
✅ Contribution generation (what will they write?)
✅ Impact prediction (how much attention will it get?)
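The four-task decomposition above suggests a natural record type per paper. A hypothetical sketch with our own field names, not the benchmark's actual schema:

```python
from dataclasses import dataclass

# One PreScience-style instance: only the scientific record up to
# cutoff_date is visible to the model; the four fields below are the
# targets to forecast.
@dataclass
class PreScienceInstance:
    cutoff_date: str                # forecast from the record up to here
    collaborators: list[str]        # task 1: who will team up?
    cited_prior_work: list[str]     # task 2: which papers will they cite?
    contribution_abstract: str      # task 3: what will they write?
    impact_citations: int           # task 4: how much attention will it get?

example = PreScienceInstance(
    cutoff_date="2024-06-01",
    collaborators=["A. Researcher", "B. Scholar"],
    cited_prior_work=["paper-123", "paper-456"],
    contribution_abstract="We propose ...",
    impact_citations=42,
)
```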
25.02.2026 16:59
👍 0
🔁 0
💬 1
📌 0
Every study starts with choices—who to collaborate with, what to build on, & what to contribute. Then the community decides how much attention it deserves.
PreScience asks: can AI predict what comes next across this workflow, given the scientific record up to a fixed date?
25.02.2026 16:59
👍 0
🔁 0
💬 1
📌 0
Can AI predict what scientists will do next—not just one piece, but the whole research process? PreScience is our new model eval for forecasting how science unfolds end-to-end, from how research teams form to a paper's eventual impact. Built with UChicago, supported by NSF.
25.02.2026 16:59
👍 5
🔁 3
💬 1
📌 1
→ Sign up and take AutoDiscovery for a spin before credits expire: buff.ly/WX4aj5b
23.02.2026 21:12
👍 0
🔁 0
💬 0
📌 0