
Ai2

@ai2

Breakthrough AI to solve the world's biggest problems. › Join us: http://allenai.org/careers › Get our newsletter: https://share.hsforms.com/1uJkWs5aDRHWhiky3aHooIg3ioxm

4,301
Followers
109
Following
715
Posts
12.10.2023
Joined

Latest posts by Ai2 @ai2

We're releasing base, SFT, & DPO models plus a detailed report. Try them out and let us know what you find.

💻 Models: huggingface.co/collections/...

📊 Data: huggingface.co/collections/...

📄 Technical report: allenai.org/papers/olmo-...

✏️ Blog: allenai.org/blog/olmohyb...

05.03.2026 16:34 👍 4 🔁 1 💬 0 📌 0

Overall, our results suggest compelling advantages for hybrid models over transformers—both theoretically, in terms of expressive power and scaling efficiency, and practically, in terms of benchmark performance and long-context abilities.

05.03.2026 16:34 👍 5 🔁 0 💬 1 📌 0

Moreover, we give a theoretical argument tying this greater expressive power to the improved scaling of hybrid models we see in practice. 🧑‍🏫

05.03.2026 16:34 👍 3 🔁 0 💬 1 📌 0
Post image

What explains the success of Olmo Hybrid? We prove hybrid models are more expressive than transformers or RNNs alone, being capable of representing a broader class of functions related to code evaluation.

05.03.2026 16:34 👍 4 🔁 0 💬 1 📌 0
Post image

Olmo Hybrid’s gains hold across pretraining evals.

After pretraining and mid-training, Olmo Hybrid outperforms Olmo 3 in every primary evaluation domain. It wins big on long-context too—on RULER 64k, performance jumps from 70.9% to 85.0%. 🚀

05.03.2026 16:34 👍 4 🔁 0 💬 1 📌 0
Post image

Key finding: hybrid models are substantially more data-efficient than transformers.

We show this through rigorous theory + controlled experiments. On MMLU, Olmo Hybrid matches Olmo 3’s accuracy using 49% fewer tokens—roughly 2× efficiency.
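The arithmetic behind "49% fewer tokens, roughly 2× efficiency" can be checked in one line (the 49% figure is taken from the post above):

```python
# Converting "49% fewer tokens to reach the same accuracy" into a
# relative data-efficiency multiplier.
fewer = 0.49                  # fraction of tokens saved (from the post)
efficiency = 1 / (1 - fewer)  # tokens_baseline / tokens_hybrid
print(f"{efficiency:.2f}x")   # prints "1.96x", i.e. roughly 2x
```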

05.03.2026 16:34 👍 3 🔁 0 💬 1 📌 0
Post image

Olmo Hybrid uses a 3:1 pattern—three Gated DeltaNet layers followed by one attention layer.

This replaces 75% of attention with linear recurrence while keeping attention frequent enough to recover details the recurrent state compresses away.
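As a rough illustration of the 3:1 interleaving, here is a minimal sketch; the function name and layer-type strings are hypothetical stand-ins, not Ai2's actual implementation:

```python
# Sketch of a 3:1 hybrid layer stack: three linear-recurrent layers
# (stand-ins for Gated DeltaNet) followed by one attention layer, repeating.

def hybrid_layer_pattern(n_layers: int, ratio: int = 3) -> list[str]:
    """Return the layer type at each depth: `ratio` recurrent layers,
    then one attention layer, repeating."""
    pattern = []
    for i in range(n_layers):
        # Every (ratio + 1)-th position is attention; the rest are recurrent.
        if i % (ratio + 1) == ratio:
            pattern.append("attention")
        else:
            pattern.append("gated_deltanet")
    return pattern

layers = hybrid_layer_pattern(32)
print(layers.count("attention"), "attention layers out of", len(layers))
# prints "8 attention layers out of 32": 75% of attention replaced by recurrence
```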

05.03.2026 16:34 👍 5 🔁 0 💬 1 📌 0
Post image

Introducing Olmo Hybrid, a 7B fully open model combining transformer and linear RNN layers. It decisively outperforms Olmo 3 7B across evals, w/ new theory & scaling experiments explaining why. 🧵

05.03.2026 16:34 👍 26 🔁 4 💬 1 📌 2
GitHub - allenai/molmo2: Code for the Molmo2 Vision-Language Model

Everything you need to get started – from custom training to deployment – is in the repository. Try it today.
🔗 Code: buff.ly/mH5l9fr
📝 Learn more about Molmo 2: buff.ly/EUFhLRB

03.03.2026 18:12 👍 2 🔁 1 💬 0 📌 0

We’re also releasing deployment tooling:
◼️ Checkpoint conversion to Hugging Face-compatible format
◼️ Inference examples for transformers + vLLM
◼️ Lightweight vision processing utility for offline inference
◼️ Gradio demo, Docker image, & local setup instructions

03.03.2026 18:12 👍 2 🔁 0 💬 1 📌 0

What's in the release:
🔹 Pretraining & fine-tuning scripts (SFT + long-context SFT)
🔹 Multi-node distributed training
🔹 Data download, preprocessing, & visualization utilities
🔹 Single-task & multi-eval scripts with caching

Built for reproducibility & new experiments.

03.03.2026 18:12 👍 1 🔁 0 💬 1 📌 0
Post image

📢 Update: the Molmo 2 codebase is now open source.

We're releasing the code behind Molmo 2—our open model family for video & image understanding, pointing, tracking, & more. Now you can easily train Molmo 2 on your own data. 🧵

03.03.2026 18:12 👍 22 🔁 3 💬 1 📌 0
AstaLabs AutoDiscovery

We believe open-ended, surprise-driven exploration is a transformational new capability for researchers.

Try AutoDiscovery in AstaLabs and let us know what you find. 🧑‍🔬

🔗 buff.ly/ljL4Nym

02.03.2026 20:47 👍 2 🔁 0 💬 0 📌 0

All accounts now receive 500 Hypothesis Credits. ⬆️

If your balance was below 500, we've topped you up. If you had more, you keep it. And if you used your original allocation, you're reactivated with a full 500.

Each credit lets AutoDiscovery generate & test one hypothesis. 🚀
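The refresh rule described above reduces to a single max(); a minimal sketch (the function name is illustrative):

```python
# Credit refresh as described in the post: every account is topped up
# to a floor of 500, and larger balances are kept as-is.
def refresh_credits(balance: int, floor: int = 500) -> int:
    """Top an account up to `floor` credits; balances above it are unchanged."""
    return max(balance, floor)

assert refresh_credits(120) == 500   # below the floor: topped up
assert refresh_credits(800) == 800   # above the floor: unchanged
assert refresh_credits(0) == 500     # exhausted: reactivated with a full 500
```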

02.03.2026 20:47 👍 2 🔁 0 💬 1 📌 0

AutoDiscovery autonomously explores your dataset, generates hypotheses, tests them, & iterates—surfacing surprising findings you might not think to look for.

Every result is fully auditable: you can inspect the hypothesis, statistical analysis, & underlying code. 🧪

02.03.2026 20:47 👍 2 🔁 0 💬 1 📌 0
Post image

In just a few weeks, researchers used AutoDiscovery to generate 20K+ hypotheses across oncology, climate science, marine ecology, entomology, cybersecurity, music cognition, social sciences, & more.

Now we're extending access for three more months—and refreshing credits. 👇

02.03.2026 20:47 👍 12 🔁 3 💬 1 📌 0
How do researchers actually use AI-powered science tools? Lessons from 250,000+ queries | Ai2
The Asta Interaction Dataset (AID) contains real researcher queries revealing how scientists actually use AI-powered research tools, and where their habits diverge from what tool builders expect.

We believe the community needs shared, open data to understand how researchers use AI tools. We hope AID helps move the conversation forward.

✍️ Blog: buff.ly/BkntPbs
📄 Paper: buff.ly/2t1kC7k
📊 Data: buff.ly/KEhPggi

27.02.2026 17:56 👍 2 🔁 1 💬 0 📌 0

A note on privacy: the dataset draws exclusively from users who opted in to share de-identified interactions. We use hashed identifiers with no user IDs, and remove any queries flagged as containing PII.

27.02.2026 17:56 👍 2 🔁 0 💬 1 📌 0

Many users revisit previous reports hours or days later—treating AI-generated outputs as persistent reference artifacts.

And different fields bring different styles: CS researchers are most likely to ask for problem-solving and ideation; history researchers, the least.

27.02.2026 17:56 👍 1 🔁 0 💬 1 📌 0
Post image Post image

We found that users treat tools like Asta as collaborative research partners—not merely search engines.

They paste LaTeX drafts asking for citations, submit structured templates, & use prompt engineering techniques picked up from general-purpose chatbots.

27.02.2026 17:56 👍 1 🔁 0 💬 1 📌 0
Post image

The Asta Interaction Dataset (AID) captures six months of real researcher use of two AI-powered tools:

🔎 PaperFinder—powers the "Find papers" feature in Asta
👩‍🔬 ScholarQA—powers "Generate a report"

To our knowledge, it’s one of the largest open datasets of its kind.

27.02.2026 17:56 👍 1 🔁 0 💬 1 📌 0
Post image

We analyzed 250K+ queries & 430K+ clickstream interactions from Asta, our AI-powered research assistant—and today we're releasing the full dataset. How do researchers actually use AI science tools? Here's what we found. 🧵

27.02.2026 17:56 👍 23 🔁 6 💬 1 📌 1
PreScience: Forecasting the future of science end-to-end | Ai2
PreScience is a new benchmark that evaluates whether AI can forecast how science unfolds end-to-end, from team formation through eventual impact.

If we want AI that supports real discovery, we need evaluations grounded in how science actually happens.
📄 Learn more: buff.ly/czp8HQ1
📝 Tech report: buff.ly/PtcZTZH
🤗 Dataset: buff.ly/uVEo9Cu
💻 Code: buff.ly/M5XikzS

25.02.2026 16:59 👍 1 🔁 0 💬 0 📌 0

We simulated a full year of AI research by chaining PreScience's four tasks together month by month.

The result: a synthetic corpus that's less diverse and less novel than what real scientists produced—models given diverse inputs still converge on a narrower range of ideas.

25.02.2026 16:59 👍 1 🔁 0 💬 1 📌 0

Our results show even strong baselines fall short ⚠️: GPT-5 averages just 5.6/10 on LACERScore, simple heuristics outperform complex ML for collaborator prediction, and the highest-impact papers are systematically the hardest to forecast.

25.02.2026 16:59 👍 0 🔁 0 💬 1 📌 0

We also introduce LACERScore, a calibrated LLM-as-judge metric for evaluating generated abstracts against real contributions. 🧪

Standard text-similarity metrics can't tell whether two abstracts describe the same scientific finding—LACERScore can.
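A toy illustration (not LACERScore itself, which is LLM-as-judge) of why surface similarity fails: a paraphrase pair can share almost no words while a contradictory pair shares almost all of them.

```python
# Word-overlap (Jaccard) similarity as a stand-in for surface text metrics.
def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

same_finding = ("our model halves translation errors",
                "we cut mistakes in machine translation by fifty percent")
different_finding = ("our model halves translation errors",
                     "our model halves translation latency")

# The paraphrase pair (same finding) scores far lower than the pair
# describing different findings, the opposite of what we want.
print(jaccard(*same_finding), jaccard(*different_finding))
```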

25.02.2026 16:59 👍 0 🔁 0 💬 1 📌 0
Post image

PreScience is grounded in ~100K real papers 📚. It decomposes a scientific advance into four tasks:

✅ Collaborator prediction (who will team up?)
✅ Prior work selection (which papers will they cite?)
✅ Contribution generation (what will they write?)
✅ Impact prediction (how much attention will it receive?)

25.02.2026 16:59 👍 0 🔁 0 💬 1 📌 0

Every study starts with choices—who to collaborate with, what to build on, & what to contribute. Then the community decides how much attention it deserves.

PreScience asks: can AI predict what comes next across this workflow, given the scientific record up to a fixed date?
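A minimal sketch of that fixed-date setup, with toy records (the dates and titles are illustrative, not from the benchmark):

```python
# Freeze the corpus at a cutoff date; models predict from the visible
# record and are scored against what was actually published afterward.
from datetime import date

papers = [  # toy records: (published, title)
    (date(2024, 3, 1), "A"),
    (date(2024, 9, 15), "B"),
    (date(2025, 2, 1), "C"),
]
cutoff = date(2024, 12, 31)

visible = [t for d, t in papers if d <= cutoff]   # the "scientific record"
held_out = [t for d, t in papers if d > cutoff]   # what must be forecast
print(visible, held_out)   # prints ['A', 'B'] ['C']
```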

25.02.2026 16:59 👍 0 🔁 0 💬 1 📌 0
Post image

Can AI predict what scientists will do next—not just one piece, but the whole research process? PreScience is our new model eval for forecasting how science unfolds end-to-end, from how research teams form to a paper's eventual impact. Built with UChicago, supported by NSF.

25.02.2026 16:59 👍 5 🔁 3 💬 1 📌 1
Post image

→ Sign up and take AutoDiscovery for a spin before credits expire: buff.ly/WX4aj5b

23.02.2026 21:12 👍 0 🔁 0 💬 0 📌 0