Active Site (@activesite.bio)

Thank you to the Frontier Model Forum, Sentinel Bio, and @packardfdn.bsky.social for supporting our work and to our advisory board.

19.02.2026 17:38 👍 1 🔁 0 💬 0 📌 0

Shout out to Shen Zhou Hong, @alex-kleinman.bsky.social, Alyssa Mathiowetz, @adamhowes.bsky.social, @xrg.bsky.social, @lucarighetti.bsky.social, Joe Torres, Julian Cohen, Suveer Ganta, Deepika Pahari, Alex Letizia

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 0

Measuring Mid-2025 LLM-Assistance on Novice Performance in Biology

Large language models (LLMs) perform strongly on biological benchmarks, raising concerns that they may help novice actors acquire dual-use laboratory skills. Yet, whether this translates to...

You can read more here:

📝 Blog post: activesite.substack.com/p/rct
📄 arXiv Preprint: arxiv.org/abs/2602.16703
🔮 Predictions from @research-fri.bsky.social: forecastingresearch.substack.com/p/how-well-...

19.02.2026 17:38 👍 3 🔁 0 💬 1 📌 0

Active Site Jobs

We're actively hiring for scientists and operators!

We especially want to find a Head of Ops to help build an engine to repeat this study regularly and develop entirely new ones.

jobs.ashbyhq.com/activesite

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 0

Importantly: this is a snapshot of mid-2025 novice and LLM performance.

Results could change as new LLMs become more capable and easier to use in the lab, and as average elicitation skill improves.

As models evolve, we aim to continue tracking how people use frontier AI in biology.

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 1

How good were participants at using LLMs?

~40% of participants never uploaded images to LLMs.

Interestingly, participants in both arms most often cited YouTube as a helpful resource.

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 1

How reliable were LLMs in the hands of novices?

LLM transcripts revealed that models can still make mistakes, especially in molecular cloning.

LLMs led participants to move more quickly (Panel A), but often without the correct materials (Panel B).

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 1

It's hard to compress all that into a single statistic.

But one way is by using a Bayesian model, which suggests LLMs give a ~1.4x boost on a "typical" wet-lab task.

Fundamentally, we're confident that there wasn't a large LLM slow-down or speed-up (95% CrI: 0.7x–2.6x).

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 0

But there are some signs LLMs were useful.

LLM participants had higher success on 4 out of 5 tasks, most notably in cell culture (69% vs. 55%; P = 0.06).

LLM participants also advanced further within a task even if they didn't finish within the study period (odds >80%).

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 0

Our primary outcome: were LLM users more likely to complete all three of the core tasks *together*?

Only ~5% of the LLM arm and ~7% of the Internet arm completed all three.

No significant difference – and far lower than experts predicted.

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 1

The study was the largest and longest of its kind: 153 participants with minimal lab experience over 8 weeks, randomized to either an LLM arm or an Internet-only arm.

They attempted 5 laboratory tasks, 3 of which are central to a viral reverse genetics workflow. No protocols were given, just an objective.

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 0

We ran a randomized controlled trial to see if LLMs can help novices perform molecular biology in a wet lab.

The results: LLMs may help in some aspects, but we found no significant increase in end-to-end completion of the core tasks. That's lower than what experts predicted.

Our findings 🧵

19.02.2026 17:37 👍 17 🔁 5 💬 1 📌 3
