
Active Site

@activesite.bio

Measuring frontier AI in synthetic biology

12 followers · 10 following · 12 posts · Joined 18.02.2026

Latest posts by Active Site @activesite.bio

Thank you to the Frontier Model Forum, Sentinel Bio, and @packardfdn.bsky.social for supporting our work and to our advisory board.

19.02.2026 17:38 👍 1 🔁 0 💬 0 📌 0

Shout out to Shen Zhou Hong, @alex-kleinman.bsky.social, Alyssa Mathiowetz, @adamhowes.bsky.social, @xrg.bsky.social, @lucarighetti.bsky.social, Joe Torres, Julian Cohen, Suveer Ganta, Deepika Pahari, Alex Letizia

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 0
Link preview: Measuring Mid-2025 LLM-Assistance on Novice Performance in Biology
Large language models (LLMs) perform strongly on biological benchmarks, raising concerns that they may help novice actors acquire dual-use laboratory skills. Yet, whether this translates to...

You can read more here:

๐Ÿ“ Blog post: activesite.substack.com/p/rct
๐Ÿ“„ arXiv Preprint: arxiv.org/abs/2602.16703
๐Ÿ”ฎ Predictions from @research-fri.bsky.social: forecastingresearch.substack.com/p/how-well-...

19.02.2026 17:38 👍 3 🔁 0 💬 1 📌 0
Link preview: Active Site Jobs

We're actively hiring for scientists and operators!

We especially want to find a Head of Ops to help build an engine to repeat this study regularly and develop entirely new ones.

jobs.ashbyhq.com/activesite

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 0

Importantly: this is a snapshot of mid-2025 novice and LLM performance.

Results could change as new LLMs become more capable and easier to use in the lab, and as average elicitation skill improves.

As models evolve, we aim to continue tracking how people use frontier AI in biology.

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 1
Post image

How good were participants at using LLMs?

~40% of participants never uploaded images to LLMs.

Interestingly, both arms mentioned YouTube most often as helpful.

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 1
Post image

How reliable were LLMs in the hands of novices?

LLM transcripts revealed that models can still make mistakes, especially in molecular cloning.

LLMs led participants to move more quickly (Panel A) but often not with the correct materials (Panel B).

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 1
Post image

It's hard to compress all that into a single statistic.

But one way is by using a Bayesian model, which suggests LLMs give a ~1.4x boost on a "typical" wet-lab task.

Fundamentally, we're confident that there wasn't a large LLM slow-down or speed-up (95% CrI: 0.7x–2.6x).

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 0
Post image

But there are some signs LLMs were useful.

LLM participants had higher success on 4 out of 5 tasks, most notably in cell culture (69% vs. 55%; P = 0.06).

LLM participants also advanced further within a task even if they didn't finish within the study period (odds >80%).

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 0
Post image

Our primary outcome: were LLM users more likely to complete all three of the core tasks *together*?

Only ~5% of the LLM arm and ~7% of the Internet arm completed all three.

No significant difference, and far lower than experts predicted.

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 1
Post image

The study was the largest and longest of its kind: 153 participants with minimal lab experience over 8 weeks, randomized to an LLM arm or an Internet-only arm.

They tried 5 laboratory tasks, 3 of which are central to a viral reverse genetics workflow. No protocols given, just an objective.

19.02.2026 17:38 👍 2 🔁 0 💬 1 📌 0
Post image

We ran a randomized controlled trial to see if LLMs can help novices perform molecular biology in a wet-lab.

The results: LLMs may help in some aspects, but we found no significant increase in end-to-end completion of the core tasks. That's lower than what experts predicted.

Our findings 🧵

19.02.2026 17:37 👍 17 🔁 5 💬 1 📌 3