Christian Dallago (@machine.learning.bio)

FLIP2: Expanding Protein Fitness Landscape Benchmarks
FLIP2: A comprehensive benchmark for protein fitness prediction with 7 datasets, 16 splits, and real-world engineering scenarios

Find out more: flip.protein.properties

26.02.2026 21:58 👍 2 🔁 1 💬 0 📌 0

I'm especially happy about continuing to work with an amazing group of scientists. Thanks @kevinkaichuang.bsky.social, @kdidi.bsky.social, Bruce Wittmann, @kadinaj.bsky.social, Maya Czeneszew, @sarahalamdari.bsky.social, @alexijie.bsky.social, @thisismadani.bsky.social, ++

26.02.2026 21:58 👍 1 🔁 0 💬 1 📌 0

I love this gritty work here; there are no new architectures, no leaderboard-topping number to screenshot. However, it's how we (and hopefully the community) can measure whether the models we're all building and using are getting better where it counts — at the bench.

26.02.2026 21:58 👍 1 🔁 0 💬 1 📌 0

That's not a criticism of any method. New benchmarks are needed precisely to see where we stand and to set a target for where we could go from here.

26.02.2026 21:58 👍 1 🔁 0 💬 1 📌 0

Especially on the wild-type and position splits, current transfer learning doesn't consistently win. No single pLM architecture dominates. Scaling hasn't closed the gap yet.

26.02.2026 21:58 👍 1 🔁 0 💬 1 📌 0

The answer in 2026 is largely the same. Simple ridge regression on one-hot sequences, optionally supplemented with zero-shot pLM likelihoods, often matches or outperforms fine-tuned protein language models.

26.02.2026 21:58 👍 2 🔁 1 💬 1 📌 0
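The baseline named in the post above (ridge regression on one-hot sequences, optionally with a zero-shot pLM likelihood appended as an extra feature) can be sketched roughly as follows. This is a minimal illustration, not the FLIP2 code: the function names, the closed-form solver, the `alpha` value, and the toy sequences and fitness values are all assumptions made for the example, and the zero-shot scores are placeholders for whatever a pLM would supply.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seqs):
    """Flattened one-hot encoding; assumes all sequences share one length."""
    length = len(seqs[0])
    X = np.zeros((len(seqs), length * len(AMINO_ACIDS)))
    for n, seq in enumerate(seqs):
        for pos, aa in enumerate(seq):
            X[n, pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return X


def featurize(seqs, zero_shot=None):
    """One-hot features, optionally with one zero-shot score per sequence."""
    X = one_hot(seqs)
    if zero_shot is not None:
        # Hypothetical: one precomputed pLM likelihood score per sequence.
        X = np.hstack([X, np.asarray(zero_shot, dtype=float)[:, None]])
    return X


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge on centered data: (Xc^T Xc + alpha*I) w = Xc^T yc."""
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ w
    return w, intercept


def predict(X, w, intercept):
    return X @ w + intercept


# Toy data for illustration only (not taken from any FLIP2 landscape).
train_seqs = ["ACD", "ACE", "GCD", "GCE"]
train_y = [1.0, 2.0, 3.0, 4.0]
w, b = fit_ridge(featurize(train_seqs), train_y, alpha=1.0)
preds = predict(featurize(["GCD"]), w, b)
```

In practice `alpha` would be tuned on a validation split, and the zero-shot column would likely need scaling so that it is penalized comparably to the one-hot columns.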

FLIP2 adds seven new sequence-fitness landscapes, spanning industrial enzymes, nucleases, rhodopsins, and protein-protein interactions, and 16 splits that test the generalization axes protein engineers actually hit: more mutations, new positions, higher fitness, different wild-types.

26.02.2026 21:58 👍 1 🔁 0 💬 1 📌 0
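To make two of the generalization axes from the post above concrete ("more mutations" and "higher fitness"), here is a rough sketch of how such train/test splits could be constructed. The function names, thresholds, and toy variants are hypothetical, and the actual FLIP2 split definitions may differ.

```python
def count_mutations(seq, wild_type):
    """Number of substituted positions relative to the wild type (equal lengths assumed)."""
    return sum(a != b for a, b in zip(seq, wild_type))


def split_by_mutation_count(variants, wild_type, max_train_mutations=1):
    """Train on variants with few mutations, test on variants with more."""
    train, test = [], []
    for seq, fitness in variants:
        if count_mutations(seq, wild_type) <= max_train_mutations:
            train.append((seq, fitness))
        else:
            test.append((seq, fitness))
    return train, test


def split_by_fitness(variants, threshold):
    """Train on variants at or below a fitness threshold, test on those above it."""
    train = [(s, f) for s, f in variants if f <= threshold]
    test = [(s, f) for s, f in variants if f > threshold]
    return train, test


# Toy variants for illustration only: (sequence, fitness) pairs.
wild_type = "ACD"
variants = [("ACD", 1.0), ("ACE", 1.2), ("GCE", 2.0)]
few_train, many_test = split_by_mutation_count(variants, wild_type, max_train_mutations=1)
low_train, high_test = split_by_fitness(variants, threshold=1.5)
```

Both splits force the model to extrapolate: the test set lies outside the range of mutation counts or fitness values seen in training, which is harder than a random hold-out.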

We were interested in how things had changed 5 years after our first release. So, we built FLIP2 on select datasets from great labs across the world, many of which have graciously agreed to make their data freely available.

26.02.2026 21:58 👍 1 🔁 0 💬 1 📌 0

FLIP spawned fast development of several different benchmarking efforts across protein design, engineering, and variant effect assessment.

The answer in 2021 was: sometimes, but simpler models hold up surprisingly well.

26.02.2026 21:58 👍 1 🔁 0 💬 1 📌 0

Five years ago, we released FLIP. The core question was: can ML models for protein fitness prediction generalize in the ways that actually matter for protein engineering, i.e. low data, extrapolation to more mutations, out-of-distribution sequences?

26.02.2026 21:58 👍 4 🔁 5 💬 2 📌 0

We made FLIP2, a protein fitness benchmark spanning seven new datasets, including enzymes, protein-protein interactions, and light-sensitive proteins, as well as splits that measure generalization relevant to real-world protein engineering campaigns.

25.02.2026 21:25 👍 51 🔁 15 💬 1 📌 1

You can use the model right now to freely generate families for single-sequence inputs (i.e., diversification conditioned by intrinsic representations of evolution), or to engineer proteins based on family prompts (diversification by conditioning on particular evolutionary trajectories).

22.12.2025 14:57 👍 1 🔁 1 💬 0 📌 0

In essence, we probed the model's ability to recapitulate family statistics, bootstrap protein structure prediction, and assess mutation effects, demonstrating excellent performance across all tasks, especially using test-time scaling via prompt conditioning.

22.12.2025 14:57 👍 0 🔁 0 💬 1 📌 0

With ProFam-1, we scaled learning from single sequences to protein family definitions of different kinds, curating a large protein family corpus, ProtFam-atlas. I'm particularly stoked about the idea of inference-time compute. This contribution laid out a very exciting path for future work.

22.12.2025 14:57 👍 0 🔁 0 💬 1 📌 0

Our latest protein family-based GenAI collection of tools and datasets, ProFam, is out now. Everything, from data and training and inference code to a 215M Llama-based ProFam-1 model, is fully open-sourced.

🧵

22.12.2025 14:57 👍 5 🔁 1 💬 2 📌 0

Tenure-Track Assistant Professor Position – AI/ML for Cell Biology - Durham, North Carolina (US) job with Duke University School of Medicine | 12844591

Another exciting opportunity, this time as a colleague at Duke! Join as tenure track assistant prof. in Cell Bio & let’s work on closing the gap between in-silico and in-vivo: www.nature.com/naturecareer...

Important: application closes Nov 1st!!!

27.10.2025 18:12 👍 3 🔁 1 💬 0 📌 0

Senior Applied Research Scientist, Multiscale Biology | NVIDIA Corporation Apply your expertise in engineering biology through algorithms and tools for genes, tissues, organisms, and populations. Conduct collaborative applied research in multiscale biology using deep learnin...

Another opening: Senior Multiscale Biology Applied Research Scientist!

nvidia.eightfold.ai/careers/job/...

Are you fascinated by fundamental data modalities across biology, like RNA-seq and mass spec, and want to build computational tools that harness data to build intelligence?

Come: join the team!

17.10.2025 11:28 👍 4 🔁 1 💬 0 📌 0

I don’t dare question the HR gods about their designs :)

15.10.2025 21:23 👍 0 🔁 0 💬 1 📌 0

Senior Applied Research Scientist, Bioinformatics | NVIDIA Corporation Lead applied and collaborative research programs using bioinformatics, high performance computing, and deep learning for biological advancements. Develop and accelerate bioinformatics software and alg...

Are you passionate about leading collaborative, fast moving, applied bioinformatics research projects that help the entire community move forward?

Apply to work in my team at NVIDIA: nvidia.eightfold.ai/careers/job/...

15.10.2025 19:46 👍 5 🔁 0 💬 1 📌 0

I should add: structure prediction inference is INCREDIBLY efficient for the form factor and power profile.

15.10.2025 16:15 👍 0 🔁 0 💬 0 📌 0

It's an inference monster.

Structure prediction on it works.

Design to come.

This will be updated later today... research.nvidia.com/labs/dbr/ass...

15.10.2025 16:09 👍 3 🔁 0 💬 1 📌 0

ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning Computational biology and bioinformatics provide vast data gold-mines from protein sequences, ideal for Language Models (LMs) taken from Natural Language Processing (NLP). These LMs reach for new pred...

Great talk by @machine.learning.bio at the 5th Virtual @chembiotalks.bsky.social. He talked about the use of #MachineLearning and #BigData to address biological questions. Cool insights into both predicting functions and designing proteins
ieeexplore.ieee.org/document/947...
arxiv.org/abs/2503.00710

30.09.2025 12:09 👍 6 🔁 2 💬 0 📌 0

GPU-accelerated homology search with MMseqs2 - Nature Methods Graphics processing unit-accelerated MMseqs2 offers tremendous speedups for homology retrieval from metagenomic databases, query-centered multiple sequence alignment generation for structure predictio...

GPU-accelerated MMseqs2 offers tremendous speedup for homology retrieval, protein structure prediction with ColabFold, and protein structure search with Foldseek. @martinsteinegger.bsky.social @milot.bsky.social @machine.learning.bio

www.nature.com/articles/s41...

18.09.2025 20:09 👍 81 🔁 21 💬 0 📌 0

From AlphaFold to MMseqs2-GPU: How AI is Accelerating Protein Science Podcast Episode · NVIDIA AI Podcast · 09/10/2025 · 35m

podcasts.apple.com/us/podcast/f...

11.09.2025 09:49 👍 10 🔁 3 💬 0 📌 1

Looking forward to hearing about the potential of machine learning for #Biology and #DrugDiscovery from an industry perspective. Register for the Virtual @chembiotalks.bsky.social to hear the perspective of Chris Dallago (@machine.learning.bio) from Nvidia.
#ChemBio #Chemsky #ML #MachineLearning

16.07.2025 07:53 👍 10 🔁 5 💬 0 📌 0

I still feel criminal for the handle but the manuscript embodies it well.

16.07.2025 23:53 👍 2 🔁 0 💬 0 📌 0

Moore’s law applies to speed, not accuracy. I don’t think the discoveries we are after fundamentally depend on speed alone.

I think the better law here is garbage in garbage out.

In that sense, you can wait for better data/curation, but it’s also fun to take destiny in your own hands :)

15.07.2025 11:14 👍 2 🔁 0 💬 1 📌 0

Computational exploration of global venoms for antimicrobial discovery with Venomics artificial intelligence - Nature Communications Researchers used artificial intelligence to mine global venom proteomes and discovered novel peptides with antimicrobial activity. Several candidates showed efficacy against drug-resistant bacteria in...

(1/5) Venoms are a vast, largely untapped library of bioactive molecules, and our new paper in @natcomms.nature.com @natprot.nature.com reveals just how powerful they can be. 🐍⚡️

14.07.2025 14:34 👍 4 🔁 3 💬 1 📌 0

Excited to have participated in the 2025 Symposium on Generative AI in Molecule Discovery in beautiful Munich, along with amazing scientists and colleagues @machine.learning.bio, Francesca Grisoni, @ewaszczurek.bsky.social, @fabiantheis.bsky.social and more... 🔬🤖 events.hifis.net/event/2015/

11.07.2025 07:52 👍 3 🔁 1 💬 0 📌 1

Folddisco finds similar (dis)continuous 3D motifs in large protein structure databases. Its efficient index enables fast uncharacterized active site annotation, protein conformational state analysis and PPI interface comparison. 1/9🧶🧬
📄 www.biorxiv.org/content/10.1...
🌐 search.foldseek.com/folddisco

07.07.2025 08:21 👍 153 🔁 70 💬 8 📌 3
