Five years ago, we released FLIP. The core question was: can ML models for protein fitness prediction generalize in the ways that actually matter for protein engineering, i.e. low data, extrapolation to more mutations, out-of-distribution sequences?
Remote homology and protein design: two sides of the same coin. Instead of finding remote homologs, we used TEA to design completely de novo proteins, folding into desired TEA sequences.
I always love working with Jay, and “speed-running” this proof of concept was no exception.
Also a great time to showcase @lorenzopantolini.bsky.social's awesomeness as he slowly starts the job hunt! If you need someone with a deep understanding of biological latent spaces and how to exploit them for practical applications, he's your guy.
This was a speed-run to validate the in silico proof-of-concept, but the possibilities are endless. It may represent a path orthogonal to current structure-based methods. We're working on adaptations and, of course, looking to experimentally validate. (8/n)
Previous MCMC work used a contact map loss (needed ~170k steps, Verkuil et al. 2022) or ESMFold pTM (i.e., folding at every step, Hie et al. 2022). By optimising a 1D sequence with TEA, we see a >10x speed increase. (7/n)
For unconditional design, we get high-pLDDT proteins unlike any known sequences. A small TEA k-mer diversity loss helped steer us away from simple coiled-coils toward complex secondary structure combos. (6/n)
For template-guided design, we generated novel sequences predicted to fold into both de novo and natural scaffolds (AF2 single seq). Many have a NEFF of 1. No structures were used in the making of these designs. (5/n)
The approach:
1. Take a random sequence
2. Randomly mutate
3. Accept/reject via Metropolis criterion based on ESM2 likelihood + TEA template match (or TEA entropy if unconditional).
This is fast, 30k steps in ~25min. (4/n)
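The three steps above can be sketched as a generic Metropolis loop. This is a minimal illustration, not the authors' implementation: `toy_score` is a hypothetical stand-in for the real objective (ESM2 likelihood plus TEA template match or TEA entropy), and the single-point mutation proposal, `temperature`, and `seed` are assumptions made for the example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(seq, target):
    """Hypothetical objective: fraction of positions matching a target string.

    In the thread, the score combines ESM2 likelihood with a TEA term;
    that model-based scoring is not reproduced here.
    """
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def metropolis_design(score_fn, length, steps, temperature=0.01, seed=0):
    """Maximise score_fn over sequences with single-point Metropolis moves."""
    rng = random.Random(seed)
    # 1. Take a random sequence.
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current = score_fn("".join(seq))
    for _ in range(steps):
        # 2. Randomly mutate one position.
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        proposed = score_fn("".join(seq))
        delta = proposed - current
        # 3. Metropolis criterion: always accept improvements, accept
        #    worse proposals with probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = proposed
        else:
            seq[pos] = old  # reject: undo the mutation
    return "".join(seq), current
```

Usage, with a made-up target: `metropolis_design(lambda s: toy_score(s, "MKTAYIAKQR"), 10, 2000)` returns the final sequence and its score. Swapping `score_fn` for a model-based objective leaves the loop unchanged.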
We noticed that TEA logit entropy correlates well with structure prediction confidence (pLDDT). Ideally, we could combine the ESM2 likelihood (naturalness) with TEA (structural consistency) to guide design. (3/n)
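As a rough illustration of what a logit entropy can look like, the sketch below computes the Shannon entropy of the softmax over one position's logits and averages it over positions. How TEA entropy is actually defined and aggregated is not stated in the thread, so both the per-position softmax and the mean over positions are assumptions, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_entropy(per_position_logits):
    """Average entropy over sequence positions (assumed aggregation)."""
    rows = list(per_position_logits)
    return sum(entropy(row) for row in rows) / len(rows)
```

A flat distribution over a 20-letter alphabet gives the maximum entropy, log(20) nats, while a sharply peaked one gives a value near zero. Under the correlation described above, lower entropy would go with higher pLDDT.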
We recently released The Embedded Alphabet (TEA), a tiny head on top of ESM2 converting amino acids into a new 20-letter structural alphabet. Great for search (see bsky.app/profile/lore...), but we wondered: could we use it for generation? (2/n)
A fun little idea that worked surprisingly well, using a structure-informed yet structure-independent alphabet for de novo protein design: www.biorxiv.org/content/10.6...
🧵 (1/n)
My time in @martinsteinegger.bsky.social's group is ending, but I'm staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org
I'm really excited to break up the holiday relaxation time with a new preprint that benchmarks AlphaFold3 (AF3)/“co-folding” methods with 2 new stringent performance tests.
Thread below - but first some links:
A longer take:
fraserlab.com/2025/12/29/k...
Preprint:
www.biorxiv.org/content/10.6...
Thanks a lot for the review! We somehow missed it until quite recently, but I think we addressed a good chunk of the comments in the revision anyway. Looking forward to your thoughts when it's out.
New paper in @natmethods.nature.com!
We present OpenStructure's powerful scoring capabilities, used to assess predictions in CAMEO and CASP.
Read the full study here:
doi.org/10.1038/s415...
#StructuralBiology #Bioinformatics #OpenStructure #CASP #CAMEO #ProteinStructure
Been excited about this one for a while! What would you do with a new alphabet and the wealth of protein sequence bioinformatics at your disposal? We're also around at #EMBOComp3D Heidelberg and MLSB Copenhagen this week to discuss
OpenFold3-preview (OF3p) is out: a sneak peek of our AF3-based structure prediction model. Our aim for OF3 is full AF3-parity for every modality. We now believe we have a clear path towards this goal and are releasing OF3p to enable building in the OF3 ecosystem. More below.
This October I'm drawing one molecule a day inspired by proteins in the PDB @rcsbpdb.bsky.social
Day 2/31
Prompt WEAVE
N-terminal domain of a fibroin - a building block of silk fiber produced by silkworms.
PDB: 3UA0
Next prompt is CROWN and I would love your suggestions!
Viral AlphaFold Database (VAD) is live in Science Advances
~27,000 predicted viral protein monomers & homodimers
Conserved folds across bacterial, archaeal & eukaryotic viruses
New toxin–antitoxin system KreTA uncovered
Vast “functional darkness” remains uncharted
www.science.org/doi/10.1126/...
Océane Follonier @oceanef.bsky.social for
“From bytes to binders: design, score and optimize” #bc2basel #posterprize
Critical benchmarking of structure prediction methods has been crucial for measuring progress and detecting breakthroughs. But what will the future look like? Join the discussion at our workshop in Basel on September 8 - just before the [BC]2 conference.
@sib.swiss @biozentrum.unibas.ch
Exciting to see our protein binder design pipeline BindCraft published in its final form in @Nature! This has been an amazing collaborative effort with Lennart, Christian, @sokrypton.org, Bruno and many other amazing lab members and collaborators.
www.nature.com/articles/s41...
Still some spots left, join us in Basel on Sep 8 (before [BC]2) to discuss structure prediction benchmarking and more!
Workshop: Future of Structure Prediction Benchmarking
Sept 8, 2025 | Basel
Talks + breakout sessions on #CASP #CAPRI #CAMEO & benchmarking for drug discovery
Free registration (limited spots): lu.ma/ws9nu1xf
Join us to explore how benchmarking can drive breakthroughs in structure prediction.
Join us at #EMBOComp3D to explore cutting-edge breakthroughs in computational structural biology, AI, drug design, and innovative software!
Find out about everything from molecular modelling to systems-level analyses, evolution, and more.
Submit your abstract by 26 Aug ➡️ s.embl.org/csb25-01-bl
CATH turns 30 years old this year!
We are organising a 1-day symposium on September 16th at UCL, highlighting recent AI-based developments to enhance protein family classifications, annotations and analyses.
www.eventbrite.co.uk/e/protein-an...
AtomWorks is out! Building upon @biotite_python, we built a toolkit for all things biomolecules and trained RF3 with it. All open-source, test it via `pip install atomworks`!
AtomWorks: github.com/RosettaCommo...
RF3: github.com/RosettaCommo...
Paper: tinyurl.com/y2w4z65b
1/6
Scatter plot of LDDT as a function of sequence identity for high-coverage homology models. The horizontal red line at 40% sequence identity highlights the presence of high-quality models in the low sequence identity region.
Filtering out homologous structures from the PDB at 40% sequence identity is not enough to create a robust test set. Significant leakage persists at this level, and comparative modeling can still produce high-quality models.
Looking for a #fellowship for an independent #PhD at one of the best places for life sciences in the world?
The summer call at @biozentrum.unibas.ch @unibas.ch is open until October 12, 2025.
www.biozentrum.unibas.ch/phd/internat...