New lab preprint: if ML struggles with extrapolation, let's expand the diversity of training data with gene synth, DNA shuffling, and ML gen.... also lots of #FluorescentProteins #ProteinEngineering #MachineLearning #SyntheticBiology 1/n
High Diversity Gene Libraries Facilitate Machine Learning Guided Exploration of Fluorescent Protein Sequence Space https://www.biorxiv.org/content/10.64898/2026.03.01.706892v1
I'm looking to hire a research technician for my lab at Harvard & DFCI, who would primarily work in the wet lab expressing and characterizing designed proteins, starting this summer. A great role for a recent college grad looking for an immersive research experience before grad school.
Here's a preprint from the Keating Lab: www.biorxiv.org/content/10.6.... Foster Birnbaum and Amy E. Keating demonstrate that sequence design models, such as ProteinMPNN, are limited because they were trained only on native sequences.
Yeah this seems terrible. On the other hand, if funding levels stay ~constant, I wonder if this will be undone in the future, leading to a kind of random jump up in the number of projects supported for a few years (far above even 2024) in ~2030? Not that that's much to get excited about.
Nope, just trying to force some honesty from the side that introduced a very dishonest mechanism of cutting budgets. Fruitless I know
I know negotiating with a bunch of bad-faithed liars can be a waste of time, but has anyone proposed a compromise where we switch to MYF but allocate extra funds so that the transition is net neutral to the # of awards? Seems like that would clarify what we're supposedly debating here.
Gonna be funny when people get rejected for being at the wrong career stage (like the K99)- "PI is WAY overqualified for retirement". "It is not clear PI will benefit from additional research prior to retirement". "PI not qualified for retirement, hasn't graduated enough students to be competitive."
Congratulations!!
The final Calvin and Hobbes, which appeared in papers 30 years ago today.
Large scale prospective evaluation of co-folding across 557 Mac1-ligand complexes and three virtual screens
Mac1 co-folding evaluation: pose prediction, conformational change, and hit ID post-training.
The most precious commodity you have is your attention. You don't have to waste it on poor-faith debates or arguments with strangers if you don't think they'll be productive. You can prioritize the things that matter to you and make your life richer.
Excited to share our newest manuscript on antigen discovery and vaccine design for tuberculosis, led by Owen Leddy (everyone hiring new faculty in a few years, remember this name!)
www.science.org/doi/10.1126/... #TBsky
I don't know yet- clearly something has to do with strength of intermolecular interactions (e.g. the Arg effect) but also I bet something has to do with refolding rates. Although surprisingly we don't see much correlation with topology (% alpha vs % beta) so maybe not folding rates... ???
That Savas lab looks cool, but this was actually done in collaboration with Jeff Savas's lab at Northwestern, who is not on bsky that I can find :) thanks for the highlight!
Thank you so much :)
Thanks! Yeah, lots to learn!
Thanks!!
I hope this is useful and we're excited to use this approach to explore more protein stresses- shelf life, vortexing, freeze-thaw, etc. I also love this paper because it's our lab's first time using quantitative proteomics! Congrats Cydney et al!!
All data are available on Zenodo and we'd love to see what you can do with it! Cydney also made a nice notebook to run the predictive model (including DMS scan) in Colab!
Preprint: biorxiv.org/content/10.1...
Colab: colab.research.google.com/drive/1KNWvG...
Data: docs.google.com/forms/d/e/1F...
There's much more in the paper, including the surprising observation that higher folding stability correlates with aggregation even controlling for hydrophobicity (?!) Stabilities measured by Kotaro Tsuboyama!
Existing aggregation predictors show only modest correlations with our results, but fine tuning SaProt does well on our (carefully split) test set!
Arginine is strongly aggregation promoting! This was known in the literature but not to me.
Hydrophobicity and isoelectric point (pI) are correlated with temp- and pH-propensity, but don't tell the whole story. Other factors also influenced aggregation.
Really cool result: changes in soluble abundance from having all protein domains mixed together are nicely correlated with similar experiments on single, purified domains (also at 10 mg/mL).
Overall, 25-50% of the total protein became insoluble, but the proteomics revealed which proteins aggregated (became insoluble) more and which aggregated less.
We expressed the protein domains together as a mixture in one E. coli culture, purified our mixture of domains and concentrated to 10 mg/mL, then used TMT proteomics to monitor the change in soluble protein abundance after high temp/low pH.
This work was led by the incredible Cydney Martell, who worked out the entire approach from scratch and did virtually all the experiments and computational analysis (and earned her own PhRMA fellowship)! phrmafoundation.org/grants-fello... With proteomics help from the Savas Lab at NU!
New preprint! We measured temperature- and pH-induced aggregation for over 18,000 natural and de novo designed protein domains!
Global Analysis of Aggregation Determinants in Small Protein Domains https://www.biorxiv.org/content/10.1101/2025.11.11.687847v1