Just released! Fluorescent Protein Libraries from the Plesa Lab. Benchmarking protein expression, developing imaging tools, fluorescent protein engineering, and more.
www.addgene.org/pooled-libra...
To enable reuse by the research community we're making the two parental FPBase libraries available on @addgene.bsky.social as pooled plasmid libraries (Addgene #245482 #245483). Use them. Shuffle them. Train on them! addgene.org/Calin_Plesa/ 12/n
So if you want ML to explore distant optima, don't just improve the model. Engineer the training distribution. Create diversity experimentally. Validate it functionally. Then let the model interpolate across it. 11/n
ML-generated functional proteins show more mosaic structure than even the shuffled library. The model learns to recombine sequence segments across parental families in new ways (remixing at scale) 10/n
Diversity metrics (clustering, k-mers, nearest-neighbor identity, mosaic structure, embedding geometry) show:
1) DNA shuffling expands dispersion 2) FACS constrains but preserves breadth 3) ML redistributes occupancy across new regions 9/n
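One of the metrics listed above, nearest-neighbor similarity, can be sketched in a few lines. This is only an illustration: the thread does not say how identity was computed, so the sketch uses k-mer Jaccard similarity as an alignment-free stand-in, and the function names and `k=3` default are assumptions.

```python
def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_neighbor_similarity(queries, references, k=3):
    """For each query, the highest k-mer Jaccard similarity to any reference.

    Assumption: k-mer Jaccard is a rough proxy, not alignment-based identity.
    """
    ref_sets = [kmer_set(r, k) for r in references]
    return [max(jaccard(kmer_set(q, k), r) for r in ref_sets) for q in queries]
```

Low values for a generated sequence would suggest it sits far from every reference; an alignment-based identity would be needed to compare against a figure like the <30% quoted in the thread.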
We synthesized and screened them, and hundreds of ML-generated proteins showed reproducible blue fluorescence. Some extend far beyond known natural clusters, including variants with <30% identity to any @fpbase.org protein. 8/n
We FACS-sorted for blue fluorescence, creating a high-confidence 7,812-seq BFP training set. We fine-tuned ProtGPT2 on this and generated 11,000 de novo sequences, which were pruned for max diversity to a set of 1,500 for synthesis. 7/n
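Pruning a large generated pool down to a maximally diverse subset can be sketched as greedy max-min (farthest-point) selection. This is a hedged illustration, not the method used in the preprint: the thread does not describe the pruning procedure, and both `hamming_fraction` and `greedy_maxmin_select` are hypothetical names with a simple position-wise distance standing in for a real alignment-based one.

```python
def hamming_fraction(a, b):
    """Fraction of mismatched positions; extra length counts as mismatches.

    Assumption: a crude, alignment-free distance used only for illustration.
    """
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches / n


def greedy_maxmin_select(seqs, n_keep, dist=hamming_fraction):
    """Greedily keep the sequence farthest from everything already kept."""
    if n_keep <= 0 or not seqs:
        return []
    chosen = [0]  # seed with the first sequence
    chosen_set = {0}
    # Distance from each sequence to its nearest already-chosen sequence.
    min_dist = [dist(s, seqs[0]) for s in seqs]
    while len(chosen) < min(n_keep, len(seqs)):
        candidates = [i for i in range(len(seqs)) if i not in chosen_set]
        nxt = max(candidates, key=lambda i: min_dist[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        for i, s in enumerate(seqs):
            min_dist[i] = min(min_dist[i], dist(s, seqs[nxt]))
    return [seqs[i] for i in chosen]
```

For example, `greedy_maxmin_select(["AAAA", "AAAT", "TTTT", "AATT"], 2)` keeps `"AAAA"` and `"TTTT"`, the two most distant sequences under this distance.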
Even after heavy recombination, the β-barrel scaffold proved surprisingly robust. The shuffled library retained substantial fluorescence while increasing diversity 3x relative to parents. We expanded the manifold without collapsing function. 6/n
We next generated large numbers of new chimeric variants that bridge distant homologs by DNA shuffling across the entire parental set. This synthetic recombination expanded sequence dispersion. 5/n
Using β-barrel fluorescent proteins (FPs) as a model, we synthesized a large fraction of the known β-barrel FPs from @fpbase.org (620 seqs) in two codon versions using DropSynth, resulting in two large parental libraries spanning natural diversity. 4/n
We hypothesized that by experimentally expanding sequence diversity, we could convert extrapolation into interpolation by expanding the known manifold. 3/n
PLMs do best when interpolating within known sequence space. But many protein families are sparsely sampled. Also, global fitness optima may reside in distant regions of sequence space, far from well-studied seqs. So many design problems are extrapolative. 2/n
New lab preprint: if ML struggles with extrapolation, let's expand the diversity of training data with gene synth, DNA shuffling, and ML gen.... also lots of #FluorescentProteins #ProteinEngineering #MachineLearning #SyntheticBiology 1/n
This year's UO iGEM team is building a histamine-responsive probiotic to help tame mast cell flares in MCAD. Help us get more students to the iGEM Jamboree in Paris this October (gifts via UO Foundation may be tax-deductible): duckfunder.uoregon.edu/project/46947
The BioE dept in the Knight Campus is a great place to work and is unique in many ways! We are expanding over the next few years with our second building opening in 2026. Come join us, applications begin review on October 15. ...4/n
Computational & Data-Science Neuroengineering: neural signal processing and analytics, closed-loop behavior tracking, neuromorphic computing and brain-inspired AI, spatial multi-omics analysis ...3/n
Experimental & Translational Neuroengineering: neural interfaces, neurophotonics for brain stimulation/recording/imaging, pre-clinical bioelectric medicine or brain injury repair models, brain organoid technology, disease models ...2/n
Our BioE dept at @uoknightcampus.bsky.social has an open rank search for two complementary tenure-track faculty positions in Neuroengineering focused on: 1) Experimental & Translational Neuroengineering and 2) Computational & Data-Science Neuroengineering.
careers.uoregon.edu/en-us/job/53... ...1/n
WORK!
I currently have 5 (!) open positions for postdocs/PhD students in our CDlab for projects:
- Nanopore protein sequencing
- Archaeal CDV cell division
- Microfluidics for synthetic cells
- Nuclear Pore Complex
- Origami mimics of peroxisomes
Please apply! ceesdekkerlab.nl/come-join-us/
RT=π
Our new Science Advances paper is out! The Plesa Lab's first, it builds on DropSynth technology as a proof of concept for large-scale synthetic gene libraries, showcasing a synthetic metagenomics approach to studying antibiotic resistance at scale. www.science.org/doi/10.1126/...
CAGT was fun! Thanks @carldeboer.bsky.social @sudpinglay.bsky.social and the de Boer lab for organizing! Folks from Seattle, Oregon, and other places. Great community. Arman gave a super talk, as usual, and Sanchit and Dayag won poster prizes ☺️
Dr. Natanya Villegas PhD defense
Amazing PhD defense by Dr. Natanya Villegas! She pioneered CRISPR-Cas9 and RNA work in our lab and drove multiple tech-dev projects. She is looking for opportunities in the PNW so reach out -> @trienetoscience.bsky.social
Huge congrats to Dr. Andrew Holston on a successful PhD defense!! Andrew has been working on the large-scale characterization and engineering of chimeric receptor histidine kinases. He's looking for roles in biotech/academia, let's connect if you're hiring! -> @hkalltheway.bsky.social
Join us today, May 27, for @trienetoscience.bsky.social's PhD thesis defense. Natanya is a graduate student in @calin.bsky.social's lab, where her research has focused on innovations in programmable nucleic acid libraries and CRISPR enrichment for molecular biology applications. 🧬
Join us tomorrow, May 23, for Andrew Holston's PhD thesis defense. Andrew is a graduate student in @calin.bsky.social's lab, where his research has focused on the large-scale engineering of chimeric histidine kinases. 🧪
Have we hit a "scaling wall" for protein language models? Our latest ProteinGym v1.3 release suggests that for zero-shot fitness prediction, simply making pLMs bigger isn't better beyond 1-4B parameters. The winning strategy? Combining MSAs & structure in multimodal models!
Preprint from the lab
Have you ever engineered proteins to be more stable and were unhappy about your predictor's success rate? We got you covered with BoostMut!
Great work led by @kerlenkorbeld.bsky.social now online at www.biorxiv.org/content/10.1...
A thread 🧵
Added!
We unlock breakthrough insights by delivering the best training sets in biology, at unprecedented scale. youtu.be/KuCaJTPHM2o
Join us June 19-20, 2025 in Vancouver BC for the Cascadia Advanced Genomic Technologies meeting! Featuring keynote speaker Calin Plesa @calin.bsky.social. Abstract deadline April 30, but registration is capped so don't wait!
de-boer-lab.github.io/CAGT_meeting/