Five years ago, we released FLIP. The core question was: can ML models for protein fitness prediction generalize in the ways that actually matter for protein engineering, i.e., low-data regimes, extrapolation to higher mutation counts, and out-of-distribution sequences?
We made FLIP2, a protein fitness benchmark spanning seven new datasets, including enzymes, protein-protein interactions, and light-sensitive proteins, as well as splits that measure generalization relevant to real-world protein engineering campaigns.
Come do a PhD internship with me!
It was incredible to work with Arushi during her summer internship - check out her preprint showing that single-cell representation learning methods for microscopy can be confounded by cells in the background of their inputs!
Read our preprint demonstrating a growing bias against pediatric data in medical imaging datasets, and how it impacts downstream AI development!
Props to our first author @stanhua.bsky.social, who among many other feats, filtered hundreds of datasets/papers to establish this trend.
At NeurIPS and interested in talking about BioML? Catch me at the Microsoft booth in the expo hall between 2:00-2:30p and 3:30-4:30p today!