pubs.acs.org/doi/10.1021/...
Preprint: arxiv.org/abs/2508.12629
Code: github.com/Dunni3/FlowMol
The performance gains over previous FlowMol versions are due to three techniques that are cheap and architecture-agnostic. We hypothesize that these techniques operate synergistically to reduce a common pathology in transport-based generative models.
I'm excited to share FlowMol3, the 3rd (and final) version of our flow matching model for 3D de novo small-molecule generation! FlowMol3 achieves state-of-the-art performance over a broad range of evaluations while having ≈10x fewer parameters than comparable models.
Our new preprint PharmacoForge: Pharmacophore Generation with Diffusion Models is out now! PharmacoForge quickly generates pharmacophores for a given protein pocket that identify key binding features and find useful compounds in a pharmacophore search. Check it out! 🧪 doi.org/10.26434/che...
New "blogpost" from our lab that got accepted at ICLR 2025! We compare an old MCMC method known as Sequential Monte Carlo to generative models trained on energy functions (iDEM/iEFM) and show that MCMC does better. Check it out here: rishalaggarwal.github.io/ebmvsmcmc/
Structural biology is in an era of dynamics & assemblies, but turning raw experimental data into atomic models at scale remains challenging. @minhuanli.bsky.social and I present ROCKET: an AlphaFold augmentation that integrates crystallographic and cryoEM/ET data, with room for more! 1/14.
Thank you!
Thanks Pat!
MLSB + the AI4Science field are clearly outgrowing the ML conference workshop format
FlowMol at your fingertips! We just released a Colab notebook to make using FlowMol super easy. Come chat with us tomorrow at @workshopmlsb ! #NeurIPS2024 🧪 colab.research.google.com/github/Dunni...
Thanks Alex!
Our work is fully open-source and we invite feedback from the community. Code is available here: github.com/Dunni3/FlowMol
This opens a new set of questions, gives researchers a new way to quantify molecule quality, and the ability to test hypotheses as we further push de novo models to more faithfully match the distribution of real molecules.
But that's not the whole story. We introduce methods to quantify molecule quality at the level of functional groups + ring systems. "Valid" generated molecules tend to contain significantly more reactive functional groups than in the training data.
We test a handful of discrete flow matching methods for 3D de novo molecule design and provide some explanations for their differing performance. The result of this is a version of FlowMol with CTMC flows that achieves SOTA validity with fewer learnable parameters.
I'm presenting a new paper "Exploring Discrete Flow Matching for 3D De Novo Molecule Generation" at @workshopmlsb.bsky.social this week! More info in this thread, but reach out if you want to chat at NeurIPS about generative models or molecular design. arxiv.org/abs/2411.16644
Congrats!
Our paper describing our winning submission (tied with @olexandr.bsky.social) is out with some extra computational analysis of the predicted binding modes. We didn't do anything fancy (but the hits weren't that great either...).
pubs.acs.org/doi/10.1021/...
formal post coming soon :p
Here is how Boltz-1 (green), DynamicBind (magenta), and GNINA (blue) dock a collection of random molecules. GNINA, using a classical sampling algorithm (MCMC), hits all concave regions, while the ML samplers have distinct preferences. Boltz is the most likely to induce a fit.