Read more about our MLIP distillation preprint in John's thread! #compchem #chemsky
Thanks! In principle, yes: our data-generation protocol requires ~5 model calls to generate a new, chemically reasonable structure, and is easily parallelised across processes. If you were willing to burn $$$, one could generate a new dataset very quickly (as is often the case with e.g. RSS set-ups).
... @juraskova-ver.bsky.social, Louise Rosset, @fjduarte.bsky.social, Fausto Martelli, Chris Pickard and @vlderinger.bsky.social 🤩
It was super fun collaborating with my co-first-author @dft-dutoit.bsky.social, together with the rest of the team across various research groups: @bm-chiheb.bsky.social, Zoé Faure Beaulieu, Bianca Pasça...
We hope that you can start using this method to do cool new science!
Code for our synthetic data generation pipeline (compatible with any ASE Calculator object) can be found here:
github.com/jla-gardner/...
For (many) more results, please see the pre-print! arxiv.org/abs/2506.10956
I find our results for the modelling of MAPI (a hybrid perovskite) particularly pleasing: the distribution of cation orientations generated by the teacher and student models during NVT MD are ~identical!
We go on to apply this distillation approach to target other chemical domains by distilling different foundation models (Orb, MatterSim (@msftresearch.bsky.social) and MACE-OFF), and find that it works well across the board!
Beyond error metrics, we extensively validate these models to show they model liquid water well.
These student models have relatively few parameters (c. 40k for PaiNN and TensorNet), and so have a much lower memory footprint. This lets you scale single-GPU experiments very easily!
The resulting student models reach impressive accuracy vs DFT while being orders of magnitude faster than the teacher!
Note that these student models have a different architecture from MACE; in fact, ACE is not even NN-based.
We start by (i) fine-tuning MACE-MP-0 (@ilyesbatatia.bsky.social) on 25 water structures labelled with an accurate functional, (ii) using this fine-tuned model and these structures to generate a large number (10k) of new "synthetic" structures, and (iii) training student models on this dataset.
Does this distillation approach work? In short, yes! 🤩
This approach is very cheap, taking c. 5 calls to the teacher model to generate a new, chemically relevant and uncorrelated structure! We can build large datasets within one hour using this protocol.
In this pre-print, we propose a different solution: starting from a (very) small pool of structures, and repeatedly (i) rattling and (ii) crudely relaxing them using the teacher model and a Robbins-Monro procedure.
This works well, but has two drawbacks: (1) MD is still quite expensive, and requires many steps to generate uncorrelated structures, and (2) expert knowledge and lots of fiddling is required to get the MD settings right.
In previous work, we and others (PFD-kit) have proposed using teacher models to generate "synthetic data" by using them to drive MD, and to sample snapshots along these trajectories as training points.
The devil is always in the details, however. The main problem we need to solve is how to generate many relevant structures that densely sample the chemical domain we are interested in targeting.
At a high level, this builds upon the approach pioneered by Joe Morrow, now extended to the distillation of impressively capable foundation models, and to a range of downstream architectures and chemical domains.
Concretely, we train a student to predict the energy and force labels generated by the teacher on a large dataset of structures: this requires no alterations to existing training pipelines, and so is completely agnostic to the architecture of both the teacher and student.
Both of the above methods try to maximise the amount of information extracted per training structure from the teacher. Our approach is orthogonal to this: we try to maximise the number of structures (that are both sensible and useful) we use to transfer knowledge.
Somewhat similarly, @ask1729.bsky.social and others extract additional Hessian information from the teacher. Again, this works well, provided you have a training framework that lets you train student models on this data.
@gasteigerjo.bsky.social and others attempt to align not only the predictions, but also the internal representations of the teacher and the student. This approach works well for models with similar architectures, but is incompatible with e.g. fast linear models like ACE.
At their heart, all model distillation strategies attempt to extract as much information as possible from the teacher model, in a format that is useful for the student.
Various existing methods in the literature do this in different ways.
In the context of machine learned interatomic potentials, distillation lets you simulate larger systems, for longer times, and with less compute.
This lets you explore new science, and democratises access to otherwise expensive simulations/methods and foundation models. 💪
Model distillation methods take a large, slow "teacher model", and try to condense, or "distill", its knowledge into a smaller, faster "student model".
If this can be done well, it is an extremely useful thing!
Excited to share the pre-print we've been working on for the last ~4 months:
"Distillation of atomistic foundation models across architectures and chemical domains"
Deep dive thread below! 🤿🧵
If you want to dive straight in with some hands-on examples, you can find several links to Colab notebooks in the graph-pes docs:
jla-gardner.github.io/graph-pes/
If graph-pes sounds interesting to you, please do check out the repo and give it a star!
Please also reach out via GitHub issues or DM on here if you have any questions or feedback.
github.com/jla-gardner/...