Data augmentation (DA) emerges from LLoCa as the special case of random global frames, enabling a fair comparison between equivariance and augmentation. Equivariance excels in large-data regimes due to greater expressivity, while augmentation wins in the low-data regime.
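The "random global frame" baseline can be sketched in a few lines: sample a random rotation times a random boost and apply it to every particle in the event. This is an illustrative numpy sketch, not the paper's code; `max_rapidity` is a made-up knob, and the QR-based rotation is a convenient (not exactly Haar-uniform) choice.

```python
import numpy as np

def random_lorentz(rng, max_rapidity=1.0):
    """Sample a proper Lorentz transformation as (random rotation) @ (random boost).
    Illustrative sketch only; max_rapidity is a hypothetical knob."""
    # random 3D rotation from the QR decomposition of a Gaussian matrix
    M = rng.normal(size=(3, 3))
    R3, _ = np.linalg.qr(M)
    if np.linalg.det(R3) < 0:          # enforce det = +1 (a proper rotation)
        R3[:, 0] *= -1
    R = np.eye(4)
    R[1:, 1:] = R3
    # random boost along z with rapidity eta
    eta = rng.uniform(-max_rapidity, max_rapidity)
    B = np.eye(4)
    B[0, 0] = B[3, 3] = np.cosh(eta)
    B[0, 3] = B[3, 0] = np.sinh(eta)
    return R @ B

def augment(event, rng):
    """Apply one shared random Lorentz transformation to all particles
    (rows of `event`, each a four-momentum (E, px, py, pz))."""
    L = random_lorentz(rng)
    return event @ L.T
```

Because the transformation is a genuine Lorentz matrix, all invariant masses and pairwise Minkowski products of the event are unchanged, which is exactly what makes it a symmetry-respecting augmentation.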
6/6
We create LLoCa-ParticleNet and LLoCa-ParT, Lorentz-equivariant versions of the established non-equivariant ParticleNet and ParT. The LLoCa variants consistently improve performance but are 2× slower. Interestingly, we find that a simple LLoCa-Transformer matches the LLoCa-ParT performance.
5/6
Existing Lorentz-equivariant architectures like LorentzNet, PELICAN, and L-GATr rely on specialized layers for internal representations, limiting architectural choice and often requiring significant extra compute. LLoCa achieves similar (SOTA) performance while being 4× faster and more flexible.
4/6
All in all, it takes two steps to make your architecture Lorentz-equivariant:
(1) use a small network that equivariantly predicts local frames, and express inputs in these local frames.
(2) add frame-to-frame transformations in the message passing (or attention) of your backbone architecture.
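The two steps above can be sketched in numpy. As a stand-in for LLoCa's learned equivariant frame predictor, this toy uses the boost to each particle's rest frame (which assumes a massive, moving particle); the frame-to-frame map for a vector-valued message from particle j to particle i is then Λ_i Λ_j⁻¹.

```python
import numpy as np

ETA = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric (+,-,-,-)

def rest_frame_boost(p):
    """Toy stand-in for LLoCa's small equivariant frame-predictor network:
    the pure boost taking p = (E, px, py, pz) to its rest frame.
    Assumes a massive particle with nonzero momentum."""
    E, pvec = p[0], p[1:]
    m = np.sqrt(E**2 - pvec @ pvec)
    beta = pvec / E
    gamma = E / m
    L = np.eye(4)
    L[0, 0] = gamma
    L[0, 1:] = L[1:, 0] = -gamma * beta
    L[1:, 1:] += (gamma - 1.0) * np.outer(beta, beta) / (beta @ beta)
    return L

# Step 1: each particle gets its own frame; features expressed there are invariant.
p_i = np.array([5.0, 1.0, 2.0, 3.0])
p_j = np.array([4.0, 0.5, -1.0, 2.0])
L_i, L_j = rest_frame_boost(p_i), rest_frame_boost(p_j)

# Step 2: a four-vector message living in j's frame is re-expressed in i's
# frame via the frame-to-frame map L_i @ L_j^{-1} before aggregation.
msg_in_j = L_j @ p_i                               # some vector feature, in j's frame
msg_in_i = L_i @ np.linalg.inv(L_j) @ msg_in_j     # same feature, in i's frame
```

In the real method the backbone (GNN message passing or attention) stays untouched except for inserting these frame-to-frame transformations on its non-scalar channels.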
3/6
LLoCa assigns equivariantly predicted local reference frames to each particle, making their features invariant such that we can process them with any backbone architecture. This approach supports general internal representations through the way messages are transformed between local frames.
2/6
Lorentz Local Canonicalization (LLoCa) is a drop-in replacement that makes any network Lorentz-equivariant. Check out how we apply it to high-energy physics tasks in arxiv.org/abs/2505.20280.
w/ Luigi Favaro, Peter Lippmann, Sebastian Pitz, Gerrit Gerhartz, Tilman Plehn, and Fred A. Hamprecht
1/6
The DiscFormer training is similar to GANs, but requires neither a joint training nor a back-and-forth between classifier and generator. Unfortunately, we did not get it to consistently improve upon standard likelihood training after working on it for over a year...
7/7
Finally, an interesting but null result:
Appendix A is on a novel way to amplify likelihood training with classifier reweighting, aka DiscFormer. To avoid a classifier unweighting step after training, we reweight training data to increase the difference between model and data, aka DiscFormation.
6/7
We try bootstrapping and two modified loss functions to tackle this task. We find that all three methods generate significantly more events with 8 jets. Plus, they get the kinematics correct at the level of statistical uncertainty in the training data. Yay!
5/7
However, we find that events with 8 jets are much less likely to be generated. Can we find a way to modify the training process to increase the fraction of events with many jets?
4/7
We train an autoregressive transformer on events with up to 6 jets. The model does not learn the multiplicity distribution perfectly, so it also generates a few accidental 7-jet events. This happens rarely, but we find that these events roughly have the correct kinematic distributions.
3/7
QCD jet radiation follows a universal scaling pattern, reflecting the collinear factorization of matrix element and phase space. However, later parts of the simulation chain violate this universality. It remains approximately valid, manifesting in the staircase scaling of jet multiplicities.
2/7
Can transformers learn the universal patterns of jet radiation and extrapolate beyond the training data?
Find out in our preprint
'Extrapolating Jet Radiation with Autoregressive Transformers'
arxiv.org/abs/2412.12074
w/ Javi Marino, Ayo Ore, Francois Charton, Anja Butter and Tilman Plehn
1/7
On Thursday from 11:00 to 14:00, I'll be cheering on @jonasspinner.bsky.social and Victor Bresó at poster 3911.
They built L-GATr 🐊: a transformer that's equivariant to the Lorentz symmetry of special relativity. It performs remarkably well across different tasks in high-energy physics.
2/6
Thanks to the L-GATr team Victor Breso, Pim de Haan, Tilman Plehn, Huilin Qu, Jesse Thaler and @johannbrehmer.bsky.social
Looking forward to exciting discussions at NeurIPS!
We train continuous normalizing flows with Riemannian flow matching and several choices for the vector field architecture, and compare them with our autoregressive density estimator 'JetGPT'. CNFs turn out to be more data-efficient, and making them equivariant also helps.
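For readers unfamiliar with flow matching: the training target is just the velocity of an interpolant between noise and data. Here is a minimal flat-space sketch (the paper uses Riemannian flow matching, which adapts this to the geometry of phase space); all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(loc=2.0, size=(256, 4))   # stand-in "data" samples
x0 = rng.normal(size=(256, 4))            # base-distribution noise samples

def cfm_pair(x0, x1, t):
    """One conditional flow-matching training pair: a point on the linear
    interpolant between noise x0 and data x1, and the constant velocity
    the vector-field network should regress to at that point."""
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

x_t, v = cfm_pair(x0, x1, 0.5)
```

The CNF's vector field v_θ(x, t) is then trained with a simple mean-squared error against `v_target`, with t drawn uniformly in [0, 1]; no simulation of the ODE is needed during training.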
6/7
For the first time, we have trained a Lorentz-equivariant architecture on a real-world tagging dataset (JetClass = 100M jets). We find the hierarchy GNN < transformer < Lorentz-equivariant transformer, indicating that equivariance also matters at scale.
5/7
We implement the L-GATr attention as a multiplicative list of signs for the queries in the inner product, and then use off-the-shelf attention kernels. With this trick, L-GATr scales to many tokens like standard transformers.
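The trick in one numpy sketch: an indefinite inner product ⟨q, k⟩ = Σ_d s_d q_d k_d with signs s_d ∈ {±1} becomes an ordinary dot product once the signs are absorbed into the queries, so any standard attention kernel applies unchanged. This is a toy illustration of the idea, not the L-GATr implementation.

```python
import numpy as np

def sign_trick_attention(Q, K, V, signs):
    """Compute softmax attention whose scores use the indefinite inner
    product <q,k> = sum_d signs[d] * q[d] * k[d], by flipping the signs
    of the query components and then running plain dot-product attention."""
    Qs = Q * signs                                   # absorb metric signs into queries
    scores = Qs @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)     # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V
```

With signs (+1, -1, -1, -1) on the four-vector components this reproduces Minkowski inner products in the attention scores, and the sign-flipped queries can be fed straight into fused kernels like FlashAttention.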
4/7
To build L-GATr, we replace each transformer module with a version that processes geometric algebra objects in a Lorentz-equivariant way. Plus, geometric algebra offers a new operation, the geometric product, which allows us to add an extra layer.
3/7
The Lorentz-Equivariant Geometric Algebra Transformer (L-GATr) represents particles at the LHC in spacetime geometric algebra and processes them in a Lorentz-equivariant way, using a transformer architecture to combine the benefits of Lorentz and permutation equivariance.
2/7