Hey @araffin.bsky.social sorry, I missed your question. "Coming soon" was a bit optimistic; I got caught up in other projects which I did not expect... We will release the XQC code in the next few weeks! We won't do a separate CrossQ+WN code release, though.
03.02.2026 13:46
Really excited: our paper "XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning" has been accepted at #ICLR2026.
If you are interested in reinforcement learning, sample efficiency, or compute efficiency, go check it out. See you in Rio!
03.02.2026 10:33
NVIDIA Awards up to $60,000 Research Fellowships to PhD Students
The Graduate Fellowship Program announced the latest awards of up to $60,000 each to 10 Ph.D. students involved in research that spans all areas of computing innovation.
I'm super excited to have been named an #NVIDIA Graduate Fellowship Finalist!
Huge thanks to my supervisor @jan-peters.bsky.social and all my collaborators.
Can't wait to join the NVIDIA Seattle Robotics Lab for my internship next summer!
blogs.nvidia.com/blog/graduat...
13.12.2025 16:26
Had a really great time presenting our #NeurIPS paper at the poster session today. Thanks to everyone who stopped by.
If you are interested in sample-efficient #RL, check out our work:
Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization
06.12.2025 05:44
Read the full preprint here:
arxiv.org/pdf/2509.25174
Code coming soon.
We'd love feedback & discussion!
02.10.2025 15:48
Key takeaway:
Well-conditioned optimization > raw scale.
XQC shows that principled architectural choices can outperform larger, more complex designs.
02.10.2025 15:48
Results across 70 tasks (55 proprioception + 15 vision-based):
➡️ Matches or outperforms SimbaV2, BRO, BRC, MRQ, and DRQ-V2
➡️ ~4.5× fewer parameters and 1/10 the FLOP/s of SimbaV2
➡️ Especially strong on the hardest tasks: HumanoidBench, DMC Hard & DMC Humanoids from pixels
02.10.2025 15:48
This leads to XQC, a streamlined extension of Soft Actor-Critic with
✅ only 4 hidden layers
✅ BN after each linear layer
✅ WN projection
✅ CE critic loss
Simplicity + principled design = efficiency ⚡
02.10.2025 15:48
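As a rough illustration of the recipe in this post, here is a minimal numpy sketch of one such critic block (Linear → BatchNorm → ReLU) plus the weight-normalization projection. Sizes, names, and the plain-numpy style are my own; the official XQC code has not been released yet.

```python
import numpy as np

def linear_bn_relu(x, W, b, eps=1e-5):
    """One critic block: linear layer, then batch normalization, then ReLU."""
    z = x @ W + b
    mu = z.mean(axis=0)                      # per-feature batch mean
    var = z.var(axis=0)                      # per-feature batch variance
    z_hat = (z - mu) / np.sqrt(var + eps)    # normalize over the batch
    return np.maximum(z_hat, 0.0)

def wn_project(W):
    """Weight-normalization step: rescale each weight column onto the unit sphere."""
    return W / np.linalg.norm(W, axis=0, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))                 # a batch of 32 state-action features
W = wn_project(rng.normal(size=(8, 16)))     # projected weights, unit-norm columns
b = np.zeros(16)
h = linear_bn_relu(x, W, b)                  # output of one hidden block, shape (32, 16)
```

Stacking four such blocks and reading out a categorical (cross-entropy) value head would give the overall shape described above.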
Insight: a simple synergy of BatchNorm, WeightNorm, and a cross-entropy loss makes critics dramatically better conditioned.
➡️ Result: stable effective learning rates and smoother optimization.
02.10.2025 15:48
Instead of "bigger is better," we ask:
Can better conditioning beat scaling?
By analyzing the Hessian eigenspectrum of critic networks, we uncover how different architectural choices shape optimization landscapes.
02.10.2025 15:48
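Concretely, "well-conditioned" refers to the spread of the Hessian's eigenvalues: the larger the ratio of largest to smallest eigenvalue (the condition number), the harder the loss surface is for gradient descent. A toy numpy example of that quantity (not from the paper):

```python
import numpy as np

def condition_number(H):
    """Ratio of the largest to the smallest Hessian eigenvalue."""
    eig = np.linalg.eigvalsh(H)      # eigenvalues in ascending order
    return eig[-1] / eig[0]

# Toy quadratic losses L(w) = 0.5 * w^T H w, whose Hessian is H itself.
H_ill = np.diag([100.0, 1.0])        # curvature differs 100x between directions
H_well = np.diag([1.0, 1.0])         # isotropic curvature

# Gradient descent needs a step size below 2 / lambda_max to stay stable,
# so an ill-conditioned loss forces tiny steps along its flat directions.
kappa_ill = condition_number(H_ill)    # ~100
kappa_well = condition_number(H_well)  # ~1
```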
New preprint! Introducing XQC, a simple, well-conditioned actor-critic that achieves SOTA sample efficiency in #RL
✅ ~4.5× fewer parameters than SimbaV2
✅ Scales to vision-based RL
arxiv.org/pdf/2509.25174
Thanks to Florian Vogt, @joemwatson.bsky.social @jan-peters.bsky.social
02.10.2025 15:48
Thanks to my co-authors Florian Vogt, @joemwatson.bsky.social @jan-peters.bsky.social
@hessianai.bsky.social @ias-tudarmstadt.bsky.social @dfki.bsky.social @cs-tudarmstadt.bsky.social
#RL #ML #AI
23.05.2025 12:50
RLDM | The Multi-disciplinary Conference on Reinforcement Learning and Decision Making
If you're working on RL stability, plasticity, or sample efficiency, this might be relevant for you.
We'd love to hear your thoughts and feedback!
Come talk to us at RLDM in June in Dublin (rldm.org)
23.05.2025 12:50
https://arxiv.org/abs/2502.07523v2
TL;DR: We combine BN + WN in CrossQ for stable high-UTD training and SOTA performance on challenging RL benchmarks. No need for network resets, no critic ensembles, no other tricks... Simple regularization, big gains.
Paper: t.co/Z6QrMxZaPY
23.05.2025 12:50
Simpler ≠ Weaker: compared to SOTA baselines like BRO, our method:
✅ Needs 90% fewer parameters (~600K vs. 5M)
✅ Avoids parameter resets
✅ Scales stably with compute
We also compare favorably against the concurrent SimBa algorithm.
No tricks, just principled normalization. ✨
23.05.2025 12:50
The Result: CrossQ + WN scales reliably with increasing UTD: no more resets, no critic ensembles, no other tricks.
We match or outperform SOTA on 25 continuous control tasks from the DeepMind Control Suite & MyoSuite, including dog and humanoid tasks across UTDs.
23.05.2025 12:50
➡️ With growing weight norms, the effective learning rate decreases and learning slows down or stops.
Solution: after each gradient update, we rescale the parameters to the unit sphere, preserving plasticity and keeping the effective learning rate stable.
23.05.2025 12:50
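For a scale-invariant layer, the effective learning rate behaves like lr / ||w||², so projecting the weights back to unit norm after every update pins it in place. A tiny numpy sketch of that projection step (illustrative numbers, not the paper's optimizer):

```python
import numpy as np

lr = 1e-2
w = np.array([3.0, 4.0])                 # ||w|| = 5
eff_lr_before = lr / np.dot(w, w)        # lr / 25: large norm => tiny effective step

grad = np.array([0.1, -0.2])             # some gradient for this update
w = w - lr * grad                        # ordinary SGD step
w = w / np.linalg.norm(w)                # project back onto the unit sphere

eff_lr_after = lr / np.dot(w, w)         # ~lr / 1: effective step size restored
```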
Key Idea: BN improves sample efficiency but fails to scale reliably to complex tasks & high UTDs due to growing weight norms.
However, BN-regularized networks are scale-invariant w.r.t. their weights, while the gradient scales inversely proportionally to the weight norm (Van Laarhoven 2017).
23.05.2025 12:50
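The scale-invariance claim is easy to check numerically: multiplying the weights of a BN layer by any positive constant leaves the layer's output (and hence the loss) unchanged. A small numpy check (simplified BN without learned affine parameters; my own toy setup):

```python
import numpy as np

def batchnorm(z, eps=1e-8):
    """Normalize each feature over the batch (no learned scale/shift, for brevity)."""
    return (z - z.mean(axis=0)) / (z.std(axis=0) + eps)

rng = np.random.default_rng(1)
x = rng.normal(size=(64, 4))
W = rng.normal(size=(4, 3))

out = batchnorm(x @ W)
out_scaled = batchnorm(x @ (10.0 * W))          # scale the weights by alpha = 10
same = np.allclose(out, out_scaled, atol=1e-6)  # output is scale-invariant

# Since the output is unchanged while the pre-activations grew by alpha,
# the chain rule shrinks the weight gradient by 1/alpha (Van Laarhoven 2017):
# this is the effective-learning-rate decay the thread describes.
```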
Background: Off-policy RL methods like CrossQ (Bhatt* & Palenicek* et al. 2024) are sample-efficient but struggle to scale to high update-to-data (UTD) ratios.
We identify why scaling CrossQ fails and fix it with a surprisingly effective tweak: Weight Normalization (WN).
23.05.2025 12:50
https://arxiv.org/abs/2502.07523v2
New preprint: "Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization"
We propose CrossQ+WN, a simple yet powerful off-policy RL algorithm for greater sample efficiency and scalability to higher update-to-data ratios. t.co/Z6QrMxZaPY
#RL @ias-tudarmstadt.bsky.social
23.05.2025 12:50
Check out our latest work, where we train an omnidirectional locomotion policy directly on a real quadruped robot in just a few minutes, based on our CrossQ RL algorithm.
Shoutout to @nicobohlinger.bsky.social and Jonathan Kinzel.
@ias-tudarmstadt.bsky.social @hessianai.bsky.social
19.03.2025 10:34