
Daniel Palenicek

@daniel-palenicek

PhD Researcher in Robot #ReinforcementLearning 🤖🧠 at IAS TU Darmstadt and hessian.AI, advised by Jan Peters. Former intern at BCAI and Huawei R&D UK.

43
Followers
25
Following
21
Posts
12.02.2025
Joined

Latest posts by Daniel Palenicek @daniel-palenicek

Hey @araffin.bsky.social, sorry I missed your question. "Coming soon" was a bit optimistic; I got caught up in other projects I did not expect... We will release the XQC code in the next few weeks! We won't do a separate CrossQ+WN code release, though.

03.02.2026 13:46 👍 2 🔁 0 💬 0 📌 0

🎉 Really excited: our paper "XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning" has been accepted at #ICLR2026.

If you are interested in reinforcement learning, sample efficiency, or compute efficiency, go check it out. See you in Rio!

03.02.2026 10:33 👍 10 🔁 3 💬 0 📌 0
NVIDIA Awards up to $60,000 Research Fellowships to PhD Students The Graduate Fellowship Program announced the latest awards of up to $60,000 each to 10 Ph.D. students involved in research that spans all areas of computing innovation.

I'm super excited to have been named an #NVIDIA Graduate Fellowship Finalist! 💚

Huge thanks to my supervisor @jan-peters.bsky.social and all my collaborators.

Can't wait to join the NVIDIA Seattle Robotics Lab for my internship next summer! 🤖

blogs.nvidia.com/blog/graduat...

13.12.2025 16:26 👍 5 🔁 3 💬 0 📌 0

Had a really great time presenting our #NeurIPS paper at the poster session today. Thanks to everyone who stopped by.

If you are interested in sample-efficient #RL, check out our work:

Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization

06.12.2025 05:44 👍 4 🔁 1 💬 1 📌 0

Read the full preprint here:
👉 arxiv.org/pdf/2509.25174
Code coming soon.
We'd love feedback & discussion! 💬

02.10.2025 15:48 👍 1 🔁 0 💬 0 📌 0

Key takeaway:
Well-conditioned optimization > raw scale.

XQC shows that principled architecture choices can outperform larger, more complex models.

02.10.2025 15:48 👍 1 🔁 0 💬 1 📌 0

📊 Results across 70 tasks (55 proprioception + 15 vision-based):

⚡️ Matches or outperforms SimbaV2, BRO, BRC, MRQ, and DrQ-v2

🌿 ~4.5× fewer parameters and 1/10 the FLOPs of SimbaV2

💪 Especially strong on the hardest tasks: HumanoidBench, DMC Hard & DMC Humanoids from pixels

02.10.2025 15:48 👍 0 🔁 0 💬 1 📌 0

This leads to XQC, a streamlined extension of Soft Actor-Critic with
✅ only 4 hidden layers
✅ BN after each linear layer
✅ WN projection
✅ CE critic loss

Simplicity + principled design = efficiency ⚡️

02.10.2025 15:48 👍 0 🔁 0 💬 1 📌 0
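The CE critic loss in the list above refers to training the critic with a cross-entropy objective instead of MSE regression. A minimal NumPy sketch of one common instantiation, two-hot encoding of scalar returns over fixed bins (the bin range and helper names here are illustrative assumptions, not the paper's code):

```python
import numpy as np

def two_hot(value, bins):
    """Encode a scalar return as a two-hot probability vector over fixed bins.

    Probability mass is split between the two neighboring bins so that
    the expectation over bin centers recovers the original value.
    """
    value = float(np.clip(value, bins[0], bins[-1]))
    idx = min(int(np.searchsorted(bins, value, side="right")) - 1, len(bins) - 2)
    frac = (value - bins[idx]) / (bins[idx + 1] - bins[idx])
    probs = np.zeros(len(bins))
    probs[idx], probs[idx + 1] = 1.0 - frac, frac
    return probs

def ce_loss(target_probs, logits):
    """Cross-entropy between a two-hot target and predicted critic logits."""
    log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    return -np.sum(target_probs * log_probs)

bins = np.linspace(-10.0, 10.0, 21)     # 21 bin centers, spacing 1.0
target = two_hot(3.25, bins)            # mass 0.75 on bin 3, 0.25 on bin 4
assert np.isclose(target @ bins, 3.25)  # expectation recovers the value
```

The critic then outputs one logit per bin, and its scalar value estimate is the expectation of the softmax over the bin centers.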

🔑 Insight: A simple synergy of BatchNorm + WeightNorm + cross-entropy loss makes critics dramatically better conditioned.

➡️ Result: stable effective learning rates and smoother optimization.

02.10.2025 15:48 👍 1 🔁 0 💬 1 📌 0

Instead of "bigger is better," we ask:
Can better conditioning beat scaling?

By analyzing the Hessian eigenspectrum of critic networks, we uncover how different architectural choices shape optimization landscapes.

02.10.2025 15:48 👍 0 🔁 0 💬 1 📌 0
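In practice, Hessian eigenspectra of critic networks are probed with Hessian-vector products rather than by forming the Hessian explicitly. A toy sketch of that idea, power iteration for the largest eigenvalue on a small quadratic with a known spectrum (purely illustrative, not the paper's analysis code):

```python
import numpy as np

def hvp(hessian, v):
    """Hessian-vector product; for the quadratic loss 0.5 * w^T H w it is H @ v.
    For a real critic network this would be computed via autodiff instead."""
    return hessian @ v

def largest_eigenvalue(hessian, iters=200):
    """Power iteration using only Hessian-vector products."""
    v = np.ones(hessian.shape[0]) / np.sqrt(hessian.shape[0])
    for _ in range(iters):
        v = hvp(hessian, v)
        v /= np.linalg.norm(v)
    return float(v @ hvp(hessian, v))  # Rayleigh quotient at convergence

# Toy Hessian with known eigenvalues {1, ..., 5}: condition number 5/1 = 5.
H = np.diag(np.arange(1.0, 6.0))
assert np.isclose(largest_eigenvalue(H), 5.0)
```

A well-conditioned landscape is one where the ratio between the largest and smallest Hessian eigenvalues stays small, so a single learning rate fits all directions.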

🚀 New preprint! Introducing XQC, a simple, well-conditioned actor-critic that achieves SOTA sample efficiency in #RL
✅ ~4.5× fewer parameters than SimbaV2
✅ Scales to vision-based RL
👉 arxiv.org/pdf/2509.25174

Thanks to Florian Vogt @joemwatson.bsky.social @jan-peters.bsky.social

02.10.2025 15:48 👍 7 🔁 2 💬 1 📌 1

Thanks to my co-authors Florian Vogt, @joemwatson.bsky.social @jan-peters.bsky.social

@hessianai.bsky.social @ias-tudarmstadt.bsky.social @dfki.bsky.social @cs-tudarmstadt.bsky.social
#RL #ML #AI

23.05.2025 12:50 👍 4 🔁 1 💬 0 📌 0
RLDM | The Multi-disciplinary Conference on Reinforcement Learning and Decision Making

If you're working on RL stability, plasticity, or sample efficiency, this might be relevant for you.

We'd love to hear your thoughts and feedback!

Come talk to us at RLDM in June in Dublin (rldm.org)

23.05.2025 12:50 👍 0 🔁 0 💬 1 📌 0
https://arxiv.org/abs/2502.07523v2

📚 TL;DR: We combine BN + WN in CrossQ for stable high-UTD training and SOTA performance on challenging RL benchmarks. No need for network resets, critic ensembles, or other tricks... Simple regularization, big gains.

Paper: t.co/Z6QrMxZaPY

23.05.2025 12:50 👍 0 🔁 0 💬 1 📌 0

βš–οΈ Simpler β‰  Weaker: Compared to SOTA baselines like BRO our method:
βœ… Needs 90% fewer parameters (~600K vs. 5M)
βœ… Avoids parameter resets
βœ… Scales stably with compute.

We also compare strongly to the concurrent SIMBA algorithm.

No tricksβ€”just principled normalization. ✨

23.05.2025 12:50 👍 0 🔁 0 💬 1 📌 0

🔬 The Result: CrossQ+WN scales reliably with increasing UTD, with no resets, no critic ensembles, and no other tricks.
We match or outperform SOTA on 25 continuous control tasks from the DeepMind Control Suite & MyoSuite, including dog 🐕 and humanoid 🧍‍♂️ tasks, across UTDs.

23.05.2025 12:50 👍 0 🔁 0 💬 1 📌 0

➡️ With growing weight norm, the effective learning rate decreases and learning slows down or stops.

💡 Solution: After each gradient update, we rescale parameters to the unit sphere, preserving plasticity and keeping the effective learning rate stable.

23.05.2025 12:50 👍 0 🔁 0 💬 1 📌 0
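The rescaling step described above can be sketched in a few lines of NumPy; the random gradients and learning rate below are placeholders standing in for a real critic update, not the actual training loop:

```python
import numpy as np

def weight_norm_step(w, grad, lr=1e-3):
    """One gradient update followed by projection back to the unit sphere.

    For a scale-invariant (BN-regularized) network, the rescaling does not
    change the function the network computes; it only stops the weight norm
    from growing, which keeps the effective learning rate stable.
    """
    w = w - lr * grad
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
w = rng.normal(size=8)
w /= np.linalg.norm(w)
for _ in range(100):
    grad = rng.normal(size=8)  # placeholder for a real critic gradient
    w = weight_norm_step(w, grad)
assert np.isclose(np.linalg.norm(w), 1.0)  # the norm never drifts
```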

🧠 Key Idea: BN improves sample efficiency but fails to scale reliably to complex tasks & high UTDs due to growing weight norms.
BN-regularized networks are scale-invariant w.r.t. their weights, yet the gradient scales inversely proportionally to the weight norm (van Laarhoven, 2017).

23.05.2025 12:50 👍 0 🔁 0 💬 1 📌 0
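This scale-invariance property can be checked numerically on a toy function that normalizes its weights (a stand-in for a BN-regularized layer, not a real network): scaling the weights by c leaves the output unchanged while shrinking the gradient by 1/c.

```python
import numpy as np

def f(w, x):
    """Toy scale-invariant function: normalizing w mimics BN's effect."""
    return float(np.tanh(x @ (w / np.linalg.norm(w))))

def num_grad(w, x, eps=1e-6):
    """Central finite-difference gradient of f with respect to w."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e, x) - f(w - e, x)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
w, x = rng.normal(size=4), rng.normal(size=4)
c = 3.0
assert np.isclose(f(c * w, x), f(w, x))                        # same output
assert np.allclose(num_grad(c * w, x), num_grad(w, x) / c, atol=1e-5)
```

This is exactly why a growing weight norm shrinks the effective learning rate, and why projecting back to the unit sphere keeps it stable.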

πŸ” Background: Off-policy RL methods like CrossQ (Bhatt* & Palenicek* et al. 2024) are sample-efficient but struggle to scale to high update-to-data (UTD) ratios.

We identify why scaling CrossQ failsβ€”and fix it with a surprisingly effective tweak: Weight Normalization (WN). πŸ‹οΈ

23.05.2025 12:50 👍 0 🔁 0 💬 1 📌 0
https://arxiv.org/abs/2502.07523v2

🚀 New preprint: "Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization" 🤖

We propose CrossQ+WN, a simple yet powerful off-policy RL algorithm with improved sample efficiency and scalability to higher update-to-data ratios. 🧵 t.co/Z6QrMxZaPY

#RL @ias-tudarmstadt.bsky.social

23.05.2025 12:50 👍 7 🔁 1 💬 1 📌 2

Check out our latest work, where we train an omnidirectional locomotion policy directly on a real quadruped robot in just a few minutes, based on our CrossQ RL algorithm 🚀
Shoutout to @nicobohlinger.bsky.social and Jonathan Kinzel.

@ias-tudarmstadt.bsky.social @hessianai.bsky.social

19.03.2025 10:34 👍 2 🔁 0 💬 0 📌 0