It was really fun to present our latest paper "Constructing an Optimal Behavior Basis for the Option Keyboard" at NeurIPS this week!
Paper: openreview.net/pdf?id=D4gOo...
#neurips2025
Sure, only ~2 weeks to review 5 papers for ICLR. I'm certain all reviewers will have plenty of time to write careful and thoughtful reviews in the coming weeks, since they have nothing else to do.
It is insane to expect a fair reviewing system under these conditions.
It is really cool to see our work on multi-step GPI being cited in this amazing survey! :)
proceedings.neurips.cc/paper_files/...
On average, my scores are good, but I have had cases before where 3 of 4 reviewers accepted the paper and the single negative reviewer convinced the AC to reject.
And now I got the classic rebuttal response:
"I have no concerns with the paper, all the theory is great, but since you did not run experiments in expensive domains with image-based environments, I will not increase my score".
The goal of experiments is to validate the claims! Not to beat Atari!
I got the classic NeurIPS review: "why did you not compare with [completely unrelated method whose comparison would not help support any of the paper's claims]?"
Debating whether I should spend my weekend running this useless experiment or argue with the reviewer.
Finally, reporting only IQM may compromise scientific transparency and fairness, as it can mask poor or unstable performance. Agarwal et al. (2021), who introduced IQM in this context, recommend using it in conjunction with other statistics rather than as a standalone measure.
Yes, Interquartile Mean (IQM) is a robust statistic that reduces the influence of outliers. But it does not by itself provide a clear and fair analysis of performance. In particular, IQM does not capture the full distribution of returns and may hide important information about variability and risk.
While I really like the paper "Deep Reinforcement Learning at the Edge of the Statistical Precipice" (openreview.net/forum?id=uqv...), I have seen papers evaluating performance using only the IQM metric and claiming that it is a fairer metric than the mean based on this paper, which is simply wrong.
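For concreteness, here is a minimal sketch (a toy example of mine, not code from the paper) of reporting IQM alongside other statistics; scipy's trim_mean with proportiontocut=0.25 computes exactly the interquartile mean:

```python
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(0)
# Hypothetical scores, shape (num_runs, num_tasks).
scores = rng.normal(loc=1.0, scale=0.5, size=(10, 5))

# IQM = mean of the middle 50% of scores (trims 25% from each tail).
iqm = trim_mean(scores.flatten(), proportiontocut=0.25)

# Report IQM together with other statistics, never alone.
print(f"IQM:    {iqm:.3f}")
print(f"Mean:   {scores.mean():.3f}")
print(f"Median: {np.median(scores):.3f}")
print(f"Std:    {scores.std():.3f}")  # variability that IQM alone would hide
```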
This work was done during my time as an intern at Disney Research Zürich. It was amazing and really fun to develop this idea with the Robotics Team!
Check out AMOR now on arXiv:
Paper: arxiv.org/abs/2505.23708
Full Video: youtube.com/watch?v=gQid...
#SIGGRAPH2025 #RL #robotics
A base policy with uniform weights might fail on challenging motions, but with a few weight tweaks, it nails them. Like this double spin. 😵‍💫
Curious how tuning weights mid-motion can help close the sim-to-real gap and unlock dynamic, expressive behaviors?
AMOR trains a single policy conditioned on reward weights and motion context, letting you fine-tune the reward after training.
Want smoother motions? Better accuracy? Just adjust the weights, no retraining needed!
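For intuition, a hypothetical sketch of the core idea (toy names and shapes, not the actual AMOR implementation): the policy is conditioned on the reward weights, and the scalar reward is a weighted sum of per-objective terms, so the trade-off can be changed after training:

```python
import numpy as np

def scalarized_reward(r_vec, w):
    # r(s, a) = w . r_vec(s, a): changing w changes the behavior trade-off.
    return float(w @ r_vec)

def policy_input(obs, w):
    # The policy network sees the weights, so a single policy covers a whole
    # family of reward trade-offs, and w can be tweaked at deployment time.
    return np.concatenate([obs, w])

obs = np.zeros(8)                  # made-up observation
r_vec = np.array([0.9, 0.2, 0.5])  # e.g., tracking, smoothness, effort terms
w = np.array([0.6, 0.3, 0.1])      # adjust these instead of retraining
print(scalarized_reward(r_vec, w), policy_input(obs, w).shape)
```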
We are excited to share our #SIGGRAPH2025 paper,
"AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning"!
Lucas Alegre*, Agon Serifi*, Ruben Grandia, David Müller, Espen Knoop, Moritz Baecher
Annoyed by having to retrain your entire policy just because your reward weights did not quite work on the real robot?
www.youtube.com/watch?v=gQid...
Thank you, Peter! :)
I'm really glad to have been selected as one of the ICML 2025 Top Reviewers!
Too bad I won't be able to go, since my last submission was not accepted, even with scores of Accept, Accept, Weak Accept, and Weak Reject.
Last week, I was at @khipu-ai.bsky.social in Santiago, Chile. It was really amazing to see so many great speakers and researchers from Latin America together!
RL is so back!
(well, for some of us, it never really left)
awards.acm.org/about/2024-t...
Thank you!
link.springer.com/article/10.1...
This paper is a great starting point!
Thank you!
Finally, I would like to thank my advisors, Prof. Ana Bazzan and Prof. Bruno C. da Silva; Prof. Ann Nowé, who hosted me at VUB during my PhD stay; and Disney Research Zürich, where I interned.
I am very grateful to everyone with whom I had the chance to collaborate on all these amazing projects!
I believe all these contributions open the door to many interesting ideas for multi-policy RL methods, especially in transfer learning (SFs & GPI) and multi-objective RL settings!
* MO-Gymnasium (github.com/Farama-Found...) is a library of MORL environments; and
* MORL Baselines (github.com/LucasAlegre/...) is a library of MORL algorithms.
Both have become standard tools in MORL research, with over 100k downloads in the past year!
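A minimal usage sketch (from memory, so check the repos for the current API): MO-Gymnasium follows the standard Gymnasium interface, except that the reward is a vector with one entry per objective:

```python
import mo_gymnasium as mo_gym

# "deep-sea-treasure-v0" is a classic two-objective benchmark in the library.
env = mo_gym.make("deep-sea-treasure-v0")
obs, info = env.reset(seed=42)
action = env.action_space.sample()
obs, vector_reward, terminated, truncated, info = env.step(action)
print(vector_reward)  # a numpy array with one entry per objective
```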
Besides the theoretical and algorithmic contributions, we also introduced an open-source toolkit for MORL research!
NeurIPS D&B 2023 Paper - openreview.net/pdf?id=jfwRL...
Next, we explored how to leverage approximate models of the environment to improve zero-shot policy transfer. Our method, h-GPI, interpolates between model-free GPI and fully model-based planning as a function of the planning horizon h.
NeurIPS 2023 Paper - openreview.net/pdf?id=KFj0Q...
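Roughly, the idea in code (my own toy simplification, not the paper's implementation): estimate values by rolling an approximate model out for h steps, then bootstrap with the model-free GPI value, so h = 0 recovers plain GPI and larger h relies more on the model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9
# Toy action-value tables, one per base policy (these would be learned).
qs = [rng.random((n_states, n_actions)) for _ in range(3)]

def model_step(s, a):
    # Stand-in for an approximate model: next state and reward.
    return (s + a) % n_states, float(a)

def gpi_value(s):
    # Model-free GPI bootstrap: max over base policies and actions.
    return max(q[s].max() for q in qs)

def h_gpi_value(s, h):
    # h = 0 recovers plain GPI; larger h relies more on the model.
    value, discount = 0.0, 1.0
    for _ in range(h):
        a = int(np.argmax(np.max([q[s] for q in qs], axis=0)))  # GPI action
        s, r = model_step(s, a)
        value += discount * r
        discount *= gamma
    return value + discount * gpi_value(s)

print(h_gpi_value(0, h=0), h_gpi_value(0, h=3))
```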
Next, we further explored these ideas and introduced two novel algorithms that exploit GPI to increase sample efficiency in MORL: GPI-LS and GPI-PD.
AAMAS'23 paper: tinyurl.com/aamas23
By exploiting these connections, we introduced SFOLS, a method that constructs a set of policies and combines them via GPI, with the guarantee of obtaining the optimal policy for any novel linearly-expressible task!
ICML'22 paper: proceedings.mlr.press/v162/alegre2...
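For context, a toy sketch of GPI with successor features (made-up shapes and data, not SFOLS itself): each base policy i has successor features psi_i(s, a), a new task is described by weights w, Q_i(s, a) = psi_i(s, a) . w, and GPI acts greedily with respect to the maximum over base policies:

```python
import numpy as np

rng = np.random.default_rng(1)
n_policies, n_actions, d = 4, 3, 2
# Toy successor features psi_i(s, a) at a fixed state s (normally learned).
psi = rng.random((n_policies, n_actions, d))

def gpi_action(psi_s, w):
    q = psi_s @ w                        # Q_i(s, a) = psi_i(s, a) . w
    return int(q.max(axis=0).argmax())   # max over policies, argmax over actions

w_new = np.array([0.7, 0.3])  # a novel linearly-expressible task
print(gpi_action(psi, w_new))
```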
It all started when we discovered and introduced connections between Successor Features and multi-objective RL (MORL):
I thought it would be a good idea to make a thread highlighting the main contributions of my PhD! 🧵