It was really fun to present our latest paper "Constructing an Optimal Behavior Basis for the Option Keyboard" at NeurIPS this week!
Paper: openreview.net/pdf?id=D4gOo...
#neurips2025
Sure, only ~2 weeks to review 5 papers for ICLR. I'm certain all reviewers will have plenty of time to write careful and thoughtful reviews in the coming weeks, since they have nothing else to do.
It is insane to expect a fair reviewing system under these conditions.
It is really cool to see our work on multi-step GPI being cited in this amazing survey! :)
proceedings.neurips.cc/paper_files/...
On average, my scores are good, but I have had cases before where 3 of 4 reviewers accepted the paper and the single negative reviewer convinced the AC to reject.
And now I got the classic rebuttal response:
"I have no concerns with the paper, all the theory is great, but since you did not run experiments in expensive domains with image-based environments, I will not increase my score".
The goal of experiments is to validate the claims! Not to beat Atari!
I got the classic NeurIPS review: "why did you not compare with [completely unrelated method whose comparison would not help support any of the paper's claims]?"
Debating whether I should spend my weekend running this useless experiment or argue with the reviewer.
Finally, reporting only IQM may compromise scientific transparency and fairness, as it can mask poor or unstable performance. Agarwal et al. (2021), who introduced IQM in this context, recommend using it in conjunction with other statistics rather than as a standalone measure.
Yes, Interquartile Mean (IQM) is a robust statistic that reduces the influence of outliers. But it does not by itself provide a clear and fair analysis of performance. In particular, IQM does not capture the full distribution of returns and may hide important information about variability and risk.
While I really like the paper "Deep Reinforcement Learning at the Edge of the Statistical Precipice" (openreview.net/forum?id=uqv...), I have seen papers evaluating performance using only the IQM metric and claiming that it is a fairer metric than the mean based on this paper, which is simply wrong.
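For concreteness, here is a minimal sketch (a toy example of mine, not code from the paper) of reporting IQM alongside other statistics; scipy's trim_mean with proportiontocut=0.25 computes exactly the interquartile mean:

```python
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(0)
# Hypothetical scores, shape (num_runs, num_tasks).
scores = rng.normal(loc=1.0, scale=0.5, size=(10, 5))

# IQM = mean of the middle 50% of scores (trims 25% from each tail).
iqm = trim_mean(scores.flatten(), proportiontocut=0.25)

# Report IQM together with other statistics, never alone.
print(f"IQM:    {iqm:.3f}")
print(f"Mean:   {scores.mean():.3f}")
print(f"Median: {np.median(scores):.3f}")
print(f"Std:    {scores.std():.3f}")  # variability that IQM alone would hide
```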
This work was done during my time as an intern at Disney Research Zürich. It was amazing and really fun to develop this idea with the Robotics Team!
Check out AMOR now on arXiv:
Paper: arxiv.org/abs/2505.23708
Full Video: youtube.com/watch?v=gQid...
#SIGGRAPH2025 #RL #robotics
A base policy with uniform weights might fail on challenging motions, but with a few weight tweaks, it nails them. Like this double spin. 😵‍💫
Curious how tuning weights mid-motion can help close the sim-to-real gap and unlock dynamic, expressive behaviors?
AMOR trains a single policy conditioned on reward weights and motion context, letting you fine-tune the reward after training.
Want smoother motions? Better accuracy? Just adjust the weights, no retraining needed!
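For intuition, a hypothetical sketch of the core idea (toy names and shapes, not the actual AMOR implementation): the policy is conditioned on the reward weights, and the scalar reward is a weighted sum of per-objective terms, so the trade-off can be changed after training:

```python
import numpy as np

def scalarized_reward(r_vec, w):
    # r(s, a) = w . r_vec(s, a): changing w changes the behavior trade-off.
    return float(w @ r_vec)

def policy_input(obs, w):
    # The policy network sees the weights, so a single policy covers a whole
    # family of reward trade-offs, and w can be tweaked at deployment time.
    return np.concatenate([obs, w])

obs = np.zeros(8)                  # made-up observation
r_vec = np.array([0.9, 0.2, 0.5])  # e.g., tracking, smoothness, effort terms
w = np.array([0.6, 0.3, 0.1])      # adjust these instead of retraining
print(scalarized_reward(r_vec, w), policy_input(obs, w).shape)
```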
We are excited to share our #SIGGRAPH2025 paper,
"AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning"!
Lucas Alegre*, Agon Serifi*, Ruben Grandia, David Müller, Espen Knoop, Moritz Baecher
Annoyed by having to retrain your entire policy just because your reward weights did not quite work on the real robot?
www.youtube.com/watch?v=gQid...
Thank you, Peter! :)
I'm really glad to have been selected as one of the ICML 2025 Top Reviewers!
Too bad I won't be able to go, since my last submission was not accepted, even with scores of Accept, Accept, Weak Accept, and Weak Reject.
Last week, I was at @khipu-ai.bsky.social in Santiago, Chile. It was really amazing to see so many great speakers and researchers from Latin America together!
RL is so back!
(well, for some of us, it never really left)
awards.acm.org/about/2024-t...
Thank you!
link.springer.com/article/10.1...
This paper is a great starting point!
Thank you!
Finally, I would like to thank my advisors, Prof. Ana Bazzan and Prof. Bruno C. da Silva; Prof. Ann Nowé, who hosted me at VUB during my PhD stay; and Disney Research Zürich, where I interned.
I am very grateful to everyone with whom I had the chance to collaborate on all these amazing projects!
I believe all these contributions open the door to many interesting ideas for multi-policy RL methods, especially in transfer learning (SFs & GPI) and multi-objective RL settings!
* MO-Gymnasium (github.com/Farama-Found...) is a library of MORL environments; and
* MORL Baselines (github.com/LucasAlegre/...) is a library of MORL algorithms.
Both have become standard tools in MORL research, with over 100k downloads in the past year!
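A minimal usage sketch (from memory, so check the repos for the current API): MO-Gymnasium follows the standard Gymnasium interface, except that the reward is a vector with one entry per objective:

```python
import mo_gymnasium as mo_gym

# "deep-sea-treasure-v0" is a classic two-objective benchmark in the library.
env = mo_gym.make("deep-sea-treasure-v0")
obs, info = env.reset(seed=42)
action = env.action_space.sample()
obs, vector_reward, terminated, truncated, info = env.step(action)
print(vector_reward)  # a numpy array with one entry per objective
```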
Besides the theoretical and algorithmic contributions, we also introduced an open-source toolkit for MORL research!
NeurIPS D&B 2023 Paper - openreview.net/pdf?id=jfwRL...
Next, we explored how to leverage approximate models of the environment to improve zero-shot policy transfer. Our method, h-GPI, interpolates between model-free GPI and fully model-based planning as a function of the planning horizon h.
NeurIPS 2023 Paper - openreview.net/pdf?id=KFj0Q...
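Roughly, the idea in code (my own toy simplification, not the paper's implementation): estimate values by rolling an approximate model out for h steps, then bootstrap with the model-free GPI value, so h = 0 recovers plain GPI and larger h relies more on the model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9
# Toy action-value tables, one per base policy (these would be learned).
qs = [rng.random((n_states, n_actions)) for _ in range(3)]

def model_step(s, a):
    # Stand-in for an approximate model: next state and reward.
    return (s + a) % n_states, float(a)

def gpi_value(s):
    # Model-free GPI bootstrap: max over base policies and actions.
    return max(q[s].max() for q in qs)

def h_gpi_value(s, h):
    # h = 0 recovers plain GPI; larger h relies more on the model.
    value, discount = 0.0, 1.0
    for _ in range(h):
        a = int(np.argmax(np.max([q[s] for q in qs], axis=0)))  # GPI action
        s, r = model_step(s, a)
        value += discount * r
        discount *= gamma
    return value + discount * gpi_value(s)

print(h_gpi_value(0, h=0), h_gpi_value(0, h=3))
```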
Next, we further explored these ideas and introduced two novel algorithms that exploit GPI to increase sample efficiency in MORL: GPI-LS and GPI-PD.
AAMAS'23 paper: tinyurl.com/aamas23
By exploiting these connections, we introduced SFOLS, a method that constructs a set of policies and combines them via GPI, with the guarantee of obtaining the optimal policy for any novel linearly-expressible task!
ICML'22 paper: proceedings.mlr.press/v162/alegre2...
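For context, a toy sketch of GPI with successor features (made-up shapes and data, not SFOLS itself): each base policy i has successor features psi_i(s, a), a new task is described by weights w, Q_i(s, a) = psi_i(s, a) . w, and GPI acts greedily with respect to the maximum over base policies:

```python
import numpy as np

rng = np.random.default_rng(1)
n_policies, n_actions, d = 4, 3, 2
# Toy successor features psi_i(s, a) at a fixed state s (normally learned).
psi = rng.random((n_policies, n_actions, d))

def gpi_action(psi_s, w):
    q = psi_s @ w                        # Q_i(s, a) = psi_i(s, a) . w
    return int(q.max(axis=0).argmax())   # max over policies, argmax over actions

w_new = np.array([0.7, 0.3])  # a novel linearly-expressible task
print(gpi_action(psi, w_new))
```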
It all started when we discovered and introduced connections between Successor Features and multi-objective RL (MORL):
I thought it would be a good idea to make a thread highlighting the main contributions of my PhD! 🧵