
Théo Vincent

@theo-vincent

PhD student at @dfki | @ias-tudarmstadt.bsky.social, working on RL 🤖 Previously a master's student at MVA @ENS_ParisSaclay | ENPC 🎓

93 Followers
252 Following
22 Posts
Joined 01.12.2024

Latest posts by Théo Vincent @theo-vincent

Quick reminder for everyone grinding on their RLC 2026 papers: only ~3 weeks to go!

The submission site opens in just a few days (Feb 17).

Deadlines:

โณ March 1 (AoE): Abstract Submission
โณ March 5 (AoE): Full Paper Submission

Good luck with the final changes!

12.02.2026 17:45 👍 7 🔁 2 💬 0 📌 2

We're thrilled to share that the Call for Workshops for this year's @rl-conference.bsky.social is now live!

As Workshop co-chair (alongside the wonderful Raksha Kumaraswamy and @claireve.bsky.social), I am looking forward to seeing the workshop proposals we receive.

LINK IN NEXT POST

13.02.2026 21:50 👍 11 🔁 5 💬 1 📌 2

🧵 Accepted at @iclr-conf.bsky.social!

Target networks stabilize bootstrapping in RL 🛡️
But induce slow-moving targets 🐢

Online networks adapt fast ⚡
But can diverge with function approximation 💥

𝗠𝗜𝗡𝗧𝗢 🌿 uses the online network 𝗼𝗻𝗹𝘆 𝗶𝗳 𝗶𝘁 𝗰𝗮𝗻, yielding faster 𝘢𝘯𝘥 more stable RL.

Here's how 👇
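For the curious, a minimal sketch of one plausible reading of that headline, in PyTorch. The key assumption is mine, not stated in this post: that MINTO bootstraps from the elementwise minimum of the online and target Q-values, so the fast online estimate is used only where it cannot inflate the TD target. See the paper for the actual mechanism.

```python
# Hedged sketch, NOT the paper's reference implementation.
# Assumption: the bootstrap takes the elementwise minimum of the online and
# target Q-values, so the online network is used "only if it can", i.e.
# only where it does not raise the bootstrap value.
import torch

def minto_td_target(online_net, target_net, reward, next_obs, done, gamma=0.99):
    with torch.no_grad():
        q_min = torch.minimum(online_net(next_obs), target_net(next_obs))
        next_v = q_min.max(dim=-1).values           # greedy value under the min
    return reward + gamma * (1.0 - done) * next_v   # TD target
```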

11.02.2026 17:02 👍 10 🔁 3 💬 1 📌 0

The Reinforcement Learning workshop at U Mannheim was a lot of fun and comes highly recommended if you are looking for an engaging exchange of ideas. Thanks to the organizers: Leif Döring, @theo-vincent.bsky.social, @claireve.bsky.social, and Simon Weißmann! www.wim.uni-mannheim.de/doering/conf...

08.02.2026 19:13 👍 12 🔁 2 💬 0 📌 0

9/9
Many thanks to my co-authors: @yogesh1q2w.bsky.social, Tim Faust, Abdullah Akgül, Yaniv Oren, Melih Kandemir, @jan-peters.bsky.social, and Carlo D'Eramo

and to the funding agencies: @ias-tudarmstadt.bsky.social, @tuda.bsky.social, @dfki.bsky.social, and @hessianai.bsky.social

05.02.2026 16:37 👍 5 🔁 0 💬 0 📌 0

8/9
Does it work in other settings?
YES, we also report results:
- with the IMPALA architecture 🦓
- on offline experiments ✈️
- on continuous control experiments with the Simba architecture (only on the poster) 🤖

📄👉 arxiv.org/pdf/2506.04398

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

7/9
By forcing the network to learn multiple Bellman backups in parallel, iS-DQN K>1 constructs richer features 💪

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

6/9
By adding additional heads to learn the following Bellman backups (iS-DQN K>1), iS-QN improves performance without significantly increasing the memory footprint 🚀

Note: we added layer normalization to further increase stability.

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

5/9
Interestingly, the idea of sharing the last features (iS-DQN K=1) already reduces the performance gap between target-free DQN (TF-DQN) and target-based DQN (TB-DQN) on 15 Atari games by a large margin.

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

4/9
Then, we can utilize the target-based literature to enhance training stability.

We enrich the classical TD loss with iterated Q-learning to increase the feedback on the shared layers by learning consecutive Bellman backups.

This leads to the iterated Shared Q-Network (iS-QN).
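A hedged sketch of how such a loss could look in PyTorch, under my reading of the thread: head k regresses onto the Bellman backup of head k-1, starting from the frozen last-layer copy. Names, stop-gradient placement, and the squared TD error are illustrative assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn.functional as F

def is_qn_loss(trunk, heads, target_head, batch, gamma=0.99):
    obs, act, rew, next_obs, done = batch       # act: long tensor of actions
    phi = trunk(obs)                            # shared features (grads flow)
    with torch.no_grad():
        next_phi = trunk(next_obs)              # no grads through the bootstrap
    losses, prev_head = [], target_head         # head 0 bootstraps from the copy
    for head in heads:                          # K heads, consecutive backups
        with torch.no_grad():                   # Bellman backup of previous head
            target = rew + gamma * (1 - done) * prev_head(next_phi).max(-1).values
        q = head(phi).gather(1, act.unsqueeze(1)).squeeze(1)
        losses.append(F.mse_loss(q, target))
        prev_head = head
    return sum(losses)                          # every head trains the shared trunk
```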

05.02.2026 16:37 👍 5 🔁 0 💬 1 📌 0

3/9
Our main idea is to use a copy of the online network's last linear layer as the target network, sharing all the earlier features with the online network.

This drastically reduces the memory footprint because only the last linear layer of the online network is stored as a copy.
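A minimal PyTorch sketch of that architecture as I read it; layer sizes and names are placeholders, not the paper's:

```python
import copy
import torch.nn as nn

class SharedQNet(nn.Module):
    # Online Q-network whose target is a frozen copy of ONLY the last layer.
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_actions)      # online last layer
        self.target_head = copy.deepcopy(self.head)   # the only extra memory
        for p in self.target_head.parameters():
            p.requires_grad_(False)

    def sync_target(self):                            # periodic hard update
        self.target_head.load_state_dict(self.head.state_dict())

    def forward(self, obs):                           # online Q-values
        return self.head(self.trunk(obs))

    def target_q(self, next_obs):                     # bootstrap Q-values:
        # shared (online) features, frozen last layer, no gradients
        return self.target_head(self.trunk(next_obs).detach())
```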

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

2/9
Many recent works have shown that removing the target network leads to a performance decrease 📉

Even methods that were originally introduced without a target network benefit from its reintegration 📈

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

1/9
With function approximation, bootstrapping without using a target network often leads to training instabilities.

However, using a target network slows down reward propagation and doubles the memory footprint dedicated to Q-networks.
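For contrast, a minimal sketch of the standard target-based setup this thread questions (plain DQN-style, PyTorch; not code from the paper), where the target network is a full second copy of the online network:

```python
import copy
import torch
import torch.nn as nn

# Toy network sizes for illustration only.
online_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(online_net)    # full copy: Q-network memory doubles

def td_target(reward, next_obs, done, gamma=0.99):
    with torch.no_grad():                 # bootstrap from the slow-moving copy
        next_v = target_net(next_obs).max(dim=-1).values
    return reward + gamma * (1.0 - done) * next_v
```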

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

TL;DR: Instead of using a full copy of the online network, we use a copy of the last linear layer of the online network as the target network, sharing the other features with the online network 💡

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

Should we use a target network in deep value-based RL? 🤔

The answer has always been either YES or NO, as there are pros and cons.

At @iclr-conf.bsky.social, I will present iS-QN, a method that sits between these two extremes, keeping the pros while reducing the cons 🚀

05.02.2026 16:37 👍 21 🔁 4 💬 1 📌 1

🥳 Our paper "Floating-Base Deep Lagrangian Networks (FeLaN)" has been accepted to #ICRA2026.

FeLaN: a grey-box approach for physically consistent SysID of floating-base robots (humanoids, quadrupeds).

📄 arxiv.org/abs/2510.17270
💻 Soon!
🌐 schulze18.github.io/felan_website/

03.02.2026 16:29 👍 10 🔁 3 💬 1 📌 0

Ahmed Hendawy, Henrik Metternich, Théo Vincent, Mahdi Kallel, Jan Peters, Carlo D'Eramo: Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning https://arxiv.org/abs/2510.02590 https://arxiv.org/pdf/2510.02590 https://arxiv.org/html/2510.02590

06.10.2025 06:32 👍 1 🔁 1 💬 0 📌 0

🎤 Announcing the 3rd workshop on Reinforcement Learning in Mannheim 🎤

We have an amazing lineup of speakers: @Mathieugeist, @gio_ramponi, Theresa Eimer, @SarahKeren_, @araffin2, @c_rothkopf, and @AdrienBolland

⏰ Friday, February 6th
📍 University of Mannheim

02.12.2025 11:45 👍 22 🔁 10 💬 1 📌 1

New #J2C Certification:

Iterated Q-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning

Théo Vincent, Daniel Palenicek, Boris Belousov, Jan Peters, Carlo D'Eramo

https://openreview.net/forum?id=Lt2H8Bd8jF

#reinforcement #iterative #iterations

27.10.2025 08:23 👍 2 🔁 1 💬 0 📌 0
Theฬo Vincent  - Optimizing the Learning Trajectory of Reinforcement Learning Agents
Theฬo Vincent - Optimizing the Learning Trajectory of Reinforcement Learning Agents YouTube video by Cohere

If you could not attend, here is a recorded version of my talk: youtube.com/watch?v=RCA2... 📽️

19.09.2025 16:08 👍 4 🔁 0 💬 0 📌 0

As usual, @ewrl18.bsky.social was a wonderful experience.

I had the pleasure of presenting my research as a Contributed Talk 🎉

Special thanks to the organizers for making it happen!

19.09.2025 16:08 👍 8 🔁 2 💬 1 📌 0

Looking forward to @rl-conference.bsky.social!

I will be presenting 4 posters. Feel free to come and chat with me during the conference, at the Finding the Frame workshop, or at the Inductive Biases workshop 🙂

04.08.2025 14:58 👍 2 🔁 0 💬 0 📌 0
Theฬo Vincent  - Optimizing the Learning Trajectory of Reinforcement Learning Agents
Theฬo Vincent - Optimizing the Learning Trajectory of Reinforcement Learning Agents YouTube video by Cohere

Had an amazing time presenting my research @cohereforai.bsky.social yesterday 🎤

In case you could not attend, feel free to check it out 👉

youtu.be/RCA22JWiiY8?...

19.07.2025 07:41 👍 7 🔁 3 💬 0 📌 0
Cohere Labs - Théo Vincent - Optimizing the Learning Trajectory of Reinforcement Learning Agents

Many thanks to Rahul Narava for the invitation!

More information here: cohere.com/events/Coher...

11.07.2025 16:20 👍 0 🔁 0 💬 0 📌 0

🎤 Very excited to give a talk @cohereforai.bsky.social next Friday 🎤

I will be presenting the research I have been working on for the last 2 years with Carlo D'Eramo, @jan-peters.bsky.social, and many more collaborators!

11.07.2025 16:17 👍 4 🔁 1 💬 1 📌 0

IAS is at RLDM 2025! We have many exciting works to share (see 👇), so come to our posters and talk to us!

12.06.2025 14:55 👍 4 🔁 3 💬 4 📌 0

Sparse network -> sparse poster

I will be presenting Eau De Q-Network today @rldmdublin2025.bsky.social. Feel free to come and chat at Poster #28 🎤

bsky.app/profile/theo...

12.06.2025 13:39 👍 1 🔁 0 💬 0 📌 0

Excited to present our latest work at RLDM 2025! If you're curious about tactile sensing, active perception, or RL in robotics, stop by my poster. Here's what we've been up to:
🧵
#Robotics #TactileSensing #ReinforcementLearning #Transformers #ActivePerception @ias-tudarmstadt.bsky.social

12.06.2025 12:33 👍 9 🔁 2 💬 1 📌 1

It was amazing to work on this project with Tim Faust, Yogesh Tripathi, @jan-peters.bsky.social, and Carlo D'Eramo!

Thanks to the funding agencies @ias-tudarmstadt.bsky.social, @cs-tudarmstadt.bsky.social, @dfki.bsky.social, @hessianai.bsky.social, and @uni-wuerzburg.de

09.06.2025 14:56 👍 2 🔁 0 💬 0 📌 0

Very excited to present 🎉 Eau De Q-Network 🎉 on Thursday @rldmdublin2025.bsky.social, Poster #28

🔍 Eau De Q-Network gradually prunes the network weights at the agent's learning pace, ultimately reaching a final sparsity level that is discovered by the algorithm! 🔎

👉📰 arxiv.org/pdf/2503.01437
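A hedged sketch of gradual magnitude pruning in this spirit (PyTorch). How the sparsity fraction tracks the agent's learning pace is the paper's contribution; here it is just an argument:

```python
import torch

def prune_smallest(params, fraction):
    # Zero out the `fraction` of weights with smallest magnitude (global pruning).
    flat = torch.cat([p.detach().abs().flatten() for p in params])
    k = int(fraction * flat.numel())
    if k == 0:
        return
    threshold = flat.kthvalue(k).values               # k-th smallest magnitude
    for p in params:
        p.data.mul_((p.detach().abs() > threshold).float())
```

Per the post, in Eau De Q-Network the pruned fraction would grow with the agent's learning progress until the algorithm settles on the final sparsity level.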

09.06.2025 14:54 👍 11 🔁 2 💬 1 📌 2