
Théo Vincent

@theo-vincent

PhD student at @dfki | @ias-tudarmstadt.bsky.social, working on RL 🤖 Previously a master's student at MVA @ENS_ParisSaclay | ENPC 🎓

93 Followers
252 Following
22 Posts
Joined 01.12.2024

Latest posts by Théo Vincent @theo-vincent

Quick reminder for everyone grinding on their RLC 2026 papers: only ~3 weeks to go!

The submission site opens in just a few days (Feb 17).

Deadlines:

โณ March 1 (AoE): Abstract Submission
โณ March 5 (AoE): Full Paper Submission

Good luck with the final changes!

12.02.2026 17:45 👍 7 🔁 2 💬 0 📌 2

We're thrilled to share that the Call for Workshops for this year's @rl-conference.bsky.social is now live!

As Workshop co-chair (alongside the wonderful Raksha Kumaraswamy and @claireve.bsky.social), I am looking forward to seeing the workshop proposals we receive.

LINK IN NEXT POST

13.02.2026 21:50 👍 11 🔁 5 💬 1 📌 2

🧵 Accepted at @iclr-conf.bsky.social!

Target networks stabilize bootstrapping in RL 🛡️
But induce slow-moving targets 🐢

Online networks adapt fast ⚡
But can diverge with function approximation 💥

𝗠𝗜𝗡𝗧𝗢 🌿 uses the online network 𝗼𝗻𝗹𝘆 𝗶𝗳 𝗶𝘁 𝗰𝗮𝗻, yielding faster 𝘢𝘯𝘥 more stable RL.

Here's how 👇
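For the curious, a minimal sketch of one plausible reading of that headline, in PyTorch. The key assumption is mine, not stated in this post: that MINTO bootstraps from the elementwise minimum of the online and target Q-values, so the fast online estimate is used only where it cannot inflate the TD target. See the paper for the actual mechanism.

```python
# Hedged sketch, NOT the paper's reference implementation.
# Assumption: the bootstrap takes the elementwise minimum of the online and
# target Q-values, so the online network is used "only if it can", i.e.
# only where it does not raise the bootstrap value.
import torch

def minto_td_target(online_net, target_net, reward, next_obs, done, gamma=0.99):
    with torch.no_grad():
        q_min = torch.minimum(online_net(next_obs), target_net(next_obs))
        next_v = q_min.max(dim=-1).values           # greedy value under the min
    return reward + gamma * (1.0 - done) * next_v   # TD target
```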

11.02.2026 17:02 👍 10 🔁 3 💬 1 📌 0

The Reinforcement Learning workshop at U Mannheim was a lot of fun and comes highly recommended if you are looking for an engaging exchange of ideas. Thanks to the organizers: Leif Döring, @theo-vincent.bsky.social, @claireve.bsky.social, and Simon Weißmann! www.wim.uni-mannheim.de/doering/conf...

08.02.2026 19:13 👍 12 🔁 2 💬 0 📌 0

9/9
Many thanks to my co-authors: @yogesh1q2w.bsky.social, Tim Faust, Abdullah Akgül, Yaniv Oren, Melih Kandemir, @jan-peters.bsky.social, and Carlo D'Eramo

and to the funding agencies: @ias-tudarmstadt.bsky.social, @tuda.bsky.social, @dfki.bsky.social, and @hessianai.bsky.social

05.02.2026 16:37 👍 5 🔁 0 💬 0 📌 0

8/9
Does it work in other settings?
YES, we also report results:
- with the IMPALA architecture 🦓
- on offline experiments ✈️
- on continuous control experiments with the Simba architecture (only on the poster) 🤖

📄👉 arxiv.org/pdf/2506.04398

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

7/9
By forcing the network to learn multiple Bellman backups in parallel, iS-DQN K>1 constructs richer features 💪

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

6/9
By adding additional heads to learn the following Bellman backups (iS-DQN K>1), iS-QN improves performance without significantly increasing the memory footprint 🚀

Note: we added layer normalization to further increase stability.

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

5/9
Interestingly, the idea of sharing the last features (iS-DQN K=1) already reduces the performance gap between target-free DQN (TF-DQN) and target-based DQN (TB-DQN) on 15 Atari games by a large margin.

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

4/9
Then, we can utilize the target-based literature to enhance training stability.

We enrich the classical TD loss with iterated Q-learning to increase the feedback on the shared layers by learning consecutive Bellman backups.

This leads to the iterated Shared Q-Network (iS-QN).
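A hedged sketch of how such a loss could look in PyTorch, under my reading of the thread: head k regresses onto the Bellman backup of head k-1, starting from the frozen last-layer copy. Names, stop-gradient placement, and the squared TD error are illustrative assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn.functional as F

def is_qn_loss(trunk, heads, target_head, batch, gamma=0.99):
    obs, act, rew, next_obs, done = batch       # act: long tensor of actions
    phi = trunk(obs)                            # shared features (grads flow)
    with torch.no_grad():
        next_phi = trunk(next_obs)              # no grads through the bootstrap
    losses, prev_head = [], target_head         # head 0 bootstraps from the copy
    for head in heads:                          # K heads, consecutive backups
        with torch.no_grad():                   # Bellman backup of previous head
            target = rew + gamma * (1 - done) * prev_head(next_phi).max(-1).values
        q = head(phi).gather(1, act.unsqueeze(1)).squeeze(1)
        losses.append(F.mse_loss(q, target))
        prev_head = head
    return sum(losses)                          # every head trains the shared trunk
```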

05.02.2026 16:37 👍 5 🔁 0 💬 1 📌 0

3/9
Our main idea is to use a copy of the online network's last linear layer as the target network, sharing all the earlier features with the online network.

This drastically reduces the memory footprint because only the last linear layer of the online network is stored as a copy.
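A minimal PyTorch sketch of that architecture as I read it; layer sizes and names are placeholders, not the paper's:

```python
import copy
import torch.nn as nn

class SharedQNet(nn.Module):
    # Online Q-network whose target is a frozen copy of ONLY the last layer.
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_actions)      # online last layer
        self.target_head = copy.deepcopy(self.head)   # the only extra memory
        for p in self.target_head.parameters():
            p.requires_grad_(False)

    def sync_target(self):                            # periodic hard update
        self.target_head.load_state_dict(self.head.state_dict())

    def forward(self, obs):                           # online Q-values
        return self.head(self.trunk(obs))

    def target_q(self, next_obs):                     # bootstrap Q-values:
        # shared (online) features, frozen last layer, no gradients
        return self.target_head(self.trunk(next_obs).detach())
```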

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

2/9
Many recent works have shown that removing the target network leads to a performance decrease 📉

Even methods that were originally introduced without a target network benefit from its reintegration 📈

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

1/9
With function approximation, bootstrapping without using a target network often leads to training instabilities.

However, using a target network slows down reward propagation and doubles the memory footprint dedicated to Q-networks.
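For contrast, a minimal sketch of the standard target-based setup this thread questions (plain DQN-style, PyTorch; not code from the paper), where the target network is a full second copy of the online network:

```python
import copy
import torch
import torch.nn as nn

# Toy network sizes for illustration only.
online_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(online_net)    # full copy: Q-network memory doubles

def td_target(reward, next_obs, done, gamma=0.99):
    with torch.no_grad():                 # bootstrap from the slow-moving copy
        next_v = target_net(next_obs).max(dim=-1).values
    return reward + gamma * (1.0 - done) * next_v
```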

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

TL;DR: Instead of using a full copy of the online network, we use a copy of the last linear layer of the online network as the target network, sharing the other features with the online network 💡

05.02.2026 16:37 👍 4 🔁 0 💬 1 📌 0

Should we use a target network in deep value-based RL? 🤔

The answer has always been either YES or NO, as there are pros and cons.

At @iclr-conf.bsky.social, I will present iS-QN, a method that sits between these two extremes, keeping the pros while reducing the cons 🚀

05.02.2026 16:37 👍 21 🔁 4 💬 1 📌 1

🥳 Our paper "Floating-Base Deep Lagrangian Networks (FeLaN)" has been accepted to #ICRA2026.

FeLaN: a grey-box approach for physically consistent SysID of floating-base robots (humanoids, quadrupeds).

📄 arxiv.org/abs/2510.17270
💻 Soon!
🌐 schulze18.github.io/felan_website/

03.02.2026 16:29 👍 10 🔁 3 💬 1 📌 0

Ahmed Hendawy, Henrik Metternich, Théo Vincent, Mahdi Kallel, Jan Peters, Carlo D'Eramo: Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning https://arxiv.org/abs/2510.02590 https://arxiv.org/pdf/2510.02590 https://arxiv.org/html/2510.02590

06.10.2025 06:32 👍 1 🔁 1 💬 0 📌 0

🎤 Announcing the 3rd workshop on Reinforcement Learning in Mannheim 🎤

We have an amazing lineup of speakers: @Mathieugeist, @gio_ramponi, Theresa Eimer, @SarahKeren_, @araffin2, @c_rothkopf, and @AdrienBolland

⏰ Friday, February 6th
📍 University of Mannheim

02.12.2025 11:45 👍 22 🔁 10 💬 1 📌 1

New #J2C Certification:

Iterated Q-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning

Théo Vincent, Daniel Palenicek, Boris Belousov, Jan Peters, Carlo D'Eramo

https://openreview.net/forum?id=Lt2H8Bd8jF

#reinforcement #iterative #iterations

27.10.2025 08:23 👍 2 🔁 1 💬 0 📌 0
Theฬo Vincent  - Optimizing the Learning Trajectory of Reinforcement Learning Agents
Theฬo Vincent - Optimizing the Learning Trajectory of Reinforcement Learning Agents YouTube video by Cohere

If you could not attend, here is a recorded version of my talk: youtube.com/watch?v=RCA2... 📽️

19.09.2025 16:08 👍 4 🔁 0 💬 0 📌 0

As usual, @ewrl18.bsky.social was a wonderful experience.

I had the pleasure of presenting my research as a Contributed Talk 🎉

Special thanks to the organizers for making it happen!

19.09.2025 16:08 👍 8 🔁 2 💬 1 📌 0

Looking forward to @rl-conference.bsky.social!

I will be presenting 4 posters. Feel free to come and chat with me during the conference, at the Finding the Frame workshop, or at the Inductive Biases workshop 🙂

04.08.2025 14:58 👍 2 🔁 0 💬 0 📌 0
Theฬo Vincent  - Optimizing the Learning Trajectory of Reinforcement Learning Agents
Theฬo Vincent - Optimizing the Learning Trajectory of Reinforcement Learning Agents YouTube video by Cohere

Had an amazing time presenting my research @cohereforai.bsky.social yesterday 🎤

In case you could not attend, feel free to check it out 👉

youtu.be/RCA22JWiiY8?...

19.07.2025 07:41 👍 7 🔁 3 💬 0 📌 0
Cohere Labs - Théo Vincent - Optimizing the Learning Trajectory of Reinforcement Learning Agents

Many thanks to Rahul Narava for the invitation!

More information here: cohere.com/events/Coher...

11.07.2025 16:20 👍 0 🔁 0 💬 0 📌 0

🎤 Very excited to give a talk @cohereforai.bsky.social next Friday 🎤

I will be presenting the research I have been working on for the last 2 years with Carlo D'Eramo, @jan-peters.bsky.social, and many more collaborators!

11.07.2025 16:17 👍 4 🔁 1 💬 1 📌 0

IAS is at RLDM 2025! We have many exciting works to share (see 👇), so come to our posters and talk to us!

12.06.2025 14:55 👍 4 🔁 3 💬 4 📌 0

Sparse network -> sparse poster

I will be presenting Eau De Q-Network today @rldmdublin2025.bsky.social. Feel free to come and chat at Poster #28 🎤

bsky.app/profile/theo...

12.06.2025 13:39 👍 1 🔁 0 💬 0 📌 0

Excited to present our latest work at RLDM 2025! If you're curious about tactile sensing, active perception, or RL in robotics, stop by my poster. Here's what we've been up to:
🧵
#Robotics #TactileSensing #ReinforcementLearning #Transformers #ActivePerception @ias-tudarmstadt.bsky.social

12.06.2025 12:33 👍 9 🔁 2 💬 1 📌 1

It was amazing to work on this project with Tim Faust, Yogesh Tripathi, @jan-peters.bsky.social, and Carlo D'Eramo!

Thanks to the funding agencies @ias-tudarmstadt.bsky.social, @cs-tudarmstadt.bsky.social, @dfki.bsky.social, @hessianai.bsky.social, and @uni-wuerzburg.de

09.06.2025 14:56 👍 2 🔁 0 💬 0 📌 0

Very excited to present 🎉 Eau De Q-Network 🎉 on Thursday @rldmdublin2025.bsky.social, Poster #28

🔍 Eau De Q-Network gradually prunes the network weights at the agent's learning pace, ultimately reaching a final sparsity level that is discovered by the algorithm! 🔎

👉📰 arxiv.org/pdf/2503.01437
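A hedged sketch of gradual magnitude pruning in this spirit (PyTorch). How the sparsity fraction tracks the agent's learning pace is the paper's contribution; here it is just an argument:

```python
import torch

def prune_smallest(params, fraction):
    # Zero out the `fraction` of weights with smallest magnitude (global pruning).
    flat = torch.cat([p.detach().abs().flatten() for p in params])
    k = int(fraction * flat.numel())
    if k == 0:
        return
    threshold = flat.kthvalue(k).values               # k-th smallest magnitude
    for p in params:
        p.data.mul_((p.detach().abs() > threshold).float())
```

Per the post, in Eau De Q-Network the pruned fraction would grow with the agent's learning progress until the algorithm settles on the final sparsity level.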

09.06.2025 14:54 👍 11 🔁 2 💬 1 📌 2