
Cansu Sancaktar

@cansusancaktar

PhD Student @ Max Planck Institute for Intelligent Systems & University of Tübingen | Working on intrinsically motivated open-ended reinforcement learning 🤖

80 Followers · 44 Following · 8 Posts · Joined 20.11.2024

Latest posts by Cansu Sancaktar @cansusancaktar

Sergey Levine was just presenting at the Exploration in AI workshop @ #ICML2025 and argued that exploration needs to be grounded, and that VLMs are a good source ;-) Check out our paper below
👇

19.07.2025 17:47 👍 10 🔁 4 💬 0 📌 0

Want to find out more about SENSEI?

🗣️ ICML Poster: West Exhibition Hall, 16 Jul, 11 a.m. PDT, No. W-707
📜 arxiv.org/abs/2503.01584
🌐 sites.google.com/view/sensei-paper

Work done with @cgumbsch.bsky.social (co-first), @zadaianchuk.bsky.social, @pavelkolevbg.bsky.social and @gmartius.bsky.social

8/8

14.07.2025 08:02 👍 4 🔁 0 💬 0 📌 1
[Post image]

SENSEI can also guide exploration in combination with task rewards. When playing Pokémon Red from pixels, we achieve superior performance to Dreamer (pure task rewards) and Plan2Explore. Only SENSEI manages to obtain the first Gym Badge within 2M steps of exploration 🥇
7/8

14.07.2025 08:02 👍 3 🔁 0 💬 1 📌 0
[Video thumbnail]

The agent learns a world model during exploration that can later be re-used to solve downstream tasks. We demonstrate more sample-efficient policy learning with SENSEI compared to exploration via Plan2Explore.

6/8

14.07.2025 08:02 👍 3 🔁 0 💬 1 📌 0
[Video thumbnail]

Through the combination of semantic exploration with epistemic uncertainty, the agent unlocks a variety of interesting behaviors during task-free exploration. For example, in Robodesk the agent focuses on interacting with all available objects 🦾
5/8

14.07.2025 08:02 👍 3 🔁 0 💬 1 📌 0
[Post image]

To continuously push the frontier of experience, we combine semantic rewards with epistemic uncertainty, deploying an adaptive go-explore strategy: the agent first tries to reach interesting situations (semantic reward) and then tries new things from there (uncertainty).
4/8

14.07.2025 08:02 👍 3 🔁 0 💬 1 📌 0
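The go-then-explore switch described in 4/8 can be sketched as a tiny episode-level controller. This is purely illustrative and not the SENSEI implementation (whose switching point is adaptive); the function name, threshold, and reward trace below are all made up:

```python
# Illustrative sketch of a go-explore-style reward switch, NOT the SENSEI
# code: within an episode the agent first follows the semantic reward
# ("go" phase) and, once that reward passes a threshold (i.e. it has
# reached an interesting situation), it switches to an uncertainty-driven
# objective ("explore" phase) for the rest of the episode.

def intrinsic_phase(semantic_reward, threshold, switched):
    """Return the current phase and whether the switch has happened."""
    if switched or semantic_reward >= threshold:
        return "explore", True
    return "go", False

# Toy episode: the semantic reward rises as the agent approaches an
# object; after the switch the agent stays in the uncertainty-driven
# phase even though the semantic reward drops again.
switched, phases = False, []
for r_sem in [0.1, 0.2, 0.5, 0.9, 0.3, 0.2]:
    phase, switched = intrinsic_phase(r_sem, threshold=0.8, switched=switched)
    phases.append(phase)

print(phases)  # ['go', 'go', 'go', 'explore', 'explore', 'explore']
```

The one-way flag is the point: once the agent has "gone" somewhere interesting, it keeps exploring from there instead of oscillating back to reward chasing.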
[Post image]

How do we get a signal for meaningful behavior? 🤔
Our approach is to use the human priors captured in foundation models. We extend MOTIF to VLMs: a VLM compares pairs of observations collected through self-supervised exploration, and this ranking is distilled into a reward function.
3/8

14.07.2025 08:02 👍 5 🔁 0 💬 1 📌 0
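The distillation step from 3/8 can be sketched with a toy version of MOTIF-style preference learning. This is an illustrative stand-in, not the SENSEI code: the linear reward model, the fake "VLM" labels, and all names below are made up for the example.

```python
import numpy as np

# Toy sketch of MOTIF-style reward distillation, NOT the SENSEI code:
# a VLM has judged pairs of observations (x_a, x_b), giving y = 1 when
# x_a looks "more interesting". We distil these preferences into a linear
# reward r(x) = w @ x by minimising the Bradley-Terry / logistic loss
# -[y log p + (1 - y) log(1 - p)],  with  p = sigmoid(r(x_a) - r(x_b)).

def distill_reward(pairs, labels, dim, lr=0.5, steps=300):
    w = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for (xa, xb), y in zip(pairs, labels):
            p = 1.0 / (1.0 + np.exp(-(w @ (xa - xb))))  # P(a preferred)
            grad += (p - y) * (xa - xb)                  # gradient of the loss
        w -= lr * grad / len(pairs)
    return w

# Fake data standing in for VLM judgements: feature 0 encodes how
# "interesting" an observation is, the other features are noise.
rng = np.random.default_rng(0)
obs = rng.normal(size=(20, 4))
pairs = [(obs[i], obs[j]) for i in range(10) for j in range(10, 20)]
labels = [1.0 if xa[0] > xb[0] else 0.0 for xa, xb in pairs]

w = distill_reward(pairs, labels, dim=4)
reward = lambda x: w @ x  # dense semantic reward usable by the RL agent
```

The learned `w` loads mostly on feature 0, so the distilled reward ranks observations the same way the (simulated) VLM preferences do, and can then be queried densely at every step instead of calling the VLM.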
[Post image]

Intrinsically motivated exploration faces a chicken-and-egg problem: how do you know what's worth exploring before trying it out and experiencing the consequences?
Children solve this by observing and imitating adults. We bring such semantic exploration to artificial agents.
2/8

14.07.2025 08:02 👍 4 🔁 0 💬 1 📌 0
[Video thumbnail]

✨Introducing SENSEI✨ We bring semantically meaningful exploration to model-based RL using VLMs.

With intrinsic rewards for novel yet useful behaviors, SENSEI showcases strong exploration in MiniHack, Pokémon Red & Robodesk.

Accepted at ICML 2025 🎉

Joint work with @cgumbsch.bsky.social
🧵

14.07.2025 08:02 👍 21 🔁 5 💬 1 📌 4