
Siddhant Haldar

@haldarsiddhant

Excited about generalizing AI | PhD student @NYU

26 Followers · 26 Following · 8 Posts · Joined 08.12.2024

Latest posts by Siddhant Haldar @haldarsiddhant

Point Policy: Unifying Observations and Actions with Key Points for Robot Manipulation

We have open-sourced the code. Try out Point Policy on your robots!

Project page: point-policy.github.io
arXiv: arxiv.org/abs/2502.20391
Code:

28.02.2025 23:35 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Further, reasoning about key points instead of raw pixels allows Point Policy to generalize to novel object instances and exhibit robustness to heavy scene variations, all while requiring at most 30 demonstrations per task.

28.02.2025 23:35 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Despite having no access to robot demonstrations, Point Policy exhibits an 88% success rate across 8 real-world tasks, a 75% improvement over baselines.

28.02.2025 23:35 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Point Policy uses sparse key points to represent both human demonstrators and robots, bridging the morphology gap. The scene is encoded through semantically meaningful key points from minimal human annotations.

28.02.2025 23:35 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
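As a rough illustration of the interface described above, here is a minimal sketch of a policy that consumes histories of sparse 3D key points (scene points plus end-effector points shared across human and robot) and predicts the robot's next key-point positions. All names and dimensions (PointPolicyNet, NUM_SCENE_POINTS, the MLP trunk) are illustrative assumptions, not the released implementation.

```python
# A minimal sketch (not the authors' code) of a key-point policy interface:
# both the scene and the end effector are reduced to a small set of tracked
# 3D points, and the policy maps point histories to future robot points.
import torch
import torch.nn as nn

NUM_SCENE_POINTS = 5   # semantically meaningful points annotated per task
NUM_ROBOT_POINTS = 3   # points on the gripper, shared across human and robot
HISTORY = 10           # past frames of point tracks fed to the policy

class PointPolicyNet(nn.Module):
    """Maps a history of (scene + robot) key points to the next robot points."""
    def __init__(self):
        super().__init__()
        in_dim = HISTORY * (NUM_SCENE_POINTS + NUM_ROBOT_POINTS) * 3
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, NUM_ROBOT_POINTS * 3),  # next 3D robot points
        )

    def forward(self, point_history):
        # point_history: (batch, HISTORY, NUM_SCENE_POINTS + NUM_ROBOT_POINTS, 3)
        flat = point_history.flatten(start_dim=1)
        return self.mlp(flat).view(-1, NUM_ROBOT_POINTS, 3)

policy = PointPolicyNet()
dummy = torch.randn(1, HISTORY, NUM_SCENE_POINTS + NUM_ROBOT_POINTS, 3)
next_points = policy(dummy)  # targets for an IK / low-level controller
print(next_points.shape)     # torch.Size([1, 3, 3])
```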

The most frustrating part of imitation learning is collecting huge amounts of teleop data. But why teleop robots when robots can learn by watching us?

Introducing Point Policy, a novel framework that enables robots to learn from human videos without any teleop, sim2real, or RL.

28.02.2025 23:28 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We just released AnySense, an iPhone app for effortless data acquisition and streaming for robotics. We leverage Apple’s development frameworks to record and stream:

1. RGBD + Pose data
2. Audio from the mic or custom contact microphones
3. Data from external sensors via seamless Bluetooth integration

26.02.2025 15:14 πŸ‘ 34 πŸ” 10 πŸ’¬ 2 πŸ“Œ 0
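For a sense of what a bundled multi-sensor stream like the one listed above might look like, here is a hypothetical per-frame record. The app's actual on-disk and wire formats are not described in the post, so every field name below is an assumption.

```python
# A hypothetical sketch of the kind of per-frame record an app like AnySense
# might emit; all field names and shapes here are assumptions.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class AnySenseFrame:
    timestamp_s: float       # capture time in seconds
    rgb: np.ndarray          # (H, W, 3) uint8 color image
    depth: np.ndarray        # (H, W) float32 depth in meters
    pose: np.ndarray         # (4, 4) camera-to-world transform
    audio_chunk: np.ndarray  # mono PCM samples captured since the last frame
    ble_readings: dict = field(default_factory=dict)  # external sensor values

frame = AnySenseFrame(
    timestamp_s=0.033,
    rgb=np.zeros((480, 640, 3), dtype=np.uint8),
    depth=np.zeros((480, 640), dtype=np.float32),
    pose=np.eye(4),
    audio_chunk=np.zeros(1470, dtype=np.int16),  # ~1/30 s at 44.1 kHz
)
```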

Can we extend the power of world models beyond just online model-based learning? Absolutely!

We believe the true potential of world models lies in enabling agents to reason at test time.
Introducing DINO-WM: World Models on Pre-trained Visual Features for Zero-shot Planning.

31.01.2025 19:24 πŸ‘ 20 πŸ” 8 πŸ’¬ 1 πŸ“Œ 1
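The idea of test-time reasoning with a world model can be made concrete with a small sketch: embed the current and goal observations with a frozen pre-trained encoder, roll latents forward through a learned dynamics model under sampled action sequences, and execute the sequence whose predicted latent lands closest to the goal. The linear encoder and dynamics below are stand-ins for DINO features and the trained world model, not the released code.

```python
# A minimal random-shooting planner in latent space: no reward model, just
# distance to a goal embedding. Encoder and dynamics are toy stand-ins.
import torch

LATENT, ACT, HORIZON, N_CANDIDATES = 32, 4, 8, 256

encoder = torch.nn.Linear(128, LATENT)            # stand-in for frozen visual features
dynamics = torch.nn.Linear(LATENT + ACT, LATENT)  # stand-in for the learned world model

def plan(obs, goal_obs):
    """Pick the first action of the sequence whose rollout ends nearest the goal."""
    z0 = encoder(obs)
    zg = encoder(goal_obs)
    actions = torch.randn(N_CANDIDATES, HORIZON, ACT)  # sampled action sequences
    z = z0.expand(N_CANDIDATES, -1)
    for t in range(HORIZON):
        z = dynamics(torch.cat([z, actions[:, t]], dim=-1))  # latent rollout
    cost = (z - zg).pow(2).sum(-1)    # distance to goal in feature space
    return actions[cost.argmin(), 0]  # execute first action of the best plan

act = plan(torch.randn(1, 128), torch.randn(1, 128))
print(act.shape)  # torch.Size([4])
```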

BAKU is fully open source and surprisingly effective. We found it easily adaptable to a host of visuotactile tasks; see visuoskin.github.io

10.12.2024 18:23 πŸ‘ 8 πŸ” 2 πŸ’¬ 0 πŸ“Œ 0

I will be presenting BAKU at the #NeurIPS2024 poster session on Thursday, December 12, from 11 a.m. to 2 p.m. PST at East Exhibit Hall A-C #4206!

Do drop in to chat about efficient robot policy architectures as well as some of the more recent work using BAKU.

11.12.2024 15:42 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

P3-PO is a great example of how simple human priors can enable significantly better generalization in robot policies.

10.12.2024 20:48 πŸ‘ 3 πŸ” 2 πŸ’¬ 0 πŸ“Œ 0
P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies Developing generalizable robot policies that can robustly handle varied environmental conditions and object instances remains a fundamental challenge in robot learning. While considerable efforts have...

All our code and task rollouts have been made public at: point-priors.github.io
arXiv: arxiv.org/abs/2412.06784

Do try it out on your robots!

11.12.2024 08:04 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

It turns out that replacing images with keypoint-based representations enables stronger generalization across spatial positions, orientations, and novel object instances! We just released P3-PO, a method for learning generalizable policies with minimal data. πŸš€

11.12.2024 08:03 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
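One way to picture the keypoint pipeline described above: a human annotates points once on a reference image, and a semantic-correspondence step re-identifies them in each new observation, so the policy sees coordinates instead of pixels. The nearest-neighbor matcher below is a hypothetical stand-in for whatever correspondence model P3-PO actually uses.

```python
# Schematic sketch (hypothetical names throughout): re-identify human-annotated
# reference points in a new observation via nearest-neighbor feature matching.
import numpy as np

def locate_points(ref_features, ref_points, obs_features):
    """For each annotated reference pixel, find the pixel in the new
    observation whose dense feature is closest to the reference feature."""
    h, w, c = obs_features.shape
    flat = obs_features.reshape(-1, c)
    located = []
    for (y, x) in ref_points:
        query = ref_features[y, x]                    # feature at annotated pixel
        dists = np.linalg.norm(flat - query, axis=1)  # compare to all pixels
        idx = dists.argmin()
        located.append((idx // w, idx % w))           # back to (row, col)
    return located

# One-time human annotation on a reference image (pixel coordinates).
ref_pts = [(10, 12), (40, 55)]
ref_feat = np.random.rand(64, 64, 16)  # stand-in for dense visual features
obs_feat = np.random.rand(64, 64, 16)
print(locate_points(ref_feat, ref_pts, obs_feat))
```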

Modern policy architectures are unnecessarily complex. In our #NeurIPS2024 project called BAKU, we focus on what really matters for good policy learning.

BAKU is modular, language-conditioned, compatible with multiple sensor streams and multi-modal action distributions, and, importantly, fully open source!

09.12.2024 23:33 πŸ‘ 30 πŸ” 9 πŸ’¬ 1 πŸ“Œ 2
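A compressed sketch of the ingredients the post above lists: one encoder per sensor stream, a language embedding for task conditioning, a shared transformer trunk over the fused tokens, and a swappable action head (where multi-modal action distributions such as diffusion heads would plug in). This is an assumption-laden toy, not the open-sourced BAKU architecture.

```python
# A toy modular, language-conditioned policy: per-stream encoders, a shared
# trunk, and a replaceable action head. All names and sizes are illustrative.
import torch
import torch.nn as nn

class ModularPolicy(nn.Module):
    def __init__(self, obs_dims, lang_dim=64, token_dim=128, act_dim=7):
        super().__init__()
        # One small encoder per sensor stream (e.g. image features, tactile).
        self.encoders = nn.ModuleDict(
            {name: nn.Linear(d, token_dim) for name, d in obs_dims.items()})
        self.lang_proj = nn.Linear(lang_dim, token_dim)  # language conditioning
        self.trunk = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(token_dim, nhead=4, batch_first=True),
            num_layers=2)
        # Swappable head: a Gaussian or diffusion head could replace this.
        self.action_head = nn.Linear(token_dim, act_dim)

    def forward(self, obs, lang):
        tokens = [enc(obs[name]).unsqueeze(1) for name, enc in self.encoders.items()]
        tokens.append(self.lang_proj(lang).unsqueeze(1))
        fused = self.trunk(torch.cat(tokens, dim=1))  # (B, streams + 1, token_dim)
        return self.action_head(fused[:, -1])         # act from the fused summary

policy = ModularPolicy({"rgb": 512, "tactile": 32})
action = policy({"rgb": torch.randn(1, 512), "tactile": torch.randn(1, 32)},
                torch.randn(1, 64))
print(action.shape)  # torch.Size([1, 7])
```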