InteractVLM (#CVPR2025) is a great collaboration between MPI-IS, UvA, and Inria.
Authors: @saidwivedi.in, @anticdimi.bsky.social, S. Tripathi, O. Taheri, C. Schmid, @michael-j-black.bsky.social and @dimtzionas.bsky.social.
Code & models available at: interactvlm.is.tue.mpg.de (10/10)
15.06.2025 12:23
InteractVLM is the first method that infers 3D contacts on both humans and objects from in-the-wild images, and exploits these for 3D reconstruction via an optimization pipeline. In contrast, existing methods like PHOSA rely on handcrafted or heuristic-based contacts. (9/10)
15.06.2025 12:23
With just 5% of DAMON's 3D body contact annotations, InteractVLM surpasses the fully supervised DECO baseline trained on 100% of the 3D annotations. This is promising for minimizing reliance on costly 3D data by using foundational models. (8/10)
15.06.2025 12:23
InteractVLM also strongly outperforms prior work on object affordance prediction on the PIAD dataset. Here, affordance is defined as contact probabilities on the object. (7/10)
15.06.2025 12:23
InteractVLM significantly outperforms prior work, both qualitatively and quantitatively, on in-the-wild 3D human (binary & semantic) contact prediction on the DAMON dataset. (6/10)
15.06.2025 12:23
To bridge this 2D-to-3D gap, we propose "Render-Localize-Lift":
- Render: 3D human/object meshes into multiview 2D images.
- Localize: A Multiview Localization (MV-Loc) model, guided by VLM tokens, predicts 2D contact masks.
- Lift: 2D contact masks to 3D.
(5/10)
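Not from the paper's code: a toy NumPy sketch of the Render-Localize-Lift idea, where an orthographic projection stands in for rendering, the learned MV-Loc model is replaced by a trivial height-threshold heuristic, and all function names are hypothetical.

```python
import numpy as np

def render_views(vertices, n_views=4):
    """'Render' a 3D point set into n_views 2D views by rotating it
    about the y-axis and orthographically projecting (dropping depth)."""
    views = []
    for k in range(n_views):
        theta = 2 * np.pi * k / n_views
        rot = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                        [0.0, 1.0, 0.0],
                        [-np.sin(theta), 0.0, np.cos(theta)]])
        rotated = vertices @ rot.T
        views.append(rotated[:, :2])  # keep (x, y), drop depth
    return views

def localize_contact_2d(view_2d):
    """Stand-in for the MV-Loc model: mark projected points below
    height 0 as 'in contact' (think feet touching the ground)."""
    return (view_2d[:, 1] < 0.0).astype(float)

def lift_to_3d(masks_2d):
    """Lift: aggregate per-view 2D contact scores back onto the
    3D vertices by averaging across views."""
    return np.mean(np.stack(masks_2d), axis=0)

# Toy "mesh": random points in a unit cube centered at the origin.
rng = np.random.default_rng(0)
verts = rng.uniform(-1.0, 1.0, size=(100, 3))

views = render_views(verts)                     # Render
masks = [localize_contact_2d(v) for v in views] # Localize
contact = lift_to_3d(masks)                     # Lift: per-vertex score in [0, 1]
```

In the real method, the localization step is a learned multiview model conditioned on VLM tokens, and lifting maps pixel-level masks back to mesh vertices; the heuristic here only illustrates the data flow.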
15.06.2025 12:23
How can we infer 3D contact with limited 3D data? InteractVLM exploits foundational models: a VLM & a localization model fine-tuned to reason about contact. Given an image & prompt, the VLM outputs tokens for localization. But these models work in 2D, while contact is 3D. (4/10)
15.06.2025 12:23
Furthermore, simple binary contact (touching "any" object) misses the rich semantics of real multi-object interactions. Thus, we introduce a novel task, "Semantic Human Contact" estimation: predicting contact points on a human related to a specified object. (3/10)
15.06.2025 12:23
Precisely inferring where humans contact objects from an image is hard due to occlusion & depth ambiguity. Current datasets of images with 3D contact are small as they're costly & tedious to create (mocap/manual labeling), limiting the performance of contact predictors. (2/10)
15.06.2025 12:23
Why does 3D human-object reconstruction fail in the wild or get limited to a few object classes? A key missing piece is accurate 3D contact. InteractVLM (#CVPR2025) uses foundational models to infer contact on humans & objects, improving reconstruction from a single image. (1/10)
15.06.2025 12:23
Happy to be recognised again as an Outstanding Reviewer for #CVPR2025!
12.05.2025 06:46
Thanks to the workshop organizers: @yixinchen.bsky.social, Baoxiong Jia, @yaoyaofeng.bsky.social, @songyoupeng.bsky.social, Chuhang Zou, @saidwivedi.in, Yixin Zhu, Siyuan Huang!
And the challenge organizers: Xiongkun Linghu, Tai Wang, Jingli Lin, Xiaojian Ma
03.04.2025 08:29
Excited to announce the 5th Workshop on 3D Scene Understanding for Vision, Graphics & Robotics at #CVPR2025! We'll dive into multimodal 3D scene understanding & reasoning with amazing speakers and challenges.
@cvprconference.bsky.social
More Details: scene-understanding.com.
03.04.2025 08:29
I've been using GitHub's Lists feature for over a year, and it's seriously underrated!
It lets you assign labels to all your starred repos, making it super easy to find projects later based on specific fields or topics. No more endless scrolling!
Link to my list: github.com/saidwivedi?t...
12.03.2025 08:37
Thanks for sharing :) @chrisoffner3d.bsky.social, could you also please add me to the list? I work on 3D human avatars.
09.01.2025 00:56
[M2L 2024] Transformers - Lucas Beyer
YouTube video by Mediterranean Machine Learning (M2L) summer school
One of the best tutorials for understanding Transformers!
Watch here: www.youtube.com/watch?v=bMXq...
Big thanks to @giffmana.ai for this excellent content!
08.12.2024 09:58
Would love to be in the list!
24.11.2024 09:18
Writing a good scientific paper
For those who missed this post on the-network-that-is-not-to-be-named, I made public my "secrets" for writing a good CVPR paper (or any scientific paper). I've compiled these tips over many years. It's long, but hopefully it helps people write better papers. perceiving-systems.blog/en/post/writ...
20.11.2024 10:18