Can Generative Video Models Help Pose Estimation?
Yes! We find that off-the-shelf generative video models can hallucinate plausible intermediate frames that provide useful context for pose estimators (e.g. DUSt3R), especially for images with little t...
Project page: inter-pose.github.io
Paper: arxiv.org/abs/2412.16155
Many thanks to the amazing team: Jason Y. Zhang (@jasonyzhang.bsky.social), Philipp Henzler, Zhengqi Li (@zhengqili.bsky.social), Noah Snavely (@snavely.bsky.social), and Ricardo Martin-Brualla.
23.12.2024 17:44
This also applies to MASt3R. While MASt3R excels with overlapping pairs via feature matching, it struggles with non-overlapping ones due to unreliable correspondences. InterPose maintains robustness, outperforming MASt3R on outward-facing and matching it on center-facing datasets.
23.12.2024 17:44
We show that InterPose generalizes across 3 SOTA video models (DynamiCrafter, Runway Gen-3, Luma Dream Machine) and consistently outperforms DUSt3R on 4 diverse datasets (indoor, outdoor, object) using our new benchmark, which selects challenging pairs with little to no overlap.
23.12.2024 17:44
Challenge: Generated videos may contain visual artifacts or implausible motion.
Solution: We generate multiple videos and use a self-consistency metric to select the most visually consistent sample.
23.12.2024 17:44
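The selection step in the post above can be illustrated with a small sketch. The post does not say how the self-consistency metric is computed, so everything here is an assumption made for illustration: each generated video is assumed to yield several relative-rotation estimates (for example, from different subsets of its frames), and a video is scored by how tightly those estimates cluster around their medoid. The names `rotation_angle_deg`, `consistency_score`, `select_most_consistent`, and `rot_z` are hypothetical and do not come from the InterPose code.

```python
import numpy as np


def rotation_angle_deg(R_a, R_b):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    cos = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def consistency_score(rotations):
    """Mean angular distance from the medoid estimate to the others.

    Assumption: a lower score means the estimates from one video agree
    more with each other, i.e. the video is more self-consistent.
    """
    n = len(rotations)
    dists = np.array(
        [[rotation_angle_deg(a, b) for b in rotations] for a in rotations]
    )
    # The medoid is the estimate with the smallest total distance to the rest.
    return float(dists.sum(axis=1).min() / max(n - 1, 1))


def select_most_consistent(per_video_rotations):
    """Return (index of the lowest-scoring video, list of all scores)."""
    scores = [consistency_score(r) for r in per_video_rotations]
    return int(np.argmin(scores)), scores


def rot_z(deg):
    """Helper for the toy example: rotation about the z-axis."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Toy data: video 0 gives tightly clustered estimates, video 1 does not.
videos = [
    [rot_z(30), rot_z(31), rot_z(29)],
    [rot_z(10), rot_z(60), rot_z(120)],
]
best, scores = select_most_consistent(videos)
print(best, scores)
```

In this toy example the first video is selected because its three estimates differ by only a degree or two, while the second video's estimates are spread over more than 100 degrees. A real pipeline would obtain the rotations from a pose estimator such as DUSt3R and would likely account for translation as well.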
Motivation: powerful visual priors. Video models are pre-trained on vast web-scale video data, which lets them learn significantly more powerful priors of the visual world than 3D models like DUSt3R, which require 3D datasets.
23.12.2024 17:44
Can Generative Video Models Help Pose Estimation?
Yes!
We find that generative video models can hallucinate plausible intermediate frames that provide useful context for pose estimators (e.g. DUSt3R), especially for images with little to no overlap.
Project page: inter-pose.github.io
23.12.2024 17:44
Introducing Doppelgangers++! An enhanced pairwise image classifier that tackles visual aliasing (doppelgangers) to improve 3D reconstruction accuracy across diverse, real-world scenes.
Project page: bit.ly/3VAPMJc. Code is also available.
11.12.2024 02:40