
Xingyu Chen

@xingyu-chen

PhD Student at Westlake University, working on 3D & 4D Foundation Models. https://rover-xingyu.github.io/

76
Followers
317
Following
13
Posts
08.01.2025
Joined

Latest posts by Xingyu Chen @xingyu-chen


Hu, Cheng, Yu et al., "VGGT4D: Mining Motion Cues in Visual Geometry Transformers for 4D Scene Reconstruction"

Easi3R-style attention analysis and masking, plus mask refinement, applied to VGGT. It also discards tokens associated with dynamic points.

03.12.2025 20:00 ๐Ÿ‘ 2 ๐Ÿ” 1 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Personal programs for ICCV 2025 are now available at:
www.scholar-inbox.com/conference/i...

10.10.2025 06:19 ๐Ÿ‘ 24 ๐Ÿ” 6 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 1

Look, 4D foundation models know about humans โ€“ and we just read it out!

08.10.2025 11:19 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Glad to be recognized as an outstanding reviewer!

05.10.2025 15:25 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
TTT3R: 3D Reconstruction as Test-Time Training

๐Ÿ”—Page: rover-xingyu.github.io/TTT3R
๐Ÿ“„Paper: arxiv.org/abs/2509.26645
๐Ÿ’ปCode: github.com/Inception3D/...

Big thanks to the amazing team!
@xingyu-chen.bsky.social @fanegg.bsky.social @xiuyuliang.bsky.social @andreasgeiger.bsky.social @apchen.bsky.social

01.10.2025 15:28 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Instead of updating all states uniformly, we incorporate image attention as per-token learning rates.

High-confidence matches get larger updates, while low-quality updates are suppressed.

This soft gating greatly extends the length generalization beyond the training context.

01.10.2025 15:26 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
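The soft-gating idea above can be sketched as a toy update rule (an illustrative sketch only, not the released TTT3R code; the function name and shapes are my own assumptions):

```python
import numpy as np

def gated_state_update(state, update, attn_logits):
    """Per-token soft gating: attention scores act as per-token learning
    rates, so high-confidence tokens move toward the new value while
    low-confidence tokens mostly keep the old state."""
    # Map attention logits to (0, 1) learning rates, one per token.
    lr = 1.0 / (1.0 + np.exp(-attn_logits))        # sigmoid, shape (T, 1)
    # Convex combination of old state and proposed update.
    return (1.0 - lr) * state + lr * update

state = np.zeros((4, 8))                           # old memory state
update = np.ones((4, 8))                           # proposed new values
attn = np.array([[10.0], [0.0], [-10.0], [2.0]])   # per-token confidence
new_state = gated_state_update(state, update, attn)
```

With this gating, the token with logit 10 is almost fully overwritten, the token with logit -10 is almost untouched, and a logit of 0 blends the two equally.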

#VGGT: accurate within short clips, but slow and prone to Out-of-Memory (OOM) errors.

#CUT3R: fast with constant memory usage, but forgets.

We revisit them from a Test-Time Training (TTT) perspective and propose #TTT3R to get all three: fast, accurate, and OOM-free.

01.10.2025 15:24 ๐Ÿ‘ 1 ๐Ÿ” 1 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Let's keep revisiting 3D reconstruction!

01.10.2025 07:20 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Excited to introduce LoftUp!

A stronger-than-ever, lightweight feature upsampler for vision encoders that can boost performance on dense prediction tasks by 20%–100%!

Easy to plug into models like DINOv2, CLIP, SigLIP โ€” simple design, big gains. Try it out!

github.com/andrehuang/l...

22.04.2025 07:55 ๐Ÿ‘ 19 ๐Ÿ” 5 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

If you're a researcher and haven't tried it yet, please give it a try! It took me a while to adjust, but now it's my favorite tool. You can read, bookmark, organize papers, and get recommendations based on your interests!

15.04.2025 05:37 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Easi3R: Estimating Disentangled Motion from DUSt3R Without Training

@xingyu-chen.bsky.social, @fanegg.bsky.social, @xiuyuliang.bsky.social, @andreasgeiger.bsky.social, @apchen.bsky.social

arxiv.org/abs/2503.24391

02.04.2025 11:45 ๐Ÿ‘ 6 ๐Ÿ” 2 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Easi3R: Estimating Disentangled Motion from DUSt3R Without Training
Xingyu Chen, Yue Chen, Yuliang Xiu ... Anpei Chen
arxiv.org/abs/2503.24391
Trending on www.scholar-inbox.com

02.04.2025 10:11 ๐Ÿ‘ 1 ๐Ÿ” 2 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

DUSt3R was never trained to do dynamic segmentation with GT masks, right? It was just trained to regress point maps on 3D datasetsโ€”yet dynamic awareness emerged, making DUSt3R a zero-shot 4D estimator!๐Ÿ˜€

02.04.2025 07:59 ๐Ÿ‘ 4 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

I was really surprised when I saw this. DUSt3R has learned to segment objects remarkably well without supervision. This knowledge can be extracted post-hoc, enabling accurate 4D reconstruction instantly.

01.04.2025 18:45 ๐Ÿ‘ 31 ๐Ÿ” 2 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

๐Ÿ”—Page: easi3r.github.io
๐Ÿ“„Paper: arxiv.org/abs/2503.24391
๐Ÿ’ปCode: github.com/Inception3D/...

Big thanks to the amazing team!
@xingyu-chen.bsky.social, @fanegg.bsky.social, @xiuyuliang.bsky.social, @andreasgeiger.bsky.social, @apchen.bsky.social

01.04.2025 15:27 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

With our estimated segmentation masks, we perform a second inference pass by re-weighting the attention, enabling robust 4D reconstruction and even outperforming SOTA methods trained on 4D datasets, with almost no extra cost compared to vanilla DUSt3R.

01.04.2025 15:25 ๐Ÿ‘ 4 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
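The second-pass re-weighting can be illustrated with a minimal sketch (my own toy, not the released Easi3R code; the function name, `scale` parameter, and shapes are assumptions):

```python
import numpy as np

def reweight_attention(attn, dynamic_mask, scale=0.0):
    """Down-weight attention toward key tokens flagged as dynamic, so the
    reconstruction relies on static regions, then renormalize each row."""
    attn = attn.copy()
    attn[:, dynamic_mask] *= scale                    # suppress dynamic keys
    attn /= attn.sum(axis=1, keepdims=True) + 1e-8    # rows sum to ~1 again
    return attn

attn = np.full((2, 4), 0.25)                  # uniform attention, 4 key tokens
mask = np.array([True, False, False, False])  # token 0 flagged dynamic
out = reweight_attention(attn, mask)
```

Setting `scale=0.0` removes dynamic tokens entirely; intermediate values would only soften their influence.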

We propose an attention-guided strategy to decompose dynamic objects from the static background, enabling robust dynamic object segmentation. It outperforms optical-flow-guided segmentation methods like MonST3R and models trained on dynamic mask labels like DAS3R.

01.04.2025 15:24 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
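A minimal sketch of the attention-guided decomposition idea (illustrative only, not Easi3R's actual pipeline; the function name and threshold `tau` are my assumptions): static background tends to attend strongly and consistently across views, while moving objects do not, so thresholding an aggregated attention map yields a dynamic mask.

```python
import numpy as np

def dynamic_mask_from_attention(attn_map, tau=0.5):
    """attn_map: (H, W) per-pixel cross-view attention strength.
    Pixels with weak cross-view attention are flagged as dynamic."""
    a = (attn_map - attn_map.min()) / (np.ptp(attn_map) + 1e-8)  # to [0, 1]
    return a < tau                      # low cross-view attention -> dynamic

attn = np.array([[1.0, 1.0],
                 [0.0, 1.0]])           # bottom-left pixel attends weakly
mask = dynamic_mask_from_attention(attn)
```

In practice one would aggregate attention over many layers, heads, and frame pairs before thresholding; this sketch keeps only the final masking step.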

๐Ÿ’กHumans naturally separate ego-motion from object-motion without dynamic labels. We observe that #DUSt3R has implicitly learned a similar mechanism, reflected in its attention layers.

01.04.2025 15:23 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 1

๐ŸฆฃEasi3R: 4D Reconstruction Without Training!

Limited 4D datasets? Take it easy.

#Easi3R adapts #DUSt3R for 4D reconstruction by disentangling and repurposing its attention maps โ†’ make 4D reconstruction easier than ever!

๐Ÿ”—Page: easi3r.github.io

01.04.2025 15:21 ๐Ÿ‘ 22 ๐Ÿ” 3 ๐Ÿ’ฌ 2 ๐Ÿ“Œ 4

How much 3D do visual foundation models (VFMs) know?

Previous work requires 3D data for probing โ†’ expensive to collect!

#Feat2GS @cvprconference.bsky.social 2025 - our idea is to read out 3D Gaussians from VFM features, probing their 3D awareness via novel view synthesis.

๐Ÿ”—Page: fanegg.github.io/Feat2GS

31.03.2025 16:06 ๐Ÿ‘ 24 ๐Ÿ” 7 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 1
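The "read out Gaussians from features" idea can be sketched as a linear readout head (my own illustrative sketch, not Feat2GS's actual architecture; the parameter layout, activations, and names are assumptions):

```python
import numpy as np

def readout_gaussians(features, W, b):
    """Map per-pixel VFM features (N, D) to 3D Gaussian parameters:
    3 position offsets, 3 scales, 4 rotation quaternion, 1 opacity,
    3 color = 14 values per Gaussian."""
    params = (features @ W + b).copy()                 # (N, 14)
    xyz = params[:, :3]                                # position offsets
    scale = np.exp(params[:, 3:6])                     # strictly positive
    quat = params[:, 6:10]
    quat = quat / (np.linalg.norm(quat, axis=1, keepdims=True) + 1e-8)
    opacity = 1.0 / (1.0 + np.exp(-params[:, 10:11]))  # sigmoid to (0, 1)
    color = params[:, 11:14]
    return xyz, scale, quat, opacity, color

feats = np.zeros((2, 8))                 # two pixels, 8-dim features
W = np.zeros((8, 14))
b = np.zeros(14)
xyz, scale, quat, opacity, color = readout_gaussians(feats, W, b)
```

Because the readout is shallow, the rendered novel-view quality mostly reflects how much 3D structure the frozen VFM features already encode, which is the probing logic of the post.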