Xingang Pan (@XingangP)
2025-12-12 | ❤️ 141 | 🔁 25
Can video generative models exhibit visuospatial intelligence?
Introducing Video4Spatial, a video-only framework that tackles spatial tasks.
With just video context, our model can:
- Ground objects by planning geometry-consistent paths
- Follow camera-pose instructions for scene navigation
- Generalize to long contexts and unseen outdoor scenes
A step toward video models as visual-spatial reasoners.
Project: https://xizaoqu.github.io/video4spatial/ arXiv: https://arxiv.org/pdf/2512.03040