Xingang Pan (@XingangP)

2025-12-12 | ❤️ 141 | 🔁 25

Can video generative models exhibit visuospatial intelligence? 🤔

Introducing Video4Spatial — a video-only framework that tackles spatial tasks.

With just video context, our model can:
🔍 Ground objects by planning geometry-consistent paths
📸 Follow camera-pose instructions for scene navigation
🌐 Generalize to long contexts & unseen outdoor scenes

A step toward video models as visual-spatial reasoners.

Project: https://xizaoqu.github.io/video4spatial/
arXiv: https://arxiv.org/pdf/2512.03040

🔗 Original links

https://xizaoqu.github.io/video4spatial/
https://arxiv.org/pdf/2512.03040
