ViPE: Video Pose Engine for 3D Geometric Perception

Contributions:

• A robust and efficient framework, ViPE, for estimating camera parameters and dense depth from diverse, in-the-wild videos.

• A system design that integrates the strengths of classical SLAM (efficiency, scalability) and learned models (robustness), with key improvements in efficiency, dynamic object handling, and depth quality over prior work.

• A large-scale dataset of annotated videos, created using ViPE, to facilitate future research in 3D computer vision.