Bilawal Sidhu (@bilawalsidhu)

2025-02-27 | โค๏ธ 7108 | ๐Ÿ” 835


Wow. Recreating the Shawshank Redemption prison in 3D from a single video, in real time (!)

Just read the MASt3R-SLAM paper and it's pretty neat. These folks basically built a real-time dense SLAM system on top of MASt3R, a transformer-based neural network that does 3D reconstruction and localization from uncalibrated image pairs.

The cool part: they don't need a fixed camera model. It just works with arbitrary cameras, meaning different focal lengths, sensor sizes, even zooming mid-video (FMV drone footage, anyone?!). If you've done photogrammetry or played with NeRFs, you know that's a HUGE deal.
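For intuition on why calibration-free operation is plausible here: networks in the DUSt3R/MASt3R family predict a dense pointmap (one 3D point per pixel in the camera frame), and a focal length can then be recovered from that pointmap by least squares. This is my own minimal sketch of that idea, not code from the paper:

```python
import numpy as np

def estimate_focal(pointmap, cx, cy):
    """Least-squares focal estimate from a dense pointmap.

    pointmap: (H, W, 3) array of 3D points in the camera frame,
    assumed aligned with the pixel grid (one point per pixel).
    (cx, cy) is the assumed principal point.
    """
    H, W, _ = pointmap.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    X, Y, Z = pointmap[..., 0], pointmap[..., 1], pointmap[..., 2]
    # Pinhole model: u - cx = f * X / Z and v - cy = f * Y / Z.
    # Stack both constraints and solve for the single scalar f.
    a = np.concatenate([(X / Z).ravel(), (Y / Z).ravel()])
    b = np.concatenate([(u - cx).ravel(), (v - cy).ravel()])
    return float(a @ b / (a @ a))

# Synthetic check: build a pointmap from a known focal, then recover it.
H, W, f_true = 48, 64, 300.0
cx, cy = W / 2, H / 2
u, v = np.meshgrid(np.arange(W), np.arange(H))
Z = 2.0 + 0.01 * u  # arbitrary smooth depth surface
pm = np.stack([(u - cx) * Z / f_true, (v - cy) * Z / f_true, Z], axis=-1)
print(round(estimate_focal(pm, cx, cy), 1))  # prints 300.0
```

The actual system is more sophisticated (and can let intrinsics vary per frame, which is what makes zooming footage workable), but the core leverage is the same: once you have metric-ish 3D per pixel, intrinsics become something you solve for rather than something you must supply.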

They've solved some tricky problems like efficient point matching and tracking, and they've figured out how to fuse point clouds and handle loop closures in real time.
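Their efficient matching scheme is its own contribution, but the underlying primitive in this line of work is reciprocal (mutual) nearest-neighbour matching between dense per-pixel descriptors. A generic toy illustration of that primitive, assumptions mine:

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Reciprocal nearest-neighbour matching between two descriptor sets.

    desc_a: (N, D), desc_b: (M, D). Returns pairs (i, j) where j is
    a[i]'s nearest neighbour in b AND i is b[j]'s nearest neighbour in a,
    which filters out most one-sided, ambiguous matches.
    """
    # Cosine similarity via normalised dot products.
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T
    nn_ab = sim.argmax(axis=1)  # best b-index for each a-row
    nn_ba = sim.argmax(axis=0)  # best a-index for each b-row
    mutual = nn_ba[nn_ab] == np.arange(len(desc_a))
    return [(int(i), int(nn_ab[i])) for i in np.flatnonzero(mutual)]

# Toy check: b is a permuted, slightly noisy copy of a, so mutual NN
# matching should recover the permutation.
rng = np.random.default_rng(0)
d = rng.normal(size=(5, 8))
perm = np.array([2, 0, 4, 1, 3])
matches = mutual_nn_matches(d, d[perm] + 0.01 * rng.normal(size=(5, 8)))
# Each recovered pair (i, j) satisfies perm[j] == i.
```

Doing this exhaustively per pixel pair is quadratic, which is exactly why a real-time system needs the faster iterative matching the authors describe; the sketch above only shows what is being approximated.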

Their system runs at about 15 FPS on a 4090 and produces both camera poses and dense geometry. When they know the camera calibration, they get SOTA results across several benchmarks, but even without calibration, they still perform well.

What's interesting is the approach: most recent SLAM work has built on DROID-SLAM's architecture, but these folks went a different direction by leveraging a strong 3D reconstruction prior. That seems to give them more coherent geometry, which makes sense, since coherent geometry is what MASt3R was designed for.

For anyone who cares about monocular SLAM and 3D reconstruction, this feels like a significant step toward plug-and-play dense SLAM without calibration headaches. Perfect for drones, robots, AR/VR, the works!

Media

video



Tags

domain-vision-3d domain-robotics domain-visionos