Bilawal Sidhu (@bilawalsidhu)
2025-02-27 | ❤️ 7108 | 🔁 835
Wow. Recreating the Shawshank Redemption prison in 3D from a single video, in real time (!)
Just read the MASt3R-SLAM paper and it's pretty neat. These folks basically built a real-time dense SLAM system on top of MASt3R, which is a transformer-based neural network that can do 3D reconstruction and localization from uncalibrated image pairs.
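To make the "pose from pointmaps" idea concrete, here's a rough Python sketch (not the paper's actual solver, and the function name is mine): once a network predicts matched 3D points for an image pair, the relative camera pose drops out of a weighted rigid alignment (Kabsch/Umeyama, no scale).

```python
import numpy as np

def weighted_rigid_fit(src, dst, w):
    """Find R, t minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2,
    i.e. rigidly align two (N, 3) point sets with per-point weights
    (e.g. the network's confidence scores)."""
    w = w / w.sum()
    mu_s = w @ src                                    # weighted centroid of src
    mu_d = w @ dst                                    # weighted centroid of dst
    H = (src - mu_s).T @ ((dst - mu_d) * w[:, None])  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T           # optimal rotation
    t = mu_d - R @ mu_s                               # optimal translation
    return R, t
```

The actual system does this more cleverly (confidence-weighted, ray-based errors rather than plain point distances), but this is the gist of turning two predicted pointmaps into a relative pose.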
The cool part is they don't need a fixed camera model. It just works with arbitrary cameras: different focal lengths, sensor sizes, even zooming mid-video (FMV drone footage, anyone?!). If you've done photogrammetry or played with NeRFs, you know that's a HUGE deal.
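Why does "no camera model" even work? My loose intuition, sketched below: since the network predicts 3D points per pixel, you can normalize them to unit ray directions and measure errors in ray space, so nothing downstream depends on focal length or distortion parameters. (Placeholder function, assumptions mine.)

```python
import numpy as np

def to_unit_rays(pointmap: np.ndarray) -> np.ndarray:
    """Turn an H x W x 3 predicted pointmap (points in the camera frame)
    into per-pixel unit ray directions. Errors measured between rays
    never reference intrinsics, so a zooming lens doesn't break tracking."""
    norms = np.linalg.norm(pointmap, axis=-1, keepdims=True)
    return pointmap / np.clip(norms, 1e-8, None)  # avoid divide-by-zero
```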
They've solved some tricky problems like efficient point matching and tracking, plus they've figured out how to fuse point clouds and handle loop closures in real time.
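On the matching side, the baseline everyone starts from is dense reciprocal (mutual nearest-neighbor) matching over per-pixel descriptors; a toy brute-force version below. The paper's whole point is doing this fast enough for real time, so treat this as the matching criterion, not their method.

```python
import numpy as np

def mutual_nn_matches(desc1: np.ndarray, desc2: np.ndarray) -> np.ndarray:
    """Brute-force reciprocal matching between two descriptor sets
    (N1 x D and N2 x D, rows L2-normalized). Returns (M, 2) index pairs
    that are each other's nearest neighbor -- the classic consistency
    check that prunes most bad correspondences."""
    sim = desc1 @ desc2.T            # cosine similarity matrix (N1 x N2)
    nn12 = sim.argmax(axis=1)        # best j in image 2 for each i
    nn21 = sim.argmax(axis=0)        # best i in image 1 for each j
    idx1 = np.arange(desc1.shape[0])
    mutual = nn21[nn12] == idx1      # keep only reciprocal matches
    return np.stack([idx1[mutual], nn12[mutual]], axis=1)
```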
Their system runs at about 15 FPS on a 4090 and produces both camera poses and dense geometry. When they know the camera calibration, they get SOTA results across several benchmarks, but even without calibration, they still perform well.
What's interesting is the approach: most recent SLAM work has built on DROID-SLAM's architecture, but these folks went a different direction by leveraging a strong 3D reconstruction prior. Seems to give them more coherent geometry, which makes sense since that's what MASt3R was designed for.
For anyone who cares about monocular SLAM and 3D reconstruction, this feels like a significant step toward plug-and-play dense SLAM without calibration headaches. Perfect for drones, robots, AR/VR, the works!