Pablo Vela (@pablovelagomez1)

2025-05-09 | ❤️ 485 | 🔁 66


MVP of Multiview Video → Camera parameters + 3D keypoints. Visualized with @rerundotio

The basic pipeline as of right now looks like this:

  1. Capture 🔴 – Using 4 iPhones and an Insta360 Go. iPhone videos are captured via Final Cut Pro Multicam for easy sync and the exocentric view; the Insta360 Go is used for the egocentric view.
  2. Sync 🕒 – Custom Gradio app using two @rerundotio viewers and callbacks for easily aligning frame timestamps so the ego and exo views are aligned.
  3. Calibrate 🎯 – Use VGGT from @jianyuan_wang and @AIatMeta to get intrinsics/extrinsics for sparse cameras.
  4. Estimate 3D 🕺 – Use RTMLib whole-body keypoint estimator on each frame, then triangulate in 3D.
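
The "triangulate in 3D" part of step 4 can be sketched with a linear DLT solve. This is a minimal illustration, not the pipeline's actual code: it assumes the calibration step yields one 3x4 projection matrix per camera (`P = K @ [R | t]`), and the function names, the three-camera rig, and all numbers below are hypothetical.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Triangulate one 3D point from two or more views with linear DLT.

    projections: sequence of 3x4 matrices P = K @ [R | t] (world -> pixel).
    points_2d:   sequence of (u, v) pixel observations, one per camera.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, point_3d):
    """Project a 3D point to pixel coordinates with a 3x4 projection matrix."""
    x = P @ np.append(point_3d, 1.0)
    return x[:2] / x[2]


# Hypothetical rig: shared intrinsics, three cameras offset along x and y.
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
translations = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
projections = [
    K @ np.hstack([np.eye(3), np.array(t).reshape(3, 1)]) for t in translations
]

# Simulate one keypoint seen by every camera, then recover it.
point_3d = np.array([0.2, -0.1, 4.0])
observations = [project(P, point_3d) for P in projections]
estimate = triangulate_dlt(projections, observations)
print(estimate)
```

In practice this would run once per keypoint per frame. If the 2D estimator reports a per-keypoint confidence, low-confidence views could be dropped before building the constraint matrix.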

What's missing?

  1. No temporal coherence: I'm estimating keypoints one frame at a time and one camera at a time. This leads to a lot of jittering. For now, I plan on adding a One Euro Filter to help with jittering. Long term, I'd want to train a multiview keypoint estimator.
  2. Kinematic fitting is still missing; this is my next goal. The output will be joint angles, as explored in my previous posts.
  3. Missing dense point cloud: VGGT seems to fail for me here. I'm looking to explore using MP-SFM as a method for generating dense multiview depth maps + normals (plus it has a friendlier license compared to VGGT).
  4. Eventually, creation of 4D Gaussian splatting using something akin to DN-splatter. My long-term goal is a data engine that provides poses/depths/splats/keypoints/etc.
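
For the jitter mentioned in item 1, here is a minimal sketch of a One Euro Filter applied to a single scalar coordinate. It follows the standard formulation (an exponential low-pass whose cutoff rises with the filtered speed). The class layout and the parameter values (`min_cutoff`, `beta`, `d_cutoff`) are illustrative assumptions, not tuned settings from this pipeline. A 3D keypoint would need one filter per coordinate.

```python
import math


class OneEuroFilter:
    """Adaptive low-pass filter for one scalar signal."""

    def __init__(self, t0, x0, min_cutoff=1.0, beta=0.0, d_cutoff=1.0):
        self.min_cutoff = min_cutoff  # Hz; lower means smoother at rest
        self.beta = beta              # higher means less lag in fast motion
        self.d_cutoff = d_cutoff      # Hz; cutoff for the derivative estimate
        self.t_prev = t0
        self.x_prev = x0
        self.dx_prev = 0.0

    @staticmethod
    def _alpha(cutoff, dt):
        # Smoothing factor of a first-order low-pass at the given cutoff.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def __call__(self, t, x):
        dt = t - self.t_prev
        # Low-pass the speed estimate first.
        a_d = self._alpha(self.d_cutoff, dt)
        dx = (x - self.x_prev) / dt
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Faster motion raises the cutoff, which reduces lag.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, dt)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.t_prev, self.x_prev, self.dx_prev = t, x_hat, dx_hat
        return x_hat


# Synthetic jitter: a coordinate that should sit at 5.0 but flips by +/-0.5
# on every frame of a 30 fps stream.
fps = 30.0
raw = [5.0 + (0.5 if i % 2 == 0 else -0.5) for i in range(120)]
one_euro = OneEuroFilter(t0=0.0, x0=raw[0], min_cutoff=1.0, beta=0.01)
filtered = [raw[0]] + [one_euro(i / fps, x) for i, x in enumerate(raw) if i > 0]

tail = filtered[-20:]
print("raw peak-to-peak:", max(raw) - min(raw))
print("filtered peak-to-peak:", max(tail) - min(tail))
```

Lowering `min_cutoff` removes more jitter when the subject is still, and raising `beta` trades some of that smoothing for less lag during fast motion.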

Media

video


Auto-generated - needs manual review

Quoted tweet

Pablo Vela (@pablovelagomez1)

Synchronization ✅ Calibration with VGGT ✅ Next up is hand tracking + kinematic fitting

Almost there 😮‍💨 https://t.co/de7fy0DYLr

Original tweet

🎬 Video 🎬 Video

Tags

domain-vision-3d domain-visionos