Pablo Vela (@pablovelagomez1)

2025-10-02 | ❤️ 84 | 🔁 9


Got some new out-of-distribution data to test with, tracker is looking 🔥. Had to disable the MANO optimization for now as I've found some wonky bugs, but will have those solved soon https://x.com/pablovelagomez1/status/1973857488283054470/video/1


Quoted tweet

Pablo Vela (@pablovelagomez1)

It's finally done, I've finished ripping out my full-body pipeline and replaced it with a hands-only version. Critical to make it work in a lot more scenarios! I've visualized the final predictions with @rerundotio!

I want to emphasize that these are not the ground-truth values provided by the wonderful HOCap dataset, but rather come from my pipeline, which was written from the ground up!

For context, the pipeline consists of four parts (a rough sketch of step 4 follows the list):

  1. Exo/ego camera estimation
  2. Hand shape calibration
  3. Per-view 2D keypoint estimation
  4. Hand pose optimization
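
Since step 4 is where everything comes together, here's a minimal sketch of what a multi-view MANO fitting step could look like: optimize pose parameters against confidence-weighted 2D reprojection error across all calibrated views. Everything here is an illustrative assumption, not the author's actual code: `mano_forward` is a dummy stand-in for a real differentiable MANO layer (e.g. manopth), and the cameras, detections, and weights are placeholder values so the sketch runs end to end.

```python
# Sketch of step 4: fit MANO pose by minimizing confidence-weighted
# 2D reprojection error across all calibrated views (PyTorch).
import torch

N_VIEWS, N_JOINTS = 8, 21
torch.manual_seed(0)

# Fixed random linear map standing in for MANO's blend skinning:
# 48-D pose + 10-D shape -> 21 3-D joint positions. Demo only.
_W = torch.randn(N_JOINTS * 3, 48 + 10) * 0.01

def mano_forward(pose: torch.Tensor, shape: torch.Tensor) -> torch.Tensor:
    return (_W @ torch.cat([pose, shape])).reshape(N_JOINTS, 3)

def project(pts3d: torch.Tensor, K, R, t) -> torch.Tensor:
    """Pinhole projection of world-space points into one view."""
    cam = pts3d @ R.T + t   # world -> camera coordinates
    uv = cam @ K.T          # camera -> homogeneous image coordinates
    return uv[:, :2] / uv[:, 2:3]

# Per-view calibration (step 1) and 2-D detections with scores (step 3);
# dummy values here in place of real estimates.
Ks = torch.eye(3).expand(N_VIEWS, 3, 3)
Rs = torch.eye(3).expand(N_VIEWS, 3, 3)
ts = torch.tensor([0.0, 0.0, 2.0]).expand(N_VIEWS, 3)
detections = torch.rand(N_VIEWS, N_JOINTS, 2)
confidence = torch.ones(N_VIEWS, N_JOINTS)

pose = torch.zeros(48, requires_grad=True)  # axis-angle joint rotations
shape = torch.zeros(10)                     # betas, fixed by step-2 calibration
opt = torch.optim.Adam([pose], lr=1e-2)

for step in range(200):
    opt.zero_grad()
    joints3d = mano_forward(pose, shape)
    loss = torch.zeros(())
    for v in range(N_VIEWS):
        uv = project(joints3d, Ks[v], Rs[v], ts[v])
        loss = loss + (confidence[v, :, None] * (uv - detections[v]) ** 2).mean()
    loss = loss + 1e-3 * pose.pow(2).sum()  # mild regularizer toward rest pose
    loss.backward()
    opt.step()
```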

At the end of it all, I have a pipeline where you input synchronized videos and it outputs fully tracked per-view 2D keypoints, bounding boxes, 3D keypoints, and MANO joint angles + hand shape!
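
For reference, a per-frame output bundle like that could look roughly like the sketch below; the field names and array shapes are my own illustrative guesses, not the pipeline's actual schema.

```python
# Hypothetical per-frame output bundle for the pipeline described above.
# Names and shapes are illustrative assumptions, not the real schema.
from dataclasses import dataclass
import numpy as np

N_VIEWS, N_JOINTS = 8, 21  # cameras in this capture; 21 hand joints

@dataclass
class FrameOutput:
    keypoints_2d: np.ndarray  # (N_VIEWS, N_JOINTS, 2) per-view pixel coords
    bboxes: np.ndarray        # (N_VIEWS, 4) per-view hand boxes, xyxy
    keypoints_3d: np.ndarray  # (N_JOINTS, 3) world-space joints
    mano_pose: np.ndarray     # (48,) axis-angle joint rotations
    mano_shape: np.ndarray    # (10,) betas, fixed per subject by calibration
```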

Really happy with how it looks so far, but it's far from ideal:

  1. Not even close to real time: this 30-second 8-view sequence took nearly 5 minutes to process on my 5090 GPU (rough throughput math after this list)
  2. 8 views is WAY too many and won't scale; I'm convinced this can be done with far fewer (2 exo + 1 stereo ego)
  3. Interacting hands cause lots of issues, and the pipeline is very fragile when there's no clear delineation between the hands
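
A back-of-the-envelope check of the real-time gap in point 1. Only the 30 s / 8 views / ~5 min figures come from the post; the 30 fps capture rate is an assumption.

```python
fps_capture = 30           # assumed camera frame rate
seconds, views = 30, 8     # sequence length and camera count (from the post)
process_time = 5 * 60      # ~5 minutes of processing, in seconds

frames_total = seconds * fps_capture * views   # 7200 frames
throughput = frames_total / process_time       # ~24 frames/s processed
realtime_needed = fps_capture * views          # 240 frames/s required
print(f"{throughput:.0f} fps vs {realtime_needed} fps -> "
      f"{realtime_needed / throughput:.0f}x short of real time")
```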

Still, I'm quite happy with how it's going so far. Currently, I have a reasonable set of datasets to validate against, a performant baseline, and an annotation app to correct inaccurate predictions.

From here, the focus will be more on the egocentric side!

Original tweet

🎬 Video

Tags

Dev-Tools