Ilir Aliu (@IlirAliu_)
2025-11-26 | ❤️ 134 | 🔁 17
Most datasets for hand-object manipulation are slow to build, expensive to capture, or too small to be useful… not this:
HO-Cap feels different. A simple idea done well.
They built an eight-camera RealSense rig with an Azure Kinect mounted on top. Users also wear a HoloLens, which adds a first-person view. Everything is calibrated, so the output is consistent across views.
The pipeline is semi-automatic:
• BundleSDF reconstructs the object mesh
• FoundationPose gives the initial object pose
• MediaPipe gives the hand pose
• A joint SDF-based optimization cleans everything up
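To give a feel for the last step, here is a toy sketch of SDF-based pose refinement. This is my own minimal illustration, not the authors' implementation: it uses a simple sphere SDF and only refines a translation, whereas the real pipeline jointly optimizes full hand and object poses against a reconstructed mesh. The idea is the same: nudge the pose until observed surface points sit where the signed distance is zero.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance from each point to a sphere surface (negative inside)."""
    return np.linalg.norm(points - center, axis=1) - radius

def refine_translation(points, center, radius, lr=0.5, steps=200):
    """Gradient descent on 0.5 * sum(SDF(p)^2) w.r.t. the sphere center.

    `points` are observed surface points; the optimum places them on the
    zero level set of the SDF.
    """
    c = center.astype(float).copy()
    for _ in range(steps):
        d = points - c
        dist = np.linalg.norm(d, axis=1)
        res = dist - radius                              # SDF residual per point
        grad = -(res[:, None] * d / dist[:, None]).mean(axis=0)
        c -= lr * grad
    return c

# Points on a sphere of radius 5 cm whose true center is offset 10 cm in x;
# starting from the origin, the refinement recovers the offset.
radius = 0.05
true_center = np.array([0.1, 0.0, 0.0])
dirs = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                 [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
points = true_center + radius * dirs
estimate = refine_translation(points, np.zeros(3), radius)
```

In the real system the object is a learned mesh rather than a sphere, and the optimization also penalizes hand-object interpenetration, but the least-squares-on-SDF structure is the core trick.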
The result: 64 objects and 64 videos with clean poses, masks, and first-person data. No expensive mocap rigs. No custom gloves. No full manual labeling.
The rough edges are clear. Some objects do not reconstruct well. MediaPipe sometimes drops joints. Small objects can break the pose estimation. Failed videos are removed.
But the direction matters. This type of data is what dexterous robot learning needs right now. These actions are still very hard for current systems.
Thanks for sharing, @YuXiang_IRVL!!!
Project page: https://irvlutd.github.io/HOCap/
Dataset toolbox: https://github.com/IRVLUTD/HO-Cap
---
Weekly robotics and AI insights. Subscribe free: https://t.co/dsa6wcvq6n