Original Tweet


IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction https://lifuguan.github.io/IGGT_official/ Humans naturally perceive both geometric structure and semantic content of 3D worlds, but achieving “the best of both worlds” has been a grand challenge for AI. Traditional methods decouple 3D reconstruction (low-level geometry) from spatial understanding (high-level semantics), leading to error accumulation and poor generalization. Meanwhile, newer methods attempt to “lock” 3D models with specific Vision-Language Models (VLMs), which not only limits the model’s perception capabilities (e.g., inability to distinguish between different instances of the same class) but also hinders extensibility to stronger downstream tasks.

Now, IGGT presents a revolutionary solution. NTU, in collaboration with StepFun, proposes IGGT (Instance-Grounded Geometry Transformer), an innovative end-to-end large unified Transformer that, for the first time, integrates spatial reconstruction with instance-level contextual understanding.

🔗 Original Link

Media

image


Tags

3D-Vision AI-ML