MrNeRF (@janusch_patas)
2024-12-26 | ❤️ 149 | 🔁 23
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Contributions:
• Scene Representation via Sparse Gaussians: GaussTR introduces a Transformer-based architecture that models scenes as sparse 3D Gaussian representations in a feed-forward manner, replacing dense voxel-based modeling and reducing computational overhead.
• Self-Supervised & Open-Vocabulary Occupancy Prediction: GaussTR learns 3D Gaussian representations aligned with knowledge from foundation models, enabling self-supervised, open-vocabulary 3D semantic occupancy prediction without 3D annotations or 2D pseudo-labels.
• State-of-the-Art Performance & Efficiency: GaussTR achieves 11.70 mIoU on the Occ3D-nuScenes dataset, an 18% relative improvement over previous methods, while halving training time. This gain highlights the effectiveness of our proposed foundation model alignment and representation sparsity for advancing 3D spatial understanding.
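To make the sparse-Gaussian and open-vocabulary ideas more concrete, here is a minimal NumPy sketch of one common way to turn a set of 3D Gaussians into a labeled occupancy grid. It is not GaussTR's implementation. Each voxel center aggregates opacity-weighted Gaussian densities and features, and the aggregated feature is matched by cosine similarity against per-class text embeddings (for example, from a CLIP-style text encoder). The helper name `gaussians_to_occupancy`, the axis-aligned (unrotated) covariances, and the `empty_thresh` density cutoff are assumptions made for illustration.

```python
import numpy as np


def gaussians_to_occupancy(means, scales, opacities, feats, voxel_centers,
                           text_embeds, empty_thresh=0.05):
    """Hypothetical helper: splat sparse 3D Gaussians onto voxel centers and
    label each voxel by cosine similarity to class text embeddings.

    means: (N, 3) Gaussian centers
    scales: (N, 3) per-axis standard deviations (axis-aligned, no rotation)
    opacities: (N,) per-Gaussian opacity in [0, 1]
    feats: (N, D) per-Gaussian features, assumed aligned with the text space
    voxel_centers: (V, 3) query points
    text_embeds: (C, D) one embedding per class prompt
    Returns (V,) integer labels; -1 marks voxels treated as empty.
    """
    # Offsets from every voxel center to every Gaussian mean: (V, N, 3)
    d = voxel_centers[:, None, :] - means[None, :, :]
    # Squared Mahalanobis distance for diagonal covariances: (V, N)
    m = np.sum((d / scales[None, :, :]) ** 2, axis=-1)
    # Opacity-weighted Gaussian density contribution: (V, N)
    w = opacities[None, :] * np.exp(-0.5 * m)
    density = w.sum(axis=1)  # (V,)
    # Density-normalized feature per voxel; epsilon avoids division by zero
    vox_feat = (w @ feats) / np.maximum(density, 1e-8)[:, None]
    # Cosine similarity between voxel features and class embeddings: (V, C)
    vf = vox_feat / np.maximum(
        np.linalg.norm(vox_feat, axis=1, keepdims=True), 1e-8)
    te = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    labels = np.argmax(vf @ te.T, axis=1)
    # Voxels with negligible total density are treated as empty space
    labels[density < empty_thresh] = -1
    return labels


# Toy scene: two unit Gaussians whose features match two class prompts
means = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
scales = np.ones((2, 3))
opacities = np.ones(2)
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
text_embeds = np.array([[1.0, 0.0], [0.0, 1.0]])  # stand-ins for e.g. "road", "car"
voxels = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [20.0, 20.0, 20.0]])

labels = gaussians_to_occupancy(means, scales, opacities, feats, voxels,
                                text_embeds)
print(labels.tolist())  # [0, 1, -1]
```

Voxels at the two Gaussian centers take the matching class, and the far-away voxel falls below the density cutoff and is marked empty. The dense (V, N) weight matrix keeps the sketch short. A practical version would evaluate only the Gaussians near each voxel, which is where a sparse Gaussian scene saves work compared with dense per-voxel modeling.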
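The 11.70 result is reported as mean intersection-over-union (mIoU): per-class IoU = TP / (TP + FP + FN), averaged over the semantic classes. Below is a minimal NumPy sketch of that metric. The `ignore_index` convention and the choice to skip classes absent from both prediction and ground truth are assumptions. The official Occ3D-nuScenes evaluation (for example, its visibility mask and class list) may differ in detail.

```python
import numpy as np


def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Generic semantic mIoU over voxel labels (not the official Occ3D script).

    pred, gt: integer arrays of the same shape with labels in [0, num_classes);
    gt voxels equal to ignore_index are excluded from scoring.
    """
    mask = gt != ignore_index
    p = pred[mask].astype(np.int64)
    g = gt[mask].astype(np.int64)
    # Confusion matrix: rows = ground truth, columns = prediction
    cm = np.bincount(num_classes * g + p, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)
    tp = np.diag(cm)
    # Union = TP + FP + FN = predicted count + ground-truth count - TP
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0  # skip classes absent from both pred and gt
    return float(np.mean(tp[valid] / union[valid]))


gt = np.array([0, 0, 1, 1, 255])    # last voxel is ignored
pred = np.array([0, 1, 1, 1, 0])
print(mean_iou(pred, gt, num_classes=2))  # (1/2 + 2/3) / 2, about 0.583
```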