MrNeRF (@janusch_patas)

2024-12-26 | โค๏ธ 149 | ๐Ÿ” 23


GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

Contributions:

• Scene Representation via Sparse Gaussians: GaussTR introduces a Transformer-based architecture that models scenes as sparse 3D Gaussian representations in a feed-forward manner, substituting dense voxel-based modeling and reducing computational overhead.

• Self-Supervised & Open-Vocabulary Occupancy Prediction: GaussTR attains 3D Gaussian representations aligned with knowledge from foundation models, facilitating self-supervised open-vocabulary 3D semantic occupancy prediction without requiring 3D annotations or 2D pseudo-labels.

• State-of-the-Art Performance & Efficiency: GaussTR achieves 11.70 mIoU on the Occ3D-nuScenes dataset, outperforming previous methods by 18% while halving training time. This performance leap highlights the efficacy of our proposed foundation model alignment and representation sparsity in the advancement of 3D spatial understanding.
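For readers unfamiliar with the mIoU figure quoted above, here is a minimal sketch of how mean intersection-over-union is typically computed over per-voxel class labels. It is not the GaussTR or Occ3D evaluation code: the function name `mean_iou`, its `ignore_index` parameter, and the toy labels are assumptions made for illustration, and the benchmark's own class list and visibility masks are not reproduced here.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over flat sequences of per-voxel class ids (illustrative helper)."""
    inter = [0] * num_classes  # voxels where prediction and label agree on class c
    union = [0] * num_classes  # voxels where prediction or label is class c
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip voxels excluded from evaluation
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    # Average only over classes that appear in the prediction or the labels.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 3 classes and 6 voxels (made-up labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"{100 * score:.2f} mIoU")
```

The function returns a fraction between 0 and 1. Benchmarks usually report it as a percentage, so a reported 11.70 mIoU corresponds to a mean IoU of 0.1170.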

Media

video


Tags

domain-vision-3d domain-ai-ml