Ayush Jain (@ayushjain1144)
2025-04-23 | โค๏ธ 136 | ๐ 28
1/ Despite having access to rich 3D inputs, embodied agents still rely on 2D VLMsโdue to the lack of large-scale 3D data and pre-trained 3D encoders.
We introduce UniVLG, a unified 2D-3D VLM that leverages 2D scale to improve 3D scene understanding. https://univlg.github.io/ https://x.com/ayushjain1144/status/1915087087486828854/video/1
๐ ์๋ณธ ๋งํฌ
๋ฏธ๋์ด
![]()
์์ฝ
๋๊ท๋ชจ 3D ๋ฐ์ดํฐยท์ฌ์ ํ์ต 3D ์ธ์ฝ๋ ๋ถ์กฑ์ผ๋ก 2D VLM์ ์์กดํ๋ ํ๊ณ๋ฅผ ๊ฒจ๋ฅํด, 2D์ 3D๋ฅผ ํตํฉํ VLM(UniVLG)์ ์ ์ํฉ๋๋ค. 2D ๋ฐ์ดํฐ์ ์ค์ผ์ผ์ ํ์ฉํด 3D ์ฅ๋ฉด ์ดํด ์ฑ๋ฅ์ ๋์ด๋ ๊ฒ์ด ํต์ฌ์ ๋๋ค.
๐ Related
Auto-generated - needs manual review
Tags
domain-vision-3d domain-robotics domain-vlm domain-dev-tools domain-visionos