Shiqi Chen (@shiqi_chen17)
2025-05-02 | ❤️ 295 | 🔁 41
Thrilled to announce our ICML25 paper: “Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas”!
We dive into the core reasons behind spatial reasoning difficulties for Vision-Language Models from an attention mechanism view.
Paper: https://arxiv.org/pdf/2503.01773 Code: https://github.com/shiqichen17/AdaptVis Website: https://shiqichen17.github.io/AdaptVis/
Original links
- https://arxiv.org/pdf/2503.01773
- https://github.com/shiqichen17/AdaptVis
- https://shiqichen17.github.io/AdaptVis/