Xiaolong Wang (@xiaolonw)

2023-12-16 | โค๏ธ 29 | ๐Ÿ” 6


Can LLMs understand and reason about explicit object coordinates?

Introducing PixelLLM, which can take object coordinates as both inputs and outputs. This allows the LLM to perform detection and segmentation tasks directly, with dense descriptions.

์ธ์šฉ ํŠธ์œ—

Jiarui Xu (@Jerry_XU_Jiarui): GPT4-V can describe the location via text, but can't accurately output the coordinate of each word.

Introducing: Pixel Aligned Language Models. It generates image captions along with pixel coordinates in the image aligned to the words. https://arxiv.org/abs/2312.09237

(1/n) https://x.com/Jerry_XU_Jiarui/status/1735881901926498310/video/1


Tags

domain-llm