Xiaolong Wang (@xiaolonw)

2023-12-16 | ❤️ 29 | 🔁 6

Can LLMs understand and reason about explicit object coordinates?

Introducing PixelLLM, which can take object coordinates as both inputs and outputs. This allows an LLM to directly perform all detection/segmentation tasks with dense descriptions.

Quoted tweet

Jiarui Xu (@Jerry_XU_Jiarui): GPT-4V can describe locations via text, but it can't accurately output the coordinates of each word.

Introducing: Pixel Aligned Language Models. It generates image captions along with the pixel coordinates aligned to each word. https://arxiv.org/abs/2312.09237

(1/n) https://x.com/Jerry_XU_Jiarui/status/1735881901926498310/video/1
