Rohan Paul (@rohanpaul_ai)

2024-09-04 | ❤️ 453 | 🔁 76


Multimodal LLMs are just superb to play with.

Hacking around with Qwen2-VL, and it's great for:

📷 OCR
📝 Image-to-markdown conversion
🏷️ Classification
🔍 Object tagging
🗝️ Keyword generation


Naive Dynamic Resolution and Multimodal Rotary Position Embedding (M-RoPE) are just WORKING

Media

image


Tags

domain-ai-ml domain-genai domain-llm domain-vlm