Rohan Paul (@rohanpaul_ai)
2024-09-04 | ❤️ 453 | 🔁 76
Multimodal LLMs are just superb to play with.
Hacking around with Qwen2-VL, and it's great for
OCR, image-to-markdown conversion, classification, object tagging, keyword generation
Naive Dynamic Resolution and Multimodal Rotary Position Embedding (M-RoPE) are just WORKING
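For a rough feel of what Naive Dynamic Resolution means in practice, here is a minimal sketch of how an image's size could map to a variable number of visual tokens. It assumes a 14-pixel patch size with 2x2 patch merging (so one token per 28x28 pixel block), which is my understanding of Qwen2-VL's setup. The function name `visual_tokens` and the `min_pixels` / `max_pixels` defaults are illustrative choices, not the model's actual preprocessing code.

```python
import math

PATCH = 14            # assumed ViT patch size in pixels
MERGE = 2             # assumed 2x2 merge of neighbouring patches into one token
UNIT = PATCH * MERGE  # 28 pixels of image per visual token, per side


def visual_tokens(height, width, min_pixels=4 * 28 * 28, max_pixels=1280 * 28 * 28):
    """Estimate the visual token count for an image of the given size.

    Hypothetical helper: the pixel budgets are illustrative defaults.
    """
    # Snap each side to the nearest multiple of UNIT (at least one unit).
    h = max(UNIT, round(height / UNIT) * UNIT)
    w = max(UNIT, round(width / UNIT) * UNIT)

    if h * w > max_pixels:
        # Too large: shrink both sides by the same factor, rounding down.
        scale = math.sqrt(height * width / max_pixels)
        h = math.floor(height / scale / UNIT) * UNIT
        w = math.floor(width / scale / UNIT) * UNIT
    elif h * w < min_pixels:
        # Too small: grow both sides by the same factor, rounding up.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / UNIT) * UNIT
        w = math.ceil(width * scale / UNIT) * UNIT

    # One token per UNIT x UNIT block of the resized image.
    return (h // UNIT) * (w // UNIT)


print(visual_tokens(224, 224))    # 8 x 8 blocks -> 64
print(visual_tokens(448, 224))    # 16 x 8 blocks -> 128
print(visual_tokens(2800, 2800))  # downscaled to fit the max_pixels budget
```

The point of the sketch: the token count follows the image's own size and aspect ratio instead of being forced to one fixed resolution, with a pixel budget capping the cost of very large inputs.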
