DailyPapers (@HuggingPapers)

2025-10-13 | ❤️ 192 | 🔁 32


📷 “Thinking with Camera” just dropped on Hugging Face

A unified multimodal model, Puffin, introduces a novel paradigm for camera-centric understanding and generation by treating camera as language. It interprets and creates scenes from arbitrary viewpoints, enhancing spatial intelligence for tasks like world exploration.


Tags

AI-ML VLM GenAI