Rohan Paul (@rohanpaul_ai)

2024-06-12 | ❤️ 739 | 🔁 137


A new framework runs Mixtral 8x7B at 11 tokens/s on a mobile phone.🤯

📌 That is up to 22 times faster than other state-of-the-art frameworks.

PowerInfer-2 is a highly optimized inference framework designed specifically for smartphones.

Even for 7B models, with just 50% of the FFN weights placed on the phone, PowerInfer-2 still maintains state-of-the-art speed!

Media

video


Tags

domain-ai-ml domain-xr domain-dev-tools