Rohan Paul (@rohanpaul_ai)
2024-08-09 | ❤️ 311 | 🔁 46
LLM Basics - Binary Quantization 🔥
🧵 A thread - 1/n 👇
The concept itself isn't new, but what's reignited interest is the recent announcement from @cohere regarding their support for int8 and binary embeddings in their Cohere embed v3.
📌 First, in essence, embeddings are numerical representations of more complex objects, like text, images, audio, etc. Specifically, the objects are represented as n-dimensional vectors.
After transforming the complex objects, you can determine their similarity by calculating the similarity of the respective embeddings! This is crucial for many use cases: it serves as the backbone for recommendation systems, retrieval, one-shot or few-shot learning, outlier detection, similarity search, paraphrase detection, clustering, classification, and much more.
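To make "calculating the similarity of the respective embeddings" concrete, here is a minimal sketch of cosine similarity, the metric most commonly used for comparing embeddings (the thread doesn't specify a metric, so this is an illustrative assumption):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the vector norms.
    # Ranges from -1 (opposite) to 1 (identical direction).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Two toy "embeddings" (real ones have hundreds or thousands of dimensions).
v1 = [0.1, 0.3, -0.2]
v2 = [0.2, 0.1, -0.1]
print(cosine_similarity(v1, v2))
```

Retrieval and clustering systems then rank or group objects by this score.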
📌 Binary Quantization for embeddings
Unlike quantization in models where you reduce the precision of weights, quantization for embeddings refers to a post-processing step for the embeddings themselves. In particular, binary quantization refers to the conversion of the float32 values in an embedding to 1-bit values, resulting in a 32x reduction in memory and storage usage.
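The 32x figure follows directly from the bit widths: each float32 value takes 32 bits, while a binary value takes 1 bit. A back-of-envelope sketch (the 1024-dimensional size is an assumption for illustration):

```python
dims = 1024                      # assumed embedding dimensionality for illustration
float32_bytes = dims * 4         # float32 = 4 bytes (32 bits) per dimension
binary_bytes = dims // 8         # 1 bit per dimension, packed 8 per byte

print(float32_bytes)                   # 4096 bytes per float32 embedding
print(binary_bytes)                    # 128 bytes per binary embedding
print(float32_bytes // binary_bytes)   # 32x reduction
```

At scale this matters: a billion such float32 embeddings need roughly 4 TB, versus about 128 GB after binary quantization.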
✨ Binary quantization example
Vector embeddings are usually generated by embedding models, such as Cohere's embed v3, and a single vector embedding will look like the following:
[0.056, -0.128, -0.029, 0.047, …, 0.135]
To quantize float32 embeddings to binary, we simply threshold the normalized embeddings at 0.
That is, because these embeddings contain small values centered around zero, you can map each dimension to a single bit:
1: if the value is greater than or equal to 0.
0: if the value is less than 0.
So you get something like this:
[1, 0, 0, …, 1]
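The thresholding rule above can be sketched in a few lines of Python (using the example values from this thread; real embeddings are much longer):

```python
def binarize(embedding):
    # Threshold each float32 value at 0: >= 0 maps to 1, < 0 maps to 0.
    return [1 if v >= 0 else 0 for v in embedding]

# The (truncated) example embedding from this thread.
emb = [0.056, -0.128, -0.029, 0.047, 0.135]
print(binarize(emb))  # -> [1, 0, 0, 1, 1]
```

After binarization, similarity is typically computed with Hamming distance (counting differing bits), which is far cheaper than float dot products.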
