Rohan Paul (@rohanpaul_ai)

2024-08-09 | โค๏ธ 311 | ๐Ÿ” 46


LLM Basics - Binary Quantization 🔥

🧵 A thread - 1/n 👇

The concept itself isn't new, but what's reignited interest is the recent announcement from @cohere regarding their support for int8 and binary embeddings in their Cohere embed v3.

📌 First, in essence, embeddings are numerical representations of more complex objects, like text, images, audio, etc. Specifically, the objects are represented as n-dimensional vectors.

After transforming the complex objects, you can determine their similarity by calculating the similarity of the respective embeddings! This is crucial for many use cases: it serves as the backbone for recommendation systems, retrieval, one-shot or few-shot learning, outlier detection, similarity search, paraphrase detection, clustering, classification, and much more.
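For instance, here is a minimal sketch of that similarity computation, using cosine similarity over two made-up 4-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

# Two toy embedding vectors (illustrative values only).
a = np.array([0.056, -0.128, -0.029, 0.047])
b = np.array([0.061, -0.110, 0.012, 0.052])

# Cosine similarity: dot product of the L2-normalized vectors.
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)  # closer to 1.0 means more similar
```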


📌 Binary Quantization for embeddings

Unlike quantization in models where you reduce the precision of weights, quantization for embeddings refers to a post-processing step for the embeddings themselves. In particular, binary quantization refers to the conversion of the float32 values in an embedding to 1-bit values, resulting in a 32x reduction in memory and storage usage.
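To see where the 32x comes from: a 1024-dimensional float32 embedding takes 1024 × 4 bytes = 4096 bytes, while its binary version needs only 1024 bits = 128 bytes.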


✨ Binary quantization example

Vector embeddings are usually generated by embedding models, such as Cohere's embed v3, and a single vector embedding takes the following form.

[0.056, -0.128, -0.029, 0.047, …, 0.135]

To quantize float32 embeddings to binary, we simply threshold the normalized embeddings at 0.

That is, because these embeddings contain small values centered around zero, you can turn them into a binary vector by keeping only the sign of each value:

1: If the value is greater than or equal to 0.

0: If the value is less than 0.

So you get something like this:

[1, 0, 0, …, 1]
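Putting the whole step together, here is a minimal sketch in NumPy (an assumed implementation, not Cohere's actual code) that thresholds a float32 embedding at 0 and packs the resulting bits, 8 per byte:

```python
import numpy as np

# Stand-in for a real 1024-dimensional float32 embedding.
emb = np.random.randn(1024).astype(np.float32)

# 1 if the value is >= 0, else 0.
bits = (emb >= 0).astype(np.uint8)

# Pack 8 bits per byte: 1024 bits -> 128 bytes.
packed = np.packbits(bits)

print(f"{emb.nbytes} bytes -> {packed.nbytes} bytes")  # 4096 bytes -> 128 bytes (32x)
```

Retrieval over the packed vectors then typically uses Hamming distance (XOR plus popcount), which is far cheaper to compute than float32 cosine similarity.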

Tags

domain-llm