
Commit 9a6bb8a

jmazanec15, vagimeli, and natebower authored
Adds section on product quantization for docs (opensearch-project#6926)
Adds a section on product quantization to the vector quantization docs, including tips for using it and memory estimations. Also changes some formatting to make the docs easier to write.

Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
Co-authored-by: Melissa Vagi <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
1 parent 758bb17 commit 9a6bb8a

File tree: 2 files changed, +64 −22 lines changed

_search-plugins/knn/knn-index.md (+9 −15)
@@ -44,7 +44,7 @@ PUT /test-index
 
 ## Lucene byte vector
 
-Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of storage space needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
+Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine to reduce the amount of storage space needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
 
 ## SIMD optimization for the Faiss engine
 
@@ -137,10 +137,7 @@ For more information about setting these parameters, refer to the [Faiss documen
 
 #### IVF training requirements
 
-The IVF algorithm requires a training step. To create an index that uses IVF, you need to train a model with the
-[Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model), passing the IVF method definition. IVF requires that, at a minimum, there should be `nlist` training
-data points, but it is [recommended that you use more](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index#how-big-is-the-dataset).
-Training data can be composed of either the same data that is going to be ingested or a separate dataset.
+The IVF algorithm requires a training step. To create an index that uses IVF, you need to train a model with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model), passing the IVF method definition. IVF requires that, at a minimum, there are `nlist` training data points, but it is [recommended that you use more than this](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index#how-big-is-the-dataset). Training data can be composed of either the same data that is going to be ingested or a separate dataset.
 
 ### Supported Lucene methods
 
@@ -175,8 +172,7 @@ An index created in OpenSearch version 2.11 or earlier will still use the old `e
 
 ### Supported Faiss encoders
 
-You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. The k-NN plugin currently supports the
-`flat`, `pq`, and `sq` encoders in the Faiss library.
+You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. The k-NN plugin currently supports the `flat`, `pq`, and `sq` encoders in the Faiss library.
 
 The following example method definition specifies the `hnsw` method and a `pq` encoder:
 
@@ -204,7 +200,7 @@ Encoder name | Requires training | Description
 :--- | :--- | :---
 `flat` (Default) | false | Encode vectors as floating-point arrays. This encoding does not reduce memory footprint.
 `pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388).
-`sq` | false | An abbreviation for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQFP16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on x86 architecture or Neon on ARM64 architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization).
+`sq` | false | An abbreviation for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQFP16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on x86 architecture or Neon on ARM64 architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-16-bit-scalar-quantization).
 
 #### PQ parameters
 
@@ -314,21 +310,19 @@ The following example uses the `ivf` method with an `sq` encoder of type `fp16`:
 
 ### Choosing the right method
 
-There are a lot of options to choose from when building your `knn_vector` field. To determine the correct methods and parameters to choose, you should first understand what requirements you have for your workload and what trade-offs you are willing to make. Factors to consider are (1) query latency, (2) query quality, (3) memory limits, (4) indexing latency.
+There are several options to choose from when building your `knn_vector` field. To determine the correct methods and parameters, you should first understand the requirements of your workload and what trade-offs you are willing to make. Factors to consider are (1) query latency, (2) query quality, (3) memory limits, and (4) indexing latency.
 
-If memory is not a concern, HNSW offers a very strong query latency/query quality tradeoff.
+If memory is not a concern, HNSW offers a strong query latency/query quality trade-off.
 
-If you want to use less memory and index faster than HNSW, while maintaining similar query quality, you should evaluate IVF.
+If you want to use less memory and increase indexing speed as compared to HNSW while maintaining similar query quality, you should evaluate IVF.
 
 If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop.
 
-You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) in order to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/).
+You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/).
 
 ### Memory estimation
 
-In a typical OpenSearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates
-native library indexes to a portion of the remaining RAM. This portion's size is determined by
-the `circuit_breaker_limit` cluster setting. By default, the limit is set at 50%.
+In a typical OpenSearch cluster, a certain portion of RAM is reserved for the JVM heap. The k-NN plugin allocates native library indexes to a portion of the remaining RAM. This portion's size is determined by the `circuit_breaker_limit` cluster setting. By default, the limit is set to 50%.
 
 Having a replica doubles the total number of vectors.
 {: .note }
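The memory budget described in the revised paragraph can be sanity-checked with a short script. This is an illustrative sketch only: the function name and the example node sizes are hypothetical, not part of the k-NN plugin.

```python
# Back-of-the-envelope sketch of the k-NN native memory budget: the plugin can
# use up to circuit_breaker_limit (default 50%) of the RAM remaining after the
# JVM heap. Function name and example values are hypothetical.
def knn_native_memory_gb(node_ram_gb: float, jvm_heap_gb: float,
                         circuit_breaker_limit: float = 0.5) -> float:
    """Estimate RAM (in GB) available for native library indexes on one node."""
    return (node_ram_gb - jvm_heap_gb) * circuit_breaker_limit

# Example: a 64 GB node with a 32 GB heap leaves 16 GB for native indexes.
print(knn_native_memory_gb(64, 32))  # 16.0
```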

_search-plugins/knn/knn-vector-quantization.md (+55 −7)
@@ -12,13 +12,17 @@ has_math: true
 
 By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines). To reduce the memory footprint, you can use vector quantization.
 
+OpenSearch supports many varieties of quantization. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. The supported types include byte vectors, 16-bit scalar quantization, and product quantization (PQ).
+
 ## Lucene byte vector
 
 Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
 
-## Faiss scalar quantization
+## Faiss 16-bit scalar quantization
 
-Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput.
+Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index.
+
+At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput.
 
 SIMD optimization is not supported on Windows. Using Faiss scalar quantization on Windows can lead to a significant drop in performance, including decreased indexing throughput and increased search latencies.
 {: .warning}
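The 32-bit to 16-bit conversion described above can be illustrated with Python's standard `struct` module, which supports the IEEE 754 half-precision format (`'e'`). This sketch only mimics the precision loss of SQfp16-style round-tripping; it is not the Faiss implementation.

```python
import struct

# Round-trip a value through 16-bit storage, the way a scalar quantizer encodes
# at ingestion time and decodes back to a float at search time (illustrative).
def fp16_round_trip(x: float) -> float:
    return struct.unpack('<e', struct.pack('<e', x))[0]

original = [0.123456789, 1234.5678, -0.00098765]
decoded = [fp16_round_trip(x) for x in original]

# Each 16-bit value occupies half the space of a 32-bit float.
print(struct.calcsize('<e'), struct.calcsize('<f'))  # 2 4

# The decoded values differ only slightly from the originals.
for o, d in zip(original, decoded):
    print(o, '->', d)
```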
@@ -62,7 +66,9 @@ PUT /test-index
 
 Optionally, you can specify the parameters in `method.parameters.encoder`. For more information about `encoder` object parameters, see [SQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#sq-parameters).
 
-The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must be in the [-65504.0, 65504.0] range. To define how to handle out-of-range values, the preceding request specifies the `clip` parameter. By default, this parameter is `false`, and any vectors containing out-of-range values are rejected. When `clip` is set to `true` (as in the preceding request), out-of-range vector values are rounded up or down so that they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will be indexed as a 16-bit vector `[65504.0, -65504.0]`.
+The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must be in the [-65504.0, 65504.0] range. To define how to handle out-of-range values, the preceding request specifies the `clip` parameter. By default, this parameter is `false`, and any vectors containing out-of-range values are rejected.
+
+When `clip` is set to `true` (as in the preceding request), out-of-range vector values are rounded up or down so that they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will be indexed as a 16-bit vector `[65504.0, -65504.0]`.
 
 We recommend setting `clip` to `true` only if very few elements lie outside of the supported range. Rounding the values may cause a drop in recall.
 {: .note}
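The clipping behavior can be sketched in a few lines. OpenSearch performs this conversion internally when `clip` is `true`; the helper below is hypothetical and only mimics the documented rounding rule.

```python
# Illustrative sketch of the documented clip=true behavior: components outside
# [-65504.0, 65504.0] are rounded to the nearest range boundary. Not a plugin
# API; the helper name is ours.
FP16_MAX = 65504.0  # largest finite 16-bit floating-point value

def clip_to_fp16_range(vector):
    """Round out-of-range components into [-65504.0, 65504.0]."""
    return [min(max(x, -FP16_MAX), FP16_MAX) for x in vector]

# The example from the docs: [65510.82, -65504.1] is indexed as
# [65504.0, -65504.0].
print(clip_to_fp16_range([65510.82, -65504.1]))  # [65504.0, -65504.0]
```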
@@ -105,7 +111,7 @@ PUT /test-index
 ```
 {% include copy-curl.html %}
 
-During ingestion, make sure each dimension of the vector is in the supported range ([-65504.0, 65504.0]):
+During ingestion, make sure each vector dimension is in the supported range ([-65504.0, 65504.0]).
 
 ```json
 PUT test-index/_doc/1
@@ -115,7 +121,7 @@ PUT test-index/_doc/1
 ```
 {% include copy-curl.html %}
 
-During querying, there is no range limitation for the query vector:
+During querying, the query vector has no range limitation:
 
 ```json
 GET test-index/_search
@@ -133,13 +139,13 @@ GET test-index/_search
 ```
 {% include copy-curl.html %}
 
-## Memory estimation
+### Memory estimation
 
 In the best-case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit vectors require.
 
 #### HNSW memory estimation
 
-The memory required for HNSW is estimated to be `1.1 * (2 * dimension + 8 * M)` bytes/vector.
+The memory required for Hierarchical Navigable Small Worlds (HNSW) is estimated to be `1.1 * (2 * dimension + 8 * M)` bytes/vector.
 
 As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows:

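The per-vector estimate above translates directly into a small helper. This is an illustrative sketch, not a plugin API; with the example values (1 million vectors, dimension 256, `M` of 16), it yields roughly 0.66 GB.

```python
# Sketch of the fp16 HNSW memory estimate from the docs:
# 1.1 * (2 * dimension + 8 * M) bytes per vector. Helper name is hypothetical.
def fp16_hnsw_memory_gb(num_vectors: int, dimension: int, m: int) -> float:
    total_bytes = 1.1 * (2 * dimension + 8 * m) * num_vectors
    return total_bytes / 1024 ** 3  # convert bytes to GB

# Example from the docs: 1M vectors, dimension 256, M = 16.
print(round(fp16_hnsw_memory_gb(1_000_000, 256, 16), 2))  # 0.66
```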
@@ -157,3 +163,45 @@ As an example, assume that you have 1 million vectors with a dimension of 256 an
 1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 0.525 GB
 ```
 
+## Faiss product quantization
+
+PQ is a technique used to represent a vector using a configurable number of bits. In general, it can be used to achieve a higher level of compression as compared to byte or scalar quantization. PQ works by separating vectors into _m_ subvectors and encoding each subvector with _code_size_ bits. Thus, the total amount of memory for the vector is `m*code_size` bits, plus overhead. For details about the parameters, see [PQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#pq-parameters). PQ is only supported for the _Faiss_ engine and can be used with either the _HNSW_ or _IVF_ approximate nearest neighbor (ANN) algorithms.
+
+### Using Faiss product quantization
+
+To minimize loss in accuracy, PQ requires a _training_ step that builds a model based on the distribution of the data that will be searched.
+
+The product quantizer is trained by running k-means clustering on a set of training vectors for each subvector space and extracting the centroids to be used for encoding. The training vectors can be either a subset of the vectors to be ingested or vectors that have the same distribution and dimension as the vectors to be ingested.
+
+In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend on which ANN algorithm is used and how much data will be stored in the index. For IVF-based indexes, a recommended number of training vectors is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indexes, a recommended number is `2^code_size*1000`. See the [Faiss documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) for more information about the methodology used to calculate these figures.
+
+For PQ, both _m_ and _code_size_ need to be selected. _m_ determines the number of subvectors into which vectors should be split for separate encoding. Consequently, the _dimension_ needs to be divisible by _m_. _code_size_ determines the number of bits used to encode each subvector. In general, we recommend a setting of `code_size = 8` and then tuning _m_ to get the desired trade-off between memory footprint and recall.
+
+For an example of setting up an index with PQ, see the [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model) tutorial.
+
+### Memory estimation
+
+While PQ is meant to represent individual vectors with `m*code_size` bits, in reality, the indexes consume more space. This is mainly due to the overhead of storing certain code tables and auxiliary data structures.
+
+Some of the memory formulas depend on the number of segments present. This is not typically known beforehand, but a recommended default value is 300.
+{: .note}
+
+#### HNSW memory estimation
+
+The memory required for HNSW with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24 + 8 * hnsw_m) * num_vectors + num_segments * (2^pq_code_size * 4 * d))` bytes.
+
+As an example, assume that you have 1 million vectors with a dimension of 256, `hnsw_m` of 16, `pq_m` of 32, `pq_code_size` of 8, and 100 segments. The memory requirement can be estimated as follows:
+
+```bash
+1.1*((8 / 8 * 32 + 24 + 8 * 16) * 1000000 + 100 * (2^8 * 4 * 256)) ~= 0.215 GB
+```
+
+#### IVF memory estimation
+
+The memory required for IVF with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24) * num_vectors + num_segments * (2^pq_code_size * 4 * d + 4 * ivf_nlist * d))` bytes.
+
+For example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 64, `pq_code_size` of 8, and 100 segments. The memory requirement can be estimated as follows:
+
+```bash
+1.1*((8 / 8 * 64 + 24) * 1000000 + 100 * (2^8 * 4 * 256 + 4 * 512 * 256)) ~= 0.171 GB
+```
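The two PQ formulas above can be sanity-checked with a short script that reproduces both worked examples. The helper names are hypothetical; only the formulas and parameter names (`pq_m`, `pq_code_size`, `hnsw_m`, `ivf_nlist`) come from the documentation.

```python
# Illustrative helpers reproducing the PQ memory-estimation formulas above.
# Not a plugin API; results are converted from bytes to GB (1 GB = 1024^3 B).
GB = 1024 ** 3

def hnsw_pq_memory_gb(num_vectors, d, hnsw_m, pq_m, pq_code_size, num_segments):
    per_vector = (pq_code_size / 8) * pq_m + 24 + 8 * hnsw_m  # bytes/vector
    code_tables = num_segments * (2 ** pq_code_size * 4 * d)  # per-segment tables
    return 1.1 * (per_vector * num_vectors + code_tables) / GB

def ivf_pq_memory_gb(num_vectors, d, ivf_nlist, pq_m, pq_code_size, num_segments):
    per_vector = (pq_code_size / 8) * pq_m + 24                      # bytes/vector
    overhead = num_segments * (2 ** pq_code_size * 4 * d             # code tables
                               + 4 * ivf_nlist * d)                  # IVF centroids
    return 1.1 * (per_vector * num_vectors + overhead) / GB

# The two worked examples from the docs:
print(round(hnsw_pq_memory_gb(1_000_000, 256, 16, 32, 8, 100), 3))   # 0.215
print(round(ivf_pq_memory_gb(1_000_000, 256, 512, 64, 8, 100), 3))   # 0.171
```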
