
Commit ddbfa42

Address Review Comments
Signed-off-by: Naveen Tatikonda <[email protected]>
1 parent 5c88af3 commit ddbfa42

5 files changed: +119 −198 lines changed


_field-types/supported-field-types/knn-vector.md

Lines changed: 111 additions & 186 deletions
@@ -83,9 +83,9 @@ However, if you intend to use Painless scripting or a k-NN score script, you onl
 }
 ```

-## Lucene byte vector
+## Byte vector

-By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.
+By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `faiss` and `lucene` engines. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.

 Byte vectors are supported only for the `lucene` and `faiss` engines. They are not supported for the `nmslib` engine.
 {: .note}
@@ -94,11 +94,17 @@ In [k-NN benchmarking tests](https://github.com/opensearch-project/k-NN/tree/mai

 When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall.
 {: .important}
-
+
+When using `byte` vectors with the `faiss` engine, we recommend using [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), which helps significantly reduce search latencies and improve indexing throughput.
+{: .important}
+
 Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`.

 To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index:

+### Example: HNSW
+
+Here is an example of creating a byte vector index with the Lucene engine and the HNSW algorithm:
 ```json
 PUT test-index
 {
@@ -166,189 +172,6 @@ GET test-index/_search
 ```
 {% include copy-curl.html %}

-### Quantization techniques
-
-If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting the documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. There are many quantization techniques, such as scalar quantization or product quantization (PQ), which is used in the Faiss engine. The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only.
-
-#### Scalar quantization for the L2 space type
-
-The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on Euclidean datasets with the L2 space type. Euclidean distance is shift invariant. If you shift both $$x$$ and $$y$$ by the same $$z$$, then the distance remains the same ($$\lVert x-y\rVert =\lVert (x-z)-(y-z)\rVert$$).
-
-```python
-# Random dataset (Example to create a random dataset)
-dataset = np.random.uniform(-300, 300, (100, 10))
-# Random query set (Example to create a random queryset)
-queryset = np.random.uniform(-350, 350, (100, 10))
-# Number of values
-B = 256
-
-# INDEXING:
-# Get min and max
-dataset_min = np.min(dataset)
-dataset_max = np.max(dataset)
-# Shift coordinates to be non-negative
-dataset -= dataset_min
-# Normalize into [0, 1]
-dataset *= 1. / (dataset_max - dataset_min)
-# Bucket into 256 values
-dataset = np.floor(dataset * (B - 1)) - int(B / 2)
-
-# QUERYING:
-# Clip (if queryset range is out of datset range)
-queryset = queryset.clip(dataset_min, dataset_max)
-# Shift coordinates to be non-negative
-queryset -= dataset_min
-# Normalize
-queryset *= 1. / (dataset_max - dataset_min)
-# Bucket into 256 values
-queryset = np.floor(queryset * (B - 1)) - int(B / 2)
-```
-{% include copy.html %}
-
-#### Scalar quantization for the cosine similarity space type
-
-The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on angular datasets with the cosine similarity space type. Cosine similarity is not shift invariant ($$cos(x, y) \neq cos(x-z, y-z)$$).
-
-The following pseudocode is for positive numbers:
-
-```python
-# For Positive Numbers
-
-# INDEXING and QUERYING:
-
-# Get Max of train dataset
-max = np.max(dataset)
-min = 0
-B = 127
-
-# Normalize into [0,1]
-val = (val - min) / (max - min)
-val = (val * B)
-
-# Get int and fraction values
-int_part = floor(val)
-frac_part = val - int_part
-
-if 0.5 < frac_part:
-    bval = int_part + 1
-else:
-    bval = int_part
-
-return Byte(bval)
-```
-{% include copy.html %}
-
-The following pseudocode is for negative numbers:
-
-```python
-# For Negative Numbers
-
-# INDEXING and QUERYING:
-
-# Get Min of train dataset
-min = 0
-max = -np.min(dataset)
-B = 128
-
-# Normalize into [0,1]
-val = (val - min) / (max - min)
-val = (val * B)
-
-# Get int and fraction values
-int_part = floor(var)
-frac_part = val - int_part
-
-if 0.5 < frac_part:
-    bval = int_part + 1
-else:
-    bval = int_part
-
-return Byte(bval)
-```
-{% include copy.html %}
-
-## Faiss byte vector
-
-Faiss engine is recommended for use cases that requires ingestion on a large scale. But, for these large scale workloads using the default `float` vectors requires a lot of memory usage as each dimension is 4 bytes. If you want to reduce this memory and storage requirements,
-you can use `byte` vectors with the `faiss` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.
-
-Faiss directly doesn't support byte datatype to store byte vectors. To achieve this functionality we are using a scalar quantizer (SQ8_direct_signed) which accepts float vectors in
-8-bit signed integer range and encodes them as byte sized vectors. These quantized byte sized vectors are stored in a k-NN index which reduces the memory footprint by a factor of 4.
-When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQ8_direct_signed quantization can also significantly reduce search latencies and improve indexing throughput.
-
-When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall.
-{: .important}
-
-To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index.
-
-### Example: HNSW
-
-Here is an example to create a byte vector index with the Faiss engine and HNSW algorithm:
-```json
-PUT test-index
-{
-  "settings": {
-    "index": {
-      "knn": true
-    }
-  },
-  "mappings": {
-    "properties": {
-      "my_vector": {
-        "type": "knn_vector",
-        "dimension": 2,
-        "data_type": "byte",
-        "method": {
-          "name": "hnsw",
-          "space_type": "l2",
-          "engine": "faiss",
-          "parameters": {
-            "ef_construction": 128,
-            "m": 24
-          }
-        }
-      }
-    }
-  }
-}
-
-```
-{% include copy-curl.html %}
-
-Then ingest documents as usual. But, make sure each dimension in the vector is in the supported [-128, 127] range:
-```json
-PUT test-index/_doc/1
-{
-  "my_vector": [-126, 28]
-}
-```
-{% include copy-curl.html %}
-
-```json
-PUT test-index/_doc/2
-{
-  "my_vector": [100, -128]
-}
-```
-{% include copy-curl.html %}
-
-When querying, be sure to use a byte vector:
-```json
-GET test-index/_search
-{
-  "size": 2,
-  "query": {
-    "knn": {
-      "my_vector": {
-        "vector": [26, -120],
-        "k": 2
-      }
-    }
-  }
-}
-```
-{% include copy-curl.html %}
-
 ### Example: IVF

 The IVF method requires a training step that creates and trains the model used to initialize the native library index during segment creation. For more information, see [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model).
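
The IVF training step referenced above can be exercised against a running cluster roughly as shown below. This is a sketch based on the model-building documentation linked in that sentence, not part of this change: the host, model ID, index names, field names, and parameter values are placeholders.

```python
# Sketch of the IVF training flow described above; all names and values are placeholders.
import requests

host = "http://localhost:9200"

# Step 1: Train a model from vectors previously ingested into a training index.
train_body = {
    "training_index": "train-index",    # index holding the training vectors (placeholder)
    "training_field": "train-field",    # knn_vector field containing those vectors (placeholder)
    "dimension": 2,
    "method": {
        "name": "ivf",
        "engine": "faiss",
        "space_type": "l2",
        "parameters": {"nlist": 128, "nprobes": 8}
    }
}
response = requests.post(f"{host}/_plugins/_knn/models/my-ivf-model/_train", json=train_body)
print(response.json())

# Step 2: After the model reaches the "created" state, create the target index and point
# its knn_vector field at the trained model instead of an inline method definition.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "my_vector": {"type": "knn_vector", "model_id": "my-ivf-model"}
        }
    }
}
response = requests.put(f"{host}/my-ivf-index", json=index_body)
print(response.json())
```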
@@ -499,6 +322,108 @@ As an example, assume that you have 1 million vectors with a dimension of 256 an
 1.1 * ((256 * 1,000,000) + (4 * 128 * 256)) ~= 0.27 GB
 ```

+
+### Quantization techniques
+
+If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting the documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. There are many quantization techniques, such as scalar quantization or product quantization (PQ), which is used in the Faiss engine. The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only.
+
+#### Scalar quantization for the L2 space type
+
+The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on Euclidean datasets with the L2 space type. Euclidean distance is shift invariant. If you shift both $$x$$ and $$y$$ by the same $$z$$, then the distance remains the same ($$\lVert x-y\rVert =\lVert (x-z)-(y-z)\rVert$$).
+
+```python
+# Random dataset (Example to create a random dataset)
+dataset = np.random.uniform(-300, 300, (100, 10))
+# Random query set (Example to create a random queryset)
+queryset = np.random.uniform(-350, 350, (100, 10))
+# Number of values
+B = 256
+
+# INDEXING:
+# Get min and max
+dataset_min = np.min(dataset)
+dataset_max = np.max(dataset)
+# Shift coordinates to be non-negative
+dataset -= dataset_min
+# Normalize into [0, 1]
+dataset *= 1. / (dataset_max - dataset_min)
+# Bucket into 256 values
+dataset = np.floor(dataset * (B - 1)) - int(B / 2)
+
+# QUERYING:
+# Clip (if queryset range is out of dataset range)
+queryset = queryset.clip(dataset_min, dataset_max)
+# Shift coordinates to be non-negative
+queryset -= dataset_min
+# Normalize
+queryset *= 1. / (dataset_max - dataset_min)
+# Bucket into 256 values
+queryset = np.floor(queryset * (B - 1)) - int(B / 2)
+```
+{% include copy.html %}
+
+#### Scalar quantization for the cosine similarity space type
+
+The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on angular datasets with the cosine similarity space type. Cosine similarity is not shift invariant ($$cos(x, y) \neq cos(x-z, y-z)$$).
+
+The following pseudocode is for positive numbers:
+
+```python
+# For Positive Numbers
+
+# INDEXING and QUERYING:
+
+# Get Max of train dataset
+max = np.max(dataset)
+min = 0
+B = 127
+
+# Normalize into [0,1]
+val = (val - min) / (max - min)
+val = (val * B)
+
+# Get int and fraction values
+int_part = floor(val)
+frac_part = val - int_part
+
+if 0.5 < frac_part:
+    bval = int_part + 1
+else:
+    bval = int_part
+
+return Byte(bval)
+```
+{% include copy.html %}
+
+The following pseudocode is for negative numbers:
+
+```python
+# For Negative Numbers
+
+# INDEXING and QUERYING:
+
+# Get Min of train dataset
+min = 0
+max = -np.min(dataset)
+B = 128
+
+# Normalize into [0,1]
+val = (val - min) / (max - min)
+val = (val * B)
+
+# Get int and fraction values
+int_part = floor(val)
+frac_part = val - int_part
+
+if 0.5 < frac_part:
+    bval = int_part + 1
+else:
+    bval = int_part
+
+return Byte(bval)
+```
+{% include copy.html %}
+
 ## Binary k-NN vectors

 You can reduce memory costs by a factor of 32 by switching from float to binary vectors.
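
The scalar quantization pseudocode above omits its NumPy import and is not directly runnable. The following is a self-contained version of the L2 routine that can be executed as-is; the array shapes and random seed are illustrative only.

```python
# Self-contained version of the L2 scalar quantization pseudocode above.
# The dataset, query set, and seed are illustrative; only NumPy is required.
import numpy as np

rng = np.random.default_rng(42)
dataset = rng.uniform(-300, 300, (100, 10))
queryset = rng.uniform(-350, 350, (100, 10))
B = 256  # number of buckets

# Indexing: shift, normalize into [0, 1], then bucket into 256 signed byte values.
dataset_min = np.min(dataset)
dataset_max = np.max(dataset)
scale = 1.0 / (dataset_max - dataset_min)
quantized_dataset = np.floor((dataset - dataset_min) * scale * (B - 1)) - B // 2

# Querying: clip to the dataset range, then apply the same transform.
clipped = np.clip(queryset, dataset_min, dataset_max)
quantized_queryset = np.floor((clipped - dataset_min) * scale * (B - 1)) - B // 2

# Every value now fits in the signed byte range expected for byte vectors.
assert quantized_dataset.min() >= -128 and quantized_dataset.max() <= 127
assert quantized_queryset.min() >= -128 and quantized_queryset.max() <= 127
print(quantized_dataset.astype(np.int8)[0])
```

Because every dimension falls within [-128, 127], the resulting vectors can be ingested into a `byte` vector field.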

_ml-commons-plugin/tutorials/semantic-search-byte-vectors.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ nav_order: 10

 # Semantic search using byte-quantized vectors

-This tutorial illustrates how to build a semantic search using the [Cohere Embed model](https://docs.cohere.com/reference/embed) and byte-quantized vectors. For more information about using byte-quantized vectors, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector).
+This tutorial illustrates how to build a semantic search using the [Cohere Embed model](https://docs.cohere.com/reference/embed) and byte-quantized vectors. For more information about using byte-quantized vectors, see [Byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#byte-vector).

 The Cohere Embed v3 model supports several `embedding_types`. For this tutorial, you'll use the `INT8` type to encode byte-quantized vectors.

_search-plugins/knn/knn-index.md

Lines changed: 3 additions & 3 deletions
@@ -41,9 +41,9 @@ PUT /test-index
 ```
 {% include copy-curl.html %}

-## Lucene byte vector
+## Byte vector

-Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine to reduce the amount of storage space needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
+Starting with k-NN plugin version 2.17, you can use `byte` vectors with the `faiss` and `lucene` engines to reduce the amount of memory and storage space needed. For more information, see [Byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vector).

 ## Binary vector

@@ -324,7 +324,7 @@ If you want to use less memory and increase indexing speed as compared to HNSW w

 If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop.

-You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/).
+You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#byte-vector) to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/).

 ### Memory estimation
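
As a rough illustration of the factor-of-2 and factor-of-4 savings mentioned above, the following sketch compares the raw vector storage needed for 1 million 256-dimensional vectors at 4, 2, and 1 bytes per dimension. It ignores graph and index overhead, so actual sizing should follow the memory estimation formulas in the documentation.

```python
# Rough raw-vector storage comparison for the 2x (fp16) and 4x (byte) savings mentioned above.
# Ignores graph/index overhead; use the documented memory estimation formulas for real sizing.
num_vectors = 1_000_000
dimension = 256

bytes_per_dimension = {"float32": 4, "fp16": 2, "byte": 1}

for dtype, width in bytes_per_dimension.items():
    gib = num_vectors * dimension * width / 1024**3
    print(f"{dtype:>7}: {gib:.2f} GiB of raw vector data")
# float32: 0.95 GiB, fp16: 0.48 GiB, byte: 0.24 GiB
```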

_search-plugins/knn/knn-vector-quantization.md

Lines changed: 3 additions & 7 deletions
@@ -13,17 +13,13 @@ By default, the k-NN plugin supports the indexing and querying of vectors of typ

 OpenSearch supports many varieties of quantization. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. The supported types include byte vectors, 16-bit scalar quantization, and product quantization (PQ).

-## Faiss byte vector
+## Byte vector

-Starting with version 2.17, the k-NN plugin supports `byte` vectors with the Faiss engine in order to reduce the amount of required memory. For more information, see [Faiss byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#faiss-byte-vector).
-
-## Lucene byte vector
-
-Starting with k-NN plugin version 2.9, you can use `byte` vectors with the Lucene engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
+Starting with version 2.17, the k-NN plugin supports `byte` vectors with the `faiss` and `lucene` engines in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vector).

 ## Lucene scalar quantization

-Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance between the query vector and the segment's quantized input vectors.
+Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance between the query vector and the segment's quantized input vectors.

 Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors.
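
To make the 7-bit conversion described above concrete, the following sketch applies min/max-based 7-bit quantization to a batch of float vectors. It illustrates the general idea only; the Lucene quantizer computes per-segment quantiles from the `confidence_interval` parameter rather than using the raw minimum and maximum, and it runs inside OpenSearch during ingestion.

```python
# Illustration only: min/max-based 7-bit quantization of float vectors, in the spirit of the
# Lucene scalar quantizer described above. Not the plugin's actual per-segment implementation.
import numpy as np

vectors = np.random.uniform(-1.0, 1.0, (1000, 768)).astype(np.float32)

# Stand-in for the quantiles computed for a segment (here simply the min and max).
lower, upper = vectors.min(), vectors.max()

levels = 2**7 - 1  # 7-bit integers: 0..127
scale = levels / (upper - lower)

quantized = np.clip(np.round((vectors - lower) * scale), 0, levels).astype(np.int8)

# A query vector is quantized with the same segment quantiles before distances are computed.
query = np.random.uniform(-1.0, 1.0, 768).astype(np.float32)
quantized_query = np.clip(np.round((query - lower) * scale), 0, levels).astype(np.int8)

print(quantized.shape, quantized.min(), quantized.max())
```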

_search-plugins/vector-search.md

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ PUT test-index

 You must designate the field that will store vectors as a [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) field type. OpenSearch supports vectors of up to 16,000 dimensions, each of which is represented as a 32-bit or 16-bit float.

-To save storage space, you can use `byte` or `binary` vectors. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector) and [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).
+To save storage space, you can use `byte` or `binary` vectors. For more information, see [Byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vector) and [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).

 ### k-NN vector search
