@@ -83,9 +83,9 @@ However, if you intend to use Painless scripting or a k-NN score script, you onl
}
```

-## Lucene byte vector
+## Byte vector

-By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.
+By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `faiss` and `lucene` engines. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.

Byte vectors are supported only for the `lucene` and `faiss` engines. They are not supported for the `nmslib` engine.
{: .note}
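Each dimension of a `byte` vector must already be an integer in the [-128, 127] range at ingestion time, so `float` embeddings have to be converted on the client side first. The following is a minimal sketch of that preparation step (assuming NumPy and an embedding whose values already fall near the byte range; real workloads would apply one of the quantization techniques described later):

```python
import numpy as np

def to_byte_vector(embedding):
    """Round a float embedding and clip it into the signed 8-bit range,
    returning a plain list suitable for a `knn_vector` field with `data_type: byte`."""
    return np.clip(np.rint(embedding), -128, 127).astype(np.int8).tolist()

print(to_byte_vector(np.array([-126.4, 27.8, 100.2])))  # [-126, 28, 100]
```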
@@ -94,11 +94,17 @@ In [k-NN benchmarking tests](https://github.com/opensearch-project/k-NN/tree/mai
When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall.
{: .important}
-
+
+When using `byte` vectors with the `faiss` engine, we recommend using them with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), which significantly reduces search latencies and improves indexing throughput.
+{: .important}
+
Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`.

To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index:

+### Example: HNSW
+
+The following example creates a byte vector index with the Lucene engine and the HNSW algorithm:
```json
PUT test-index
{
@@ -166,189 +172,6 @@ GET test-index/_search
```
{% include copy-curl.html %}

-### Quantization techniques
-
-If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting the documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. There are many quantization techniques, such as scalar quantization or product quantization (PQ), which is used in the Faiss engine. The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only.
-
-#### Scalar quantization for the L2 space type
-
-The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on Euclidean datasets with the L2 space type. Euclidean distance is shift invariant. If you shift both $$x$$ and $$y$$ by the same $$z$$, then the distance remains the same ($$\lVert x-y\rVert =\lVert (x-z)-(y-z)\rVert$$).
-
-```python
-# Random dataset (Example to create a random dataset)
-dataset = np.random.uniform(-300, 300, (100, 10))
-# Random query set (Example to create a random queryset)
-#### Scalar quantization for the cosine similarity space type
-
-The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on angular datasets with the cosine similarity space type. Cosine similarity is not shift invariant ($$cos(x, y) \neq cos(x-z, y-z)$$).
-
-The following pseudocode is for positive numbers:
-
-```python
-# For Positive Numbers
-
-# INDEXING and QUERYING:
-
-# Get Max of train dataset
-max = np.max(dataset)
-min = 0
-B = 127
-
-# Normalize into [0,1]
-val = (val - min) / (max - min)
-val = (val * B)
-
-# Get int and fraction values
-int_part = floor(val)
-frac_part = val - int_part
-
-if 0.5 < frac_part:
-    bval = int_part + 1
-else:
-    bval = int_part
-
-return Byte(bval)
-```
-{% include copy.html %}
-
-The following pseudocode is for negative numbers:
-
-```python
-# For Negative Numbers
-
-# INDEXING and QUERYING:
-
-# Get Min of train dataset
-min = 0
-max = -np.min(dataset)
-B = 128
-
-# Normalize into [0,1]
-val = (val - min) / (max - min)
-val = (val * B)
-
-# Get int and fraction values
-int_part = floor(val)
-frac_part = val - int_part
-
-if 0.5 < frac_part:
-    bval = int_part + 1
-else:
-    bval = int_part
-
-return Byte(bval)
-```
-{% include copy.html %}
-
-## Faiss byte vector
-
-The Faiss engine is recommended for use cases that require large-scale ingestion. For these workloads, the default `float` vectors require a significant amount of memory because each dimension is 4 bytes. To reduce the memory and storage requirements, you can use `byte` vectors with the `faiss` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.
-
-Faiss does not directly support a byte data type for storing byte vectors. To achieve this functionality, the plugin uses a scalar quantizer (`SQ8_direct_signed`) that accepts float vectors in the 8-bit signed integer range and encodes them as byte-sized vectors. These quantized byte-sized vectors are stored in a k-NN index, which reduces the memory footprint by a factor of 4. When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), `SQ8_direct_signed` quantization can also significantly reduce search latencies and improve indexing throughput.
-
-When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall.
-{: .important}
-
-To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index.
-
-### Example: HNSW
-
-The following example creates a byte vector index with the Faiss engine and the HNSW algorithm:
-```json
-PUT test-index
-{
-  "settings": {
-    "index": {
-      "knn": true
-    }
-  },
-  "mappings": {
-    "properties": {
-      "my_vector": {
-        "type": "knn_vector",
-        "dimension": 2,
-        "data_type": "byte",
-        "method": {
-          "name": "hnsw",
-          "space_type": "l2",
-          "engine": "faiss",
-          "parameters": {
-            "ef_construction": 128,
-            "m": 24
-          }
-        }
-      }
-    }
-  }
-}
-```
-{% include copy-curl.html %}
-
-Then ingest documents as usual, making sure that each dimension in the vector is in the supported [-128, 127] range:
-```json
-PUT test-index/_doc/1
-{
-  "my_vector": [-126, 28]
-}
-```
-{% include copy-curl.html %}
-
-```json
-PUT test-index/_doc/2
-{
-  "my_vector": [100, -128]
-}
-```
-{% include copy-curl.html %}
-
-When querying, be sure to use a byte vector:
-```json
-GET test-index/_search
-{
-  "size": 2,
-  "query": {
-    "knn": {
-      "my_vector": {
-        "vector": [26, -120],
-        "k": 2
-      }
-    }
-  }
-}
-```
-{% include copy-curl.html %}
-
### Example: IVF

The IVF method requires a training step that creates and trains the model used to initialize the native library index during segment creation. For more information, see [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model).
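As a rough sketch of what that training call can look like from a Python client (the index, field, and model names here are placeholders, and the host and credentials are assumptions for a local test cluster):

```python
import requests

host = "https://localhost:9200"  # assumed local cluster; adjust host, auth, and TLS settings

# Train an IVF model from vectors previously ingested into a training index.
train_body = {
    "training_index": "train-index",   # placeholder index that holds the training vectors
    "training_field": "train_field",   # placeholder knn_vector field in that index
    "dimension": 2,
    "method": {
        "name": "ivf",
        "engine": "faiss",
        "space_type": "l2",
        "parameters": {"nlist": 128, "nprobes": 8}
    }
}

response = requests.post(
    f"{host}/_plugins/_knn/models/my-ivf-model/_train",
    json=train_body,
    auth=("admin", "admin"),
    verify=False,
)
print(response.json())
```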
@@ -499,6 +322,108 @@ As an example, assume that you have 1 million vectors with a dimension of 256 an
1.1 * ((256 * 1,000,000) + (4 * 128 * 256)) ~= 0.27 GB
```
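The arithmetic behind this estimate is easy to verify; a small sketch (assuming the 1.1 overhead factor shown above, 1 byte per dimension for `byte` vectors, 4-byte float centroids, and `nlist` = 128):

```python
num_vectors = 1_000_000
dimension = 256
nlist = 128

# 1 byte per dimension per vector, plus 4-byte float centroids (nlist * dimension),
# multiplied by a 1.1x overhead factor.
estimate_bytes = 1.1 * ((dimension * num_vectors) + (4 * nlist * dimension))
print(f"{estimate_bytes / 1024**3:.2f} GiB")  # ~0.26 GiB, in line with the ~0.27 GB figure above
```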
+
+### Quantization techniques
+
+If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting the documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. There are many quantization techniques, such as scalar quantization or product quantization (PQ), which is used in the Faiss engine. The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only.
+
+#### Scalar quantization for the L2 space type
+
+The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on Euclidean datasets with the L2 space type. Euclidean distance is shift invariant. If you shift both $$x$$ and $$y$$ by the same $$z$$, then the distance remains the same ($$\lVert x-y\rVert =\lVert (x-z)-(y-z)\rVert$$).
+
+```python
+# Random dataset (Example to create a random dataset)
+dataset = np.random.uniform(-300, 300, (100, 10))
+# Random query set (Example to create a random queryset)
+#### Scalar quantization for the cosine similarity space type
+
+The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on angular datasets with the cosine similarity space type. Cosine similarity is not shift invariant ($$cos(x, y) \neq cos(x-z, y-z)$$).
+
+The following pseudocode is for positive numbers:
+
+```python
+# For Positive Numbers
+
+# INDEXING and QUERYING:
+
+# Get Max of train dataset
+max = np.max(dataset)
+min = 0
+B = 127
+
+# Normalize into [0,1]
+val = (val - min) / (max - min)
+val = (val * B)
+
+# Get int and fraction values
+int_part = floor(val)
+frac_part = val - int_part
+
+if 0.5 < frac_part:
+    bval = int_part + 1
+else:
+    bval = int_part
+
+return Byte(bval)
+```
+{% include copy.html %}
+
+The following pseudocode is for negative numbers:
+
+```python
+# For Negative Numbers
+
+# INDEXING and QUERYING:
+
+# Get Min of train dataset
+min = 0
+max = -np.min(dataset)
+B = 128
+
+# Normalize into [0,1]
+val = (val - min) / (max - min)
+val = (val * B)
+
+# Get int and fraction values
+int_part = floor(val)
+frac_part = val - int_part
+
+if 0.5 < frac_part:
+    bval = int_part + 1
+else:
+    bval = int_part
+
+return Byte(bval)
+```
+{% include copy.html %}
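Consolidated into runnable NumPy, the two cases above might look like the following sketch (the function names are illustrative, and each function assumes the dataset is entirely positive or entirely negative, matching the corresponding pseudocode):

```python
import numpy as np

def quantize_positive(dataset):
    """Positive-number case: scale values into [0, 127] and round to the nearest integer."""
    max_val, min_val, B = np.max(dataset), 0.0, 127.0
    val = (dataset - min_val) / (max_val - min_val) * B
    int_part = np.floor(val)
    frac_part = val - int_part
    return np.where(frac_part > 0.5, int_part + 1, int_part).astype(np.int8)

def quantize_negative(dataset):
    """Negative-number case: values in [min, 0] map into [-128, 0]."""
    min_val, max_val, B = 0.0, -np.min(dataset), 128.0
    val = (dataset - min_val) / (max_val - min_val) * B  # negative inputs stay negative
    int_part = np.floor(val)
    frac_part = val - int_part
    return np.where(frac_part > 0.5, int_part + 1, int_part).astype(np.int8)

positive = np.random.uniform(0, 300, (100, 10))
negative = np.random.uniform(-300, 0, (100, 10))
print(quantize_positive(positive)[0], quantize_negative(negative)[0])
```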
+
## Binary k-NN vectors

You can reduce memory costs by a factor of 32 by switching from float to binary vectors.
_ml-commons-plugin/tutorials/semantic-search-byte-vectors.md (1 addition, 1 deletion)
@@ -7,7 +7,7 @@ nav_order: 10
# Semantic search using byte-quantized vectors

-This tutorial illustrates how to build a semantic search using the [Cohere Embed model](https://docs.cohere.com/reference/embed) and byte-quantized vectors. For more information about using byte-quantized vectors, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector).
+This tutorial illustrates how to build a semantic search using the [Cohere Embed model](https://docs.cohere.com/reference/embed) and byte-quantized vectors. For more information about using byte-quantized vectors, see [Byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#byte-vector).

The Cohere Embed v3 model supports several `embedding_types`. For this tutorial, you'll use the `INT8` type to encode byte-quantized vectors.
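As a sketch of what such an embedding request can look like (illustrative only; the endpoint, model name, and field names below are assumptions based on the Cohere Embed reference linked above, and the API key is a placeholder):

```python
import requests

response = requests.post(
    "https://api.cohere.com/v1/embed",
    headers={"Authorization": "Bearer <COHERE_API_KEY>"},  # placeholder API key
    json={
        "model": "embed-english-v3.0",           # assumed Embed v3 model name
        "texts": ["What is byte-quantized semantic search?"],
        "input_type": "search_document",
        "embedding_types": ["int8"],             # request byte-quantized embeddings
    },
)
print(response.json())
```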
_search-plugins/knn/knn-index.md (3 additions, 3 deletions)
@@ -41,9 +41,9 @@ PUT /test-index
```
{% include copy-curl.html %}

-## Lucene byte vector
+## Byte vector

-Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine to reduce the amount of storage space needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
+Starting with k-NN plugin version 2.17, you can use `byte` vectors with the `faiss` and `lucene` engines to reduce the amount of memory and storage space needed. For more information, see [Byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vector).

## Binary vector
@@ -324,7 +324,7 @@ If you want to use less memory and increase indexing speed as compared to HNSW w
If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop.

-You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/).
+You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#byte-vector) to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/).
_search-plugins/knn/knn-vector-quantization.md (3 additions, 7 deletions)
@@ -13,17 +13,13 @@ By default, the k-NN plugin supports the indexing and querying of vectors of typ
OpenSearch supports many varieties of quantization. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. The supported types include byte vectors, 16-bit scalar quantization, and product quantization (PQ).

-## Faiss byte vector
+## Byte vector

-Starting with version 2.17, the k-NN plugin supports `byte` vectors with the Faiss engine in order to reduce the amount of required memory. For more information, see [Faiss byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#faiss-byte-vector).
-
-## Lucene byte vector
-
-Starting with k-NN plugin version 2.9, you can use `byte` vectors with the Lucene engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
+Starting with version 2.17, the k-NN plugin supports `byte` vectors with the `faiss` and `lucene` engines in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vector).

## Lucene scalar quantization

-Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance between the query vector and the segment's quantized input vectors.
+Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance between the query vector and the segment's quantized input vectors.

Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors.
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -57,7 +57,7 @@ PUT test-index
You must designate the field that will store vectors as a [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) field type. OpenSearch supports vectors of up to 16,000 dimensions, each of which is represented as a 32-bit or 16-bit float.

-To save storage space, you can use `byte` or `binary` vectors. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector) and [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).
+To save storage space, you can use `byte` or `binary` vectors. For more information, see [Byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vector) and [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).
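For a sense of the relative raw-vector storage, per-dimension element width is what changes; a quick sketch using an example dimension of 768 (the dimension itself is arbitrary here):

```python
dimension = 768  # example dimension only

bytes_per_vector = {
    "float32": 4 * dimension,    # default float vectors
    "float16": 2 * dimension,    # 16-bit floats
    "byte": 1 * dimension,       # byte vectors
    "binary": dimension // 8,    # binary vectors: 1 bit per dimension
}
for name, size in bytes_per_vector.items():
    print(f"{name}: {size} bytes")  # 3072, 1536, 768, 96 -> binary is 32x smaller than float32
```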