172 changes: 165 additions & 7 deletions _field-types/supported-field-types/knn-vector.md
@@ -85,23 +85,30 @@ However, if you intend to use Painless scripting or a k-NN score script, you onl
}
```

## Lucene byte vector
## Byte vectors

By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.
By default, k-NN vectors are `float` vectors, in which each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `faiss` or `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.

Byte vectors are supported only for the `lucene` engine. They are not supported for the `nmslib` and `faiss` engines.
Byte vectors are supported only for the `lucene` and `faiss` engines. They are not supported for the `nmslib` engine.
{: .note}

In [k-NN benchmarking tests](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool), using `byte` rather than `float` vectors significantly reduced storage and memory usage while also improving indexing throughput and reducing query latency. Additionally, recall was not greatly affected (note that recall can depend on various factors, such as the [quantization technique](#quantization-techniques) and data distribution).

When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall.
{: .important}


When using `byte` vectors with the `faiss` engine, we recommend using [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), which helps to significantly reduce search latencies and improve indexing throughput.
{: .important}

Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`.

To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index:

### Example: HNSW

The following example creates a byte vector index with the `lucene` engine and `hnsw` algorithm:

```json
PUT test-index
{
"settings": {
@@ -132,7 +139,7 @@
```
{% include copy-curl.html %}

Then ingest documents as usual. Make sure each dimension in the vector is in the supported [-128, 127] range:
After creating the index, ingest documents as usual. Make sure each dimension in the vector is in the supported [-128, 127] range:

```json
PUT test-index/_doc/1
@@ -168,6 +175,157 @@ GET test-index/_search
```
{% include copy-curl.html %}

### Example: IVF

The `ivf` method requires a training step that creates and trains the model used to initialize the native library index during segment creation. For more information, see [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model).

First, create an index that will contain the byte vector training data. Make sure that the `dimension` matches the dimension of the model you want to create (the `faiss` engine and `ivf` algorithm are specified later, in the training request):

```json
PUT train-index
{
"mappings": {
"properties": {
"train-field": {
"type": "knn_vector",
"dimension": 4,
"data_type": "byte"
}
}
}
}
```
{% include copy-curl.html %}

Next, ingest the training data containing byte vectors into the training index:

```json
PUT _bulk
{ "index": { "_index": "train-index", "_id": "1" } }
{ "train-field": [127, 100, 0, -120] }
{ "index": { "_index": "train-index", "_id": "2" } }
{ "train-field": [2, -128, -10, 50] }
{ "index": { "_index": "train-index", "_id": "3" } }
{ "train-field": [13, -100, 5, 126] }
{ "index": { "_index": "train-index", "_id": "4" } }
{ "train-field": [5, 100, -6, -125] }
```
{% include copy-curl.html %}

Then, create and train a model named `byte-vector-model` using the data in the `train-field` of the `train-index`. Specify the `byte` data type:

```json
POST _plugins/_knn/models/byte-vector-model/_train
{
"training_index": "train-index",
"training_field": "train-field",
"dimension": 4,
"description": "model with byte data",
"data_type": "byte",
"method": {
"name": "ivf",
"engine": "faiss",
"space_type": "l2",
"parameters": {
"nlist": 1,
"nprobes": 1
}
}
}
```
{% include copy-curl.html %}

To check the model training status, call the Get Model API:

```json
GET _plugins/_knn/models/byte-vector-model?filter_path=state
```
{% include copy-curl.html %}

Once the training is complete, the `state` changes to `created`.
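
Because the request filters on `state`, the response for a fully trained model looks similar to the following:

```json
{
  "state": "created"
}
```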

Next, create an index that will initialize its native library indexes using the trained model:

```json
PUT test-byte-ivf
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"my_vector": {
"type": "knn_vector",
"model_id": "byte-vector-model"
}
}
}
}
```
{% include copy-curl.html %}

Ingest the byte vector data that you want to search into the new index:

```json
PUT _bulk?refresh=true
{"index": {"_index": "test-byte-ivf", "_id": "1"}}
{"my_vector": [7, 10, 15, -120]}
{"index": {"_index": "test-byte-ivf", "_id": "2"}}
{"my_vector": [10, -100, 120, -108]}
{"index": {"_index": "test-byte-ivf", "_id": "3"}}
{"my_vector": [1, -2, 5, -50]}
{"index": {"_index": "test-byte-ivf", "_id": "4"}}
{"my_vector": [9, -7, 45, -78]}
{"index": {"_index": "test-byte-ivf", "_id": "5"}}
{"my_vector": [80, -70, 127, -128]}
```
{% include copy-curl.html %}

Finally, search the data. Be sure to provide a byte vector in the k-NN vector field:

```json
GET test-byte-ivf/_search
{
"size": 2,
"query": {
"knn": {
"my_vector": {
"vector": [100, -120, 50, -45],
"k": 2
}
}
}
}
```
{% include copy-curl.html %}

### Memory estimation

In the best-case scenario, byte vectors require 25% of the memory required by 32-bit float vectors because each dimension is stored in 1 byte instead of 4.

#### HNSW memory estimation

The memory required for Hierarchical Navigable Small Worlds (HNSW) is estimated to be `1.1 * (dimension + 8 * m)` bytes/vector, where `m` is the maximum number of bidirectional links created for each element during the construction of the graph.

As an example, assume that you have 1 million vectors with a dimension of 256 and an `m` of 16. The memory requirement can be estimated as follows:

```r
1.1 * (256 + 8 * 16) * 1,000,000 ~= 0.39 GB
```

#### IVF memory estimation

The memory required for IVF is estimated to be `1.1 * ((dimension * num_vectors) + (4 * nlist * dimension))` bytes, where `nlist` is the number of buckets into which the vectors are partitioned.

As an example, assume that you have 1 million vectors with a dimension of 256 and an `nlist` of 128. The memory requirement can be estimated as follows:

```r
1.1 * ((256 * 1,000,000) + (4 * 128 * 256)) ~= 0.26 GB
```


### Quantization techniques

If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting the documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. There are many quantization techniques, such as scalar quantization or product quantization (PQ), which is used in the Faiss engine. The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only.
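
As an illustration only (this is not the implementation used in the benchmarks), a minimal min-max scalar quantization sketch in Python maps float values linearly onto the signed byte range:

```python
import numpy as np

def quantize_to_byte(vectors: np.ndarray) -> np.ndarray:
    """Linearly map float vectors onto the signed byte range [-128, 127].

    Minimal min-max sketch: real pipelines typically derive the bounds
    from a training sample and clip outliers first.
    """
    v_min, v_max = vectors.min(), vectors.max()
    # Rescale [v_min, v_max] onto [-128, 127], then round to integers.
    scaled = (vectors - v_min) / (v_max - v_min) * 255.0 - 128.0
    return np.rint(scaled).astype(np.int8)

# Example: quantize a random float dataset before ingesting it as data_type: byte.
float_vectors = np.random.uniform(-300, 300, (1000, 256)).astype(np.float32)
byte_vectors = quantize_to_byte(float_vectors)
```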
@@ -269,7 +427,7 @@ return Byte(bval)
```
{% include copy.html %}

## Binary k-NN vectors
## Binary vectors

You can reduce memory costs by a factor of 32 by switching from float to binary vectors.
Using binary vector indexes can lower operational costs while maintaining high recall performance, making large-scale deployment more economical and efficient.
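
As a rough back-of-the-envelope comparison, consider 1 million 256-dimensional vectors:

```r
# 32-bit float vectors: 4 bytes per dimension
4 * 256 * 1,000,000 = 1,024,000,000 bytes ~= 0.95 GB

# Binary vectors: 1 bit per dimension
(256 / 8) * 1,000,000 = 32,000,000 bytes ~= 0.03 GB
```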
@@ -7,7 +7,7 @@ nav_order: 10

# Semantic search using byte-quantized vectors

This tutorial illustrates how to build a semantic search using the [Cohere Embed model](https://docs.cohere.com/reference/embed) and byte-quantized vectors. For more information about using byte-quantized vectors, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector).
This tutorial shows you how to build a semantic search using the [Cohere Embed model](https://docs.cohere.com/reference/embed) and byte-quantized vectors. For more information about using byte-quantized vectors, see [Byte vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#byte-vectors).

The Cohere Embed v3 model supports several `embedding_types`. For this tutorial, you'll use the `INT8` type to encode byte-quantized vectors.

4 changes: 2 additions & 2 deletions _search-plugins/knn/approximate-knn.md
@@ -322,7 +322,7 @@ To learn more about the radial search feature, see [k-NN radial search]({{site.u

### Using approximate k-NN with binary vectors

To learn more about using binary vectors with k-NN search, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).
To learn more about using binary vectors with k-NN search, see [Binary vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-vectors).

## Spaces

@@ -346,5 +346,5 @@ The cosine similarity formula does not include the `1 -` prefix. However, becaus
With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...]`) as input. This is because the magnitude of such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests containing the zero vector will be rejected, and a corresponding exception will be thrown.
{: .note }
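
The restriction follows directly from the cosine similarity formula, in which the vector magnitudes appear in the denominator:

$$ \cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} $$

If either magnitude is 0, the expression is undefined.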

The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).
The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-vectors).
{: .note}
12 changes: 6 additions & 6 deletions _search-plugins/knn/knn-index.md
@@ -41,13 +41,13 @@ PUT /test-index
```
{% include copy-curl.html %}

## Lucene byte vector
## Byte vectors

Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine to reduce the amount of storage space needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine, and starting with version 2.17, with the `faiss` engine, to reduce the amount of required memory and storage space. For more information, see [Byte vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vectors).

## Binary vector
## Binary vectors

Starting with k-NN plugin version 2.16, you can use `binary` vectors with the `faiss` engine to reduce the amount of required storage space. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).
Starting with k-NN plugin version 2.16, you can use `binary` vectors with the `faiss` engine to reduce the amount of required storage space. For more information, see [Binary vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-vectors).

## SIMD optimization for the Faiss engine

@@ -116,7 +116,7 @@ Method name | Requires training | Supported spaces | Description
For hnsw, "innerproduct" is not available when PQ is used.
{: .note}

The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).
The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-vectors).
{: .note}

#### HNSW parameters
@@ -324,7 +324,7 @@ If you want to use less memory and increase indexing speed as compared to HNSW w

If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop.

You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/).
You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#byte-vectors) to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/).
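
For illustration, a mapping that enables the `fp_16` encoder on a Faiss HNSW field might look like the following sketch (the index and field names are placeholders; see the linked quantization documentation for the authoritative parameters):

```json
PUT /test-fp16-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 4,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "encoder": {
              "name": "sq",
              "parameters": {
                "type": "fp16"
              }
            }
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}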

### Memory estimation

2 changes: 1 addition & 1 deletion _search-plugins/knn/knn-score-script.md
@@ -302,5 +302,5 @@ Cosine similarity returns a number between -1 and 1, and because OpenSearch rele
With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ... ]`) as input. This is because the magnitude of such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests containing the zero vector will be rejected, and a corresponding exception will be thrown.
{: .note }

The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).
The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-vectors).
{: .note}