
Commit ddbfa42

Address Review Comments
Signed-off-by: Naveen Tatikonda <[email protected]>
1 parent 5c88af3 commit ddbfa42

5 files changed: +119 −198 lines changed


_field-types/supported-field-types/knn-vector.md

Lines changed: 111 additions & 186 deletions
@@ -83,9 +83,9 @@ However, if you intend to use Painless scripting or a k-NN score script, you onl
 }
 ```

-## Lucene byte vector
+## Byte vector

-By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.
+By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `faiss` and `lucene` engines. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.

 Byte vectors are supported only for the `lucene` and `faiss` engines. They are not supported for the `nmslib` engine.
 {: .note}
@@ -94,11 +94,17 @@ In [k-NN benchmarking tests](https://github.com/opensearch-project/k-NN/tree/mai

 When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall.
 {: .important}
-
+
+When using `byte` vectors with the `faiss` engine, we recommend using [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), which helps significantly reduce search latencies and improve indexing throughput.
+{: .important}
+
 Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`.

 To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index:

+### Example: HNSW
+
+Here is an example of creating a byte vector index with the Lucene engine and the HNSW algorithm:
 ```json
 PUT test-index
 {
@@ -166,189 +172,6 @@ GET test-index/_search
 ```
 {% include copy-curl.html %}

-### Quantization techniques
-
-If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting the documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. There are many quantization techniques, such as scalar quantization or product quantization (PQ), which is used in the Faiss engine. The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only.
-
-#### Scalar quantization for the L2 space type
-
-The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on Euclidean datasets with the L2 space type. Euclidean distance is shift invariant. If you shift both $$x$$ and $$y$$ by the same $$z$$, then the distance remains the same ($$\lVert x-y\rVert =\lVert (x-z)-(y-z)\rVert$$).
-
-```python
-# Random dataset (Example to create a random dataset)
-dataset = np.random.uniform(-300, 300, (100, 10))
-# Random query set (Example to create a random queryset)
-queryset = np.random.uniform(-350, 350, (100, 10))
-# Number of values
-B = 256
-
-# INDEXING:
-# Get min and max
-dataset_min = np.min(dataset)
-dataset_max = np.max(dataset)
-# Shift coordinates to be non-negative
-dataset -= dataset_min
-# Normalize into [0, 1]
-dataset *= 1. / (dataset_max - dataset_min)
-# Bucket into 256 values
-dataset = np.floor(dataset * (B - 1)) - int(B / 2)
-
-# QUERYING:
-# Clip (if queryset range is out of datset range)
-queryset = queryset.clip(dataset_min, dataset_max)
-# Shift coordinates to be non-negative
-queryset -= dataset_min
-# Normalize
-queryset *= 1. / (dataset_max - dataset_min)
-# Bucket into 256 values
-queryset = np.floor(queryset * (B - 1)) - int(B / 2)
-```
-{% include copy.html %}
-
-#### Scalar quantization for the cosine similarity space type
-
-The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on angular datasets with the cosine similarity space type. Cosine similarity is not shift invariant ($$cos(x, y) \neq cos(x-z, y-z)$$).
-
-The following pseudocode is for positive numbers:
-
-```python
-# For Positive Numbers
-
-# INDEXING and QUERYING:
-
-# Get Max of train dataset
-max = np.max(dataset)
-min = 0
-B = 127
-
-# Normalize into [0,1]
-val = (val - min) / (max - min)
-val = (val * B)
-
-# Get int and fraction values
-int_part = floor(val)
-frac_part = val - int_part
-
-if 0.5 < frac_part:
-    bval = int_part + 1
-else:
-    bval = int_part
-
-return Byte(bval)
-```
-{% include copy.html %}
-
-The following pseudocode is for negative numbers:
-
-```python
-# For Negative Numbers
-
-# INDEXING and QUERYING:
-
-# Get Min of train dataset
-min = 0
-max = -np.min(dataset)
-B = 128
-
-# Normalize into [0,1]
-val = (val - min) / (max - min)
-val = (val * B)
-
-# Get int and fraction values
-int_part = floor(var)
-frac_part = val - int_part
-
-if 0.5 < frac_part:
-    bval = int_part + 1
-else:
-    bval = int_part
-
-return Byte(bval)
-```
-{% include copy.html %}
-
-## Faiss byte vector
-
-Faiss engine is recommended for use cases that requires ingestion on a large scale. But, for these large scale workloads using the default `float` vectors requires a lot of memory usage as each dimension is 4 bytes. If you want to reduce this memory and storage requirements,
-you can use `byte` vectors with the `faiss` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.
-
-Faiss directly doesn't support byte datatype to store byte vectors. To achieve this functionality we are using a scalar quantizer (SQ8_direct_signed) which accepts float vectors in
-8-bit signed integer range and encodes them as byte sized vectors. These quantized byte sized vectors are stored in a k-NN index which reduces the memory footprint by a factor of 4.
-When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQ8_direct_signed quantization can also significantly reduce search latencies and improve indexing throughput.
-
-When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall.
-{: .important}
-
-To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index.
-
-### Example: HNSW
-
-Here is an example to create a byte vector index with the Faiss engine and HNSW algorithm:
-```json
-PUT test-index
-{
-  "settings": {
-    "index": {
-      "knn": true
-    }
-  },
-  "mappings": {
-    "properties": {
-      "my_vector": {
-        "type": "knn_vector",
-        "dimension": 2,
-        "data_type": "byte",
-        "method": {
-          "name": "hnsw",
-          "space_type": "l2",
-          "engine": "faiss",
-          "parameters": {
-            "ef_construction": 128,
-            "m": 24
-          }
-        }
-      }
-    }
-  }
-}
-
-```
-{% include copy-curl.html %}
-
-Then ingest documents as usual. But, make sure each dimension in the vector is in the supported [-128, 127] range:
-```json
-PUT test-index/_doc/1
-{
-  "my_vector": [-126, 28]
-}
-```
-{% include copy-curl.html %}
-
-```json
-PUT test-index/_doc/2
-{
-  "my_vector": [100, -128]
-}
-```
-{% include copy-curl.html %}
-
-When querying, be sure to use a byte vector:
-```json
-GET test-index/_search
-{
-  "size": 2,
-  "query": {
-    "knn": {
-      "my_vector": {
-        "vector": [26, -120],
-        "k": 2
-      }
-    }
-  }
-}
-```
-{% include copy-curl.html %}
-
 ### Example: IVF

 The IVF method requires a training step that creates and trains the model used to initialize the native library index during segment creation. For more information, see [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model).
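
The IVF training step referenced above can be exercised against a running cluster roughly as shown below. This is a sketch based on the model-building documentation linked in that sentence, not part of this change: the host, model ID, index names, field names, and parameter values are placeholders.

```python
# Sketch of the IVF training flow described above; all names and values are placeholders.
import requests

host = "http://localhost:9200"

# Step 1: Train a model from vectors previously ingested into a training index.
train_body = {
    "training_index": "train-index",    # index holding the training vectors (placeholder)
    "training_field": "train-field",    # knn_vector field containing those vectors (placeholder)
    "dimension": 2,
    "method": {
        "name": "ivf",
        "engine": "faiss",
        "space_type": "l2",
        "parameters": {"nlist": 128, "nprobes": 8}
    }
}
response = requests.post(f"{host}/_plugins/_knn/models/my-ivf-model/_train", json=train_body)
print(response.json())

# Step 2: After the model reaches the "created" state, create the target index and point
# its knn_vector field at the trained model instead of an inline method definition.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "my_vector": {"type": "knn_vector", "model_id": "my-ivf-model"}
        }
    }
}
response = requests.put(f"{host}/my-ivf-index", json=index_body)
print(response.json())
```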
@@ -499,6 +322,108 @@ As an example, assume that you have 1 million vectors with a dimension of 256 an
 1.1 * ((256 * 1,000,000) + (4 * 128 * 256)) ~= 0.27 GB
 ```

+
+### Quantization techniques
+
+If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting the documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. There are many quantization techniques, such as scalar quantization or product quantization (PQ), which is used in the Faiss engine. The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only.
+
+#### Scalar quantization for the L2 space type
+
+The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on Euclidean datasets with the L2 space type. Euclidean distance is shift invariant. If you shift both $$x$$ and $$y$$ by the same $$z$$, then the distance remains the same ($$\lVert x-y\rVert =\lVert (x-z)-(y-z)\rVert$$).
+
+```python
+# Random dataset (Example to create a random dataset)
+dataset = np.random.uniform(-300, 300, (100, 10))
+# Random query set (Example to create a random queryset)
+queryset = np.random.uniform(-350, 350, (100, 10))
+# Number of values
+B = 256
+
+# INDEXING:
+# Get min and max
+dataset_min = np.min(dataset)
+dataset_max = np.max(dataset)
+# Shift coordinates to be non-negative
+dataset -= dataset_min
+# Normalize into [0, 1]
+dataset *= 1. / (dataset_max - dataset_min)
+# Bucket into 256 values
+dataset = np.floor(dataset * (B - 1)) - int(B / 2)
+
+# QUERYING:
+# Clip (if queryset range is out of dataset range)
+queryset = queryset.clip(dataset_min, dataset_max)
+# Shift coordinates to be non-negative
+queryset -= dataset_min
+# Normalize
+queryset *= 1. / (dataset_max - dataset_min)
+# Bucket into 256 values
+queryset = np.floor(queryset * (B - 1)) - int(B / 2)
+```
+{% include copy.html %}
+
+#### Scalar quantization for the cosine similarity space type
+
+The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on angular datasets with the cosine similarity space type. Cosine similarity is not shift invariant ($$cos(x, y) \neq cos(x-z, y-z)$$).
+
+The following pseudocode is for positive numbers:
+
+```python
+# For Positive Numbers
+
+# INDEXING and QUERYING:
+
+# Get Max of train dataset
+max = np.max(dataset)
+min = 0
+B = 127
+
+# Normalize into [0,1]
+val = (val - min) / (max - min)
+val = (val * B)
+
+# Get int and fraction values
+int_part = floor(val)
+frac_part = val - int_part
+
+if 0.5 < frac_part:
+    bval = int_part + 1
+else:
+    bval = int_part
+
+return Byte(bval)
+```
+{% include copy.html %}
+
+The following pseudocode is for negative numbers:
+
+```python
+# For Negative Numbers
+
+# INDEXING and QUERYING:
+
+# Get Min of train dataset
+min = 0
+max = -np.min(dataset)
+B = 128
+
+# Normalize into [0,1]
+val = (val - min) / (max - min)
+val = (val * B)
+
+# Get int and fraction values
+int_part = floor(val)
+frac_part = val - int_part
+
+if 0.5 < frac_part:
+    bval = int_part + 1
+else:
+    bval = int_part
+
+return Byte(bval)
+```
+{% include copy.html %}
+
 ## Binary k-NN vectors

 You can reduce memory costs by a factor of 32 by switching from float to binary vectors.
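
The scalar quantization pseudocode above omits its NumPy import and is not directly runnable. The following is a self-contained version of the L2 routine that can be executed as-is; the array shapes and random seed are illustrative only.

```python
# Self-contained version of the L2 scalar quantization pseudocode above.
# The dataset, query set, and seed are illustrative; only NumPy is required.
import numpy as np

rng = np.random.default_rng(42)
dataset = rng.uniform(-300, 300, (100, 10))
queryset = rng.uniform(-350, 350, (100, 10))
B = 256  # number of buckets

# Indexing: shift, normalize into [0, 1], then bucket into 256 signed byte values.
dataset_min = np.min(dataset)
dataset_max = np.max(dataset)
scale = 1.0 / (dataset_max - dataset_min)
quantized_dataset = np.floor((dataset - dataset_min) * scale * (B - 1)) - B // 2

# Querying: clip to the dataset range, then apply the same transform.
clipped = np.clip(queryset, dataset_min, dataset_max)
quantized_queryset = np.floor((clipped - dataset_min) * scale * (B - 1)) - B // 2

# Every value now fits in the signed byte range expected for byte vectors.
assert quantized_dataset.min() >= -128 and quantized_dataset.max() <= 127
assert quantized_queryset.min() >= -128 and quantized_queryset.max() <= 127
print(quantized_dataset.astype(np.int8)[0])
```

Because every dimension falls within [-128, 127], the resulting vectors can be ingested into a `byte` vector field.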

_ml-commons-plugin/tutorials/semantic-search-byte-vectors.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ nav_order: 10

 # Semantic search using byte-quantized vectors

-This tutorial illustrates how to build a semantic search using the [Cohere Embed model](https://docs.cohere.com/reference/embed) and byte-quantized vectors. For more information about using byte-quantized vectors, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector).
+This tutorial illustrates how to build a semantic search using the [Cohere Embed model](https://docs.cohere.com/reference/embed) and byte-quantized vectors. For more information about using byte-quantized vectors, see [Byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#byte-vector).

 The Cohere Embed v3 model supports several `embedding_types`. For this tutorial, you'll use the `INT8` type to encode byte-quantized vectors.

_search-plugins/knn/knn-index.md

Lines changed: 3 additions & 3 deletions
@@ -41,9 +41,9 @@ PUT /test-index
 ```
 {% include copy-curl.html %}

-## Lucene byte vector
+## Byte vector

-Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine to reduce the amount of storage space needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
+Starting with k-NN plugin version 2.17, you can use `byte` vectors with the `faiss` and `lucene` engines to reduce the amount of memory and storage space needed. For more information, see [Byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vector).

 ## Binary vector

@@ -324,7 +324,7 @@ If you want to use less memory and increase indexing speed as compared to HNSW w

 If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop.

-You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/).
+You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#byte-vector) to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/).

 ### Memory estimation
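
As a rough illustration of the factor-of-2 and factor-of-4 savings mentioned above, the following sketch compares the raw vector storage needed for 1 million 256-dimensional vectors at 4, 2, and 1 bytes per dimension. It ignores graph and index overhead, so actual sizing should follow the memory estimation formulas in the documentation.

```python
# Rough raw-vector storage comparison for the 2x (fp16) and 4x (byte) savings mentioned above.
# Ignores graph/index overhead; use the documented memory estimation formulas for real sizing.
num_vectors = 1_000_000
dimension = 256

bytes_per_dimension = {"float32": 4, "fp16": 2, "byte": 1}

for dtype, width in bytes_per_dimension.items():
    gib = num_vectors * dimension * width / 1024**3
    print(f"{dtype:>7}: {gib:.2f} GiB of raw vector data")
# float32: 0.95 GiB, fp16: 0.48 GiB, byte: 0.24 GiB
```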

_search-plugins/knn/knn-vector-quantization.md

Lines changed: 3 additions & 7 deletions
@@ -13,17 +13,13 @@ By default, the k-NN plugin supports the indexing and querying of vectors of typ

 OpenSearch supports many varieties of quantization. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. The supported types include byte vectors, 16-bit scalar quantization, and product quantization (PQ).

-## Faiss byte vector
+## Byte vector

-Starting with version 2.17, the k-NN plugin supports `byte` vectors with the Faiss engine in order to reduce the amount of required memory. For more information, see [Faiss byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#faiss-byte-vector).
-
-## Lucene byte vector
-
-Starting with k-NN plugin version 2.9, you can use `byte` vectors with the Lucene engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
+Starting with version 2.17, the k-NN plugin supports `byte` vectors with the `faiss` and `lucene` engines in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vector).

 ## Lucene scalar quantization

-Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance between the query vector and the segment's quantized input vectors.
+Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance between the query vector and the segment's quantized input vectors.

 Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors.
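
To make the 7-bit conversion described above concrete, the following sketch applies min/max-based 7-bit quantization to a batch of float vectors. It illustrates the general idea only; the Lucene quantizer computes per-segment quantiles from the `confidence_interval` parameter rather than using the raw minimum and maximum, and it runs inside OpenSearch during ingestion.

```python
# Illustration only: min/max-based 7-bit quantization of float vectors, in the spirit of the
# Lucene scalar quantizer described above. Not the plugin's actual per-segment implementation.
import numpy as np

vectors = np.random.uniform(-1.0, 1.0, (1000, 768)).astype(np.float32)

# Stand-in for the quantiles computed for a segment (here simply the min and max).
lower, upper = vectors.min(), vectors.max()

levels = 2**7 - 1  # 7-bit integers: 0..127
scale = levels / (upper - lower)

quantized = np.clip(np.round((vectors - lower) * scale), 0, levels).astype(np.int8)

# A query vector is quantized with the same segment quantiles before distances are computed.
query = np.random.uniform(-1.0, 1.0, 768).astype(np.float32)
quantized_query = np.clip(np.round((query - lower) * scale), 0, levels).astype(np.int8)

print(quantized.shape, quantized.min(), quantized.max())
```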

_search-plugins/vector-search.md

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ PUT test-index

 You must designate the field that will store vectors as a [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) field type. OpenSearch supports vectors of up to 16,000 dimensions, each of which is represented as a 32-bit or 16-bit float.

-To save storage space, you can use `byte` or `binary` vectors. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector) and [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).
+To save storage space, you can use `byte` or `binary` vectors. For more information, see [Byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vector) and [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).

 ### k-NN vector search
