Describe the bug
I'm trying to use quantised embeddings within a RAG pipeline. The base model I am using is "sentence-transformers/all-mpnet-base-v2", and the precision I intend to use is precision="int8".
I can successfully use SentenceTransformersDocumentEmbedder to compute document embeddings and store them in an Elasticsearch document store. However, when it comes to using ElasticsearchEmbeddingRetriever with the query embedder, I get a division-by-zero error.
After digging into it in SentenceTransformers, I've found that when using quantized embedding models, a calibration dataset is usually passed in to compute the min/max value range to map floating-point embeddings into an 8-bit integer space.
I then considered using a subset of document embeddings in the document store as a 'calibration' set, but as far as I can tell there is no way of pulling all (or some) documents/embeddings from the store?
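On pulling embeddings back out: the Haystack 2.x document-store protocol exposes filter_documents(), which with no filters returns all stored documents (embeddings included), so a calibration array could be assembled from them. A hedged sketch of that pattern, using a stand-in in-memory store since the actual Elasticsearch setup isn't shown here:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Document:
    """Stand-in for haystack.Document (content plus embedding)."""
    content: str
    embedding: list

class InMemoryStore:
    """Stand-in for ElasticsearchDocumentStore; only filter_documents()
    is sketched, returning every stored document when no filters are given."""
    def __init__(self, docs):
        self._docs = docs

    def filter_documents(self, filters=None):
        return list(self._docs)

# Populate the stand-in store with a few synthetic embedded documents.
store = InMemoryStore([
    Document(f"doc {i}", np.random.default_rng(i).normal(size=4).tolist())
    for i in range(10)
])

# Pull every stored embedding to use as a calibration set.
docs = store.filter_documents()
calibration = np.array([d.embedding for d in docs if d.embedding is not None])
```

With the real ElasticsearchDocumentStore the same filter_documents() call should work, though whether embeddings are returned may depend on the store's configuration.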
Is there a solution to this that I'm missing? I'd appreciate your help!
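For context, the warnings in the traceback can be reproduced in plain NumPy: int8 quantization maps each embedding dimension from a calibrated [min, max] range, so calibrating on a single query alone makes min equal max, the per-dimension step zero, and the result NaN, which then casts to the all-zero int8 vector that Elasticsearch rejects. A minimal sketch mirroring the quantization.py line from the traceback (the calibration array here is synthetic):

```python
import numpy as np

def quantize_int8(embeddings: np.ndarray, calibration: np.ndarray) -> np.ndarray:
    """Map float embeddings into int8 using per-dimension ranges
    computed from a calibration set."""
    starts = calibration.min(axis=0)
    steps = (calibration.max(axis=0) - starts) / 255
    # Same expression as sentence_transformers/quantization.py:434 in the traceback.
    return ((embeddings - starts) / steps - 128).astype(np.int8)

rng = np.random.default_rng(0)
query = rng.normal(size=(1, 768))

# Calibrating on the single query itself would give steps == 0 everywhere,
# producing the "invalid value encountered in divide" warning and a
# zero-magnitude vector. Calibrating on a larger set (e.g. a sample of the
# stored document embeddings) avoids it:
calibration = rng.normal(size=(100, 768))
q_int8 = quantize_int8(query, calibration)
```

This is only an illustration of the failure mode, not the library's full implementation; sentence-transformers also accepts precomputed ranges or a calibration_embeddings argument for exactly this reason.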
Error message
/usr/local/lib/python3.10/site-packages/sentence_transformers/quantization.py:434: RuntimeWarning: invalid value encountered in divide
return ((embeddings - starts) / steps - 128).astype(np.int8)
/usr/local/lib/python3.10/site-packages/sentence_transformers/quantization.py:434: RuntimeWarning: invalid value encountered in cast
return ((embeddings - starts) / steps - 128).astype(np.int8)
BadRequestError(400, 'search_phase_execution_exception', 'failed to create query: The [cosine] similarity does not support vectors with zero magnitude. Preview of invalid vector: [0.0, 0.0, 0.0, 0.0, 0.0, ...]
Expected behavior
Single-query embeddings to be calculated correctly when using quantization, or a way of creating/passing calibration data
Additional context
To Reproduce
FAQ Check
System: