update retrieval quality article #1241

thierrypdamiba · 2024-10-17T23:37:07Z

Make changes to the retrieval quality article

netlify · 2024-10-17T23:37:34Z

✅ Deploy Preview for condescending-goldwasser-91acf0 ready!

Name	Link
🔨 Latest commit	`37e8f14`
🔍 Latest deploy log	https://app.netlify.com/sites/condescending-goldwasser-91acf0/deploys/671fc5c27a73130008d0e984
😎 Deploy Preview	https://deploy-preview-1241--condescending-goldwasser-91acf0.netlify.app

To edit notification comments on pull requests, go to your Netlify site configuration.

joein · 2024-10-18T09:17:55Z

qdrant-landing/content/documentation/tutorials/retrieval-quality.md

-[Loading a dataset from Hugging Face hub](/documentation/tutorials/huggingface-datasets/) tutorial, `Qdrant/arxiv-titles-instructorxl-embeddings`
-from the [Hugging Face hub](https://huggingface.co/datasets/Qdrant/arxiv-titles-instructorxl-embeddings). Let's download it in a streaming
-mode, as we are only going to use part of it.
+We’ll use a pre-embedded dataset from Hugging Face to train and test Qdrant’s search capabilities. First, load and split the dataset for training (1,000 items) and testing (100 items). 


The numbers in this sentence differ from the values used in the code.
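One way to keep the prose and the code from drifting apart is to derive both from the same constants. This is a minimal sketch, not the tutorial's code: `TRAIN_SIZE`, `TEST_SIZE`, and `split_stream` are hypothetical names, the sizes are the ones quoted in the proposed sentence, and `range(5_000)` stands in for the streamed dataset.

```python
from itertools import islice

# Sizes taken from the proposed prose; adjust to whatever the tutorial code uses.
TRAIN_SIZE = 1_000
TEST_SIZE = 100


def split_stream(items, train_size=TRAIN_SIZE, test_size=TEST_SIZE):
    """Take the first train_size items for indexing and the next test_size as queries."""
    iterator = iter(items)
    train = list(islice(iterator, train_size))
    test = list(islice(iterator, test_size))
    return train, test


# Stand-in for a streamed dataset; a real run would iterate over dataset records.
train, test = split_stream(range(5_000))

# The sentence in the article can be generated from the same numbers.
summary = f"training ({len(train):,} items) and testing ({len(test):,} items)"
print(summary)
```

Because `islice` consumes the same iterator twice, the two subsets never overlap, and only the first 1,100 records are ever read from the stream.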

joein · 2024-10-18T09:24:50Z

@thierrypdamiba @davidmyriel I actually liked that the previous version said embedding quality is crucial (maybe we paid it a bit more attention than required) and explained why we compare exact search to ANN. Now the tutorial has become a bit faceless.
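For context on why exact search is the reference: exact (full-scan) results are the ground truth, and ANN quality is measured as the share of those results the approximate search recovers. A minimal sketch, assuming results are plain lists of point IDs; `precision_at_k` and the sample IDs are illustrative and not taken from the tutorial.

```python
def precision_at_k(ann_ids, exact_ids, k):
    """Fraction of the top-k ANN results that also appear in the top-k exact results."""
    if k <= 0:
        raise ValueError("k must be positive")
    ann_top = set(ann_ids[:k])
    exact_top = set(exact_ids[:k])
    return len(ann_top & exact_top) / k


# Made-up IDs: the ANN search returned one point (58) that exact search did not.
exact = [11, 42, 7, 3, 95]
ann = [11, 7, 42, 58, 3]

score = precision_at_k(ann, exact, k=5)
print(score)  # 0.8
```

The order of results inside the top k does not affect this metric; only membership does.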

thierrypdamiba · 2024-10-21T16:19:29Z

@joein @davidmyriel I added information about embedding quality and ANN vs. exact search. I also updated the dataset numbers to match the code.

generall · 2024-10-22T10:33:15Z

qdrant-landing/package.json

@@ -21,7 +21,8 @@
    "anchor-js": "^5.0.0",
    "bootstrap": "^5.3.3",
    "clipboard": "^2.0.11",
-    "qdrant-page-search": "^1.0.8"
+    "qdrant-page-search": "^1.0.8",
+    "react-router-dom": "^6.27.0"


why do we need this?

We don't need it. Removing now.


joein · 2024-10-23T18:43:00Z

qdrant-landing/content/documentation/tutorials/retrieval-quality.md

+- **m**: This parameter determines the maximum number of connections per node in the HNSW graph. A higher value for `m` increases the connectivity of the graph, potentially improving search accuracy at the cost of increased memory usage and indexing time. The default value for `m` is 16.
+- **ef_construct**: This parameter controls the size of the dynamic candidate list during index construction. A higher value of `ef_construct` leads to a more exhaustive search during the indexing phase, resulting in a higher quality graph and improved search accuracy. However, this comes at the cost of longer indexing times. The default value for `ef_construct` is 100.
+
+We will use the untuned HNSW as the baseline to compare how changes affect the precision of the search. Initially, we will use the default values of `m` (16) and `ef_construct` (100) for the HNSW algorithm. Later, we will double these values to observe their impact on retrieval quality.


We have already stated the default values, so we can shorten this sentence, for example:
"We'll use the default `m` and `ef_construct` as a baseline and then tweak the parameters to see how they affect search precision."
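The baseline-then-double plan from the diff can be sketched as data. This assumes the defaults quoted in the diff (`m` = 16, `ef_construct` = 100); `BASELINE` and `scaled` are hypothetical helper names, and the body is only shaped like a collection update that carries an `hnsw_config` object. Nothing is sent anywhere.

```python
import json

# Defaults quoted in the diff under review.
BASELINE = {"m": 16, "ef_construct": 100}


def scaled(config, factor):
    """Return a copy of the index-time parameters multiplied by factor."""
    return {name: value * factor for name, value in config.items()}


tuned = scaled(BASELINE, 2)

# Body shaped like a collection update that changes only the HNSW settings.
update_body = json.dumps({"hnsw_config": tuned}, sort_keys=True)
print(update_body)  # {"hnsw_config": {"ef_construct": 200, "m": 32}}
```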

joein · 2024-10-23T18:49:58Z

qdrant-landing/content/documentation/tutorials/retrieval-quality.md

+- If you require higher precision, increase `m` and `ef_construct` while considering the increased memory usage and indexing time.
+- If memory and indexing time are critical constraints, tune the parameters incrementally to find the right balance.


By the way, there is also a third parameter: `ef` (also known as efSearch). It controls the number of neighbors evaluated during the search. A higher value may increase precision, but it also increases latency.
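The effect of `ef` can be checked by sweeping it at query time and comparing each result with the exact one. A rough sketch under stated assumptions: `sweep_ef` is a hypothetical helper, and `toy_search` is a made-up stand-in that only mimics "larger `ef`, more true neighbors found". The numbers it produces are not measurements.

```python
def precision_at_k(ann_ids, exact_ids, k):
    """Share of the top-k approximate results that are also in the top-k exact results."""
    return len(set(ann_ids[:k]) & set(exact_ids[:k])) / k


def sweep_ef(search_fn, exact_ids, ef_values, k):
    """Run search_fn once per ef value and record precision@k against the exact result."""
    return {ef: precision_at_k(search_fn(ef, k), exact_ids, k) for ef in ef_values}


EXACT = list(range(10))  # pretend these are the true 10 nearest neighbors


def toy_search(ef, k):
    # Toy stand-in, not a real index: larger ef values recover more true neighbors.
    hits = min(k, ef // 20)
    misses = [1000 + i for i in range(k - hits)]
    return EXACT[:hits] + misses


results = sweep_ef(toy_search, EXACT, ef_values=[40, 100, 200], k=10)
print(results)
```

In a real run, `search_fn` would issue the ANN query with the given `ef` and return point IDs, and latency would be recorded alongside precision to show the trade-off.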


joein · 2024-10-28T10:37:10Z

qdrant-landing/content/documentation/tutorials/retrieval-quality.md

 ```

-Response:
+This step measures the initial retrieval quality before any tuning of the HNSW parameters. The HNSW (Hierarchical Navigable Small World) algorithm has two key parameters that influence search performance and quality:


We could provide a bit more detail here.
Users can tune two types of parameters: index-time parameters and search-time parameters.
Index time: `m` and `ef_construct`. Search time: `ef`.

I think we should mention this here rather than just adding a brief sentence at the end of the article.
However, I don't think the code adjustments are necessary.
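The index-time versus search-time distinction can be made concrete with a small helper that routes each setting to the place it applies. This is a sketch, not tutorial code: `split_params` is a hypothetical name, and `hnsw_ef` and `exact` are assumed here to be the per-request names for `ef` and exact search.

```python
# Index-time parameters shape the graph and only take effect when the index is built.
INDEX_TIME = {"m", "ef_construct"}
# Search-time parameters are passed with each request and need no re-indexing.
SEARCH_TIME = {"hnsw_ef", "exact"}


def split_params(flat):
    """Split a flat dict of settings into (index-time config, search-time params)."""
    unknown = set(flat) - INDEX_TIME - SEARCH_TIME
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    index_cfg = {name: value for name, value in flat.items() if name in INDEX_TIME}
    search_cfg = {name: value for name, value in flat.items() if name in SEARCH_TIME}
    return index_cfg, search_cfg


index_cfg, search_cfg = split_params({"m": 32, "ef_construct": 200, "hnsw_ef": 128})
print(index_cfg)   # {'m': 32, 'ef_construct': 200}
print(search_cfg)  # {'hnsw_ef': 128}
```

The practical difference is cost: changing anything in the first group means rebuilding the graph, while the second group can be varied per query.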


update retreival quality article

c8b0b6b

davidmyriel requested a review from joein October 18, 2024 03:27

joein requested changes Oct 18, 2024


added info about embedding quality and ann vs exact search

5e17c66

added information for context of the process while keeping it in steps

b399efa

generall reviewed Oct 22, 2024


thierrypdamiba and others added 7 commits October 22, 2024 10:40

Update package.json to remove unnecessary react-router-dom package

c80d3dd

Update package.json to fix formatting

be2d7b9

Update retrieval-quality.md content

a853d81

Update text and format to better reflect the benefit of ANN vs KNN/exact search and why a user would want to measure retrieval quality" TODO: Add screenshots of how you can do this in the webui

Update retrieval-quality.md to fix formatting

62314ad

add webui images and explanations

631ac55

fix image links

aa0745c

link to article data for images

e37aa3c

joein reviewed Oct 23, 2024


remove double reference to default ef_construct and m and add ef sear…

bce25c2

…ch content

joein reviewed Oct 28, 2024


add details about HNSW parameters to search quality tutorial

212116f

thierrypdamiba requested a review from joein October 28, 2024 14:07

Minor formatting updates to parameter descriptions in retrieval-quali…

37e8f14

…ty.md
