Problem

Today we provide index configurations comprising index settings and mappings. A wide range of algorithms is available, each offering options such as compression levels, hyperparameters, space types, and encoders. These options let us optimize for better performance, lower cost, and so on. However, they are not applicable to every use case, and in many instances the right choice depends on what the vector space actually looks like (dimensionality, clustering, dense regions, etc.).

As of now we don't have much insight into what the actual vector space looks like in the index, so we lack the information needed to make accurate, concrete recommendations about recall issues or optimizations.
Solution
The solution introduces sampling capabilities at both ingestion and query time to provide insights into the vector space. During document indexing, vectors are intercepted at flush time and asynchronously sampled, with statistical computations performed on this subset and stored alongside segment data in the Lucene directory.
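One way the flush-time sampling step could bound memory and CPU cost is classic reservoir sampling, which keeps a fixed-size uniform sample no matter how many vectors a segment contains. The sketch below is illustrative only — the class and method names are assumptions for this proposal, not the plugin's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Illustrative sketch: keeps a fixed-size uniform sample of the vectors
 * intercepted during a segment flush (Algorithm R reservoir sampling).
 */
public final class VectorReservoirSampler {
    private final float[][] reservoir;
    private final Random random;
    private long seen = 0;

    public VectorReservoirSampler(int sampleSize, long seed) {
        this.reservoir = new float[sampleSize][];
        this.random = new Random(seed);
    }

    /** Offer one vector intercepted at flush time. */
    public void offer(float[] vector) {
        if (seen < reservoir.length) {
            // Reservoir not yet full: keep every vector.
            reservoir[(int) seen] = vector;
        } else {
            // Replace a random slot with probability sampleSize / (seen + 1),
            // which keeps every vector seen so far equally likely to survive.
            long j = (long) (random.nextDouble() * (seen + 1));
            if (j < reservoir.length) {
                reservoir[(int) j] = vector;
            }
        }
        seen++;
    }

    /** The current sample (at most sampleSize vectors). */
    public List<float[]> sample() {
        List<float[]> out = new ArrayList<>();
        for (float[] v : reservoir) {
            if (v != null) out.add(v);
        }
        return out;
    }
}
```

Because the reservoir has a fixed size, the statistics computed on it can be serialized alongside the segment's other files without the cost growing with segment size.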
Both ingestion and query flows feed into a global-level API (similar to the Stats API) that aggregates sampling data across segments, providing metrics about vector distributions, dimension-level statistics, and search patterns.
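The dimension-level statistics could be computed incrementally over the sampled vectors and then merged across segments at query time. A minimal sketch under those assumptions (class and method names are hypothetical, not part of the proposal), using Welford's online algorithm plus the standard parallel-variance merge:

```java
/**
 * Illustrative sketch: per-dimension running mean and variance over sampled
 * vectors (Welford's online algorithm), with a merge step so per-segment
 * statistics can be combined into an index-level view.
 */
public final class DimensionStats {
    private final double[] mean;
    private final double[] m2;   // per-dimension sum of squared deviations
    private long count = 0;

    public DimensionStats(int dims) {
        mean = new double[dims];
        m2 = new double[dims];
    }

    /** Fold one sampled vector into the running statistics. */
    public void accept(float[] vector) {
        count++;
        for (int d = 0; d < mean.length; d++) {
            double delta = vector[d] - mean[d];
            mean[d] += delta / count;
            m2[d] += delta * (vector[d] - mean[d]);
        }
    }

    public double mean(int d) { return mean[d]; }

    /** Sample variance for one dimension. */
    public double variance(int d) {
        return count > 1 ? m2[d] / (count - 1) : 0.0;
    }

    /** Merge statistics from another segment (parallel-variance formula). */
    public void merge(DimensionStats other) {
        if (other.count == 0) return;
        long total = count + other.count;
        for (int d = 0; d < mean.length; d++) {
            double delta = other.mean[d] - mean[d];
            mean[d] += delta * other.count / total;
            m2[d] += other.m2[d] + delta * delta * ((double) count * other.count / total);
        }
        count = total;
    }
}
```

The merge step is what lets the global API aggregate per-segment samples without re-reading the vectors themselves.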
A high-level sequence diagram of the design is shown below:
Users will be able to query their profiled statistics like so:
```
curl -X GET "localhost:9200/_plugins/_knn/sampling/my_index1/stats?pretty"
```
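The response format is not pinned down in this proposal; a purely hypothetical shape, with every field name illustrative only, might carry the distribution, dimension-level, and search-pattern metrics mentioned above:

```json
{
  "my_index1": {
    "sampled_vectors": 10000,
    "dimension": 768,
    "dimension_stats": [
      { "dim": 0, "mean": 0.013, "std_dev": 0.42, "min": -0.98, "max": 0.97 }
    ],
    "search_patterns": {
      "sampled_queries": 500
    }
  }
}
```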
Thanks @markwu-sde - this looks cool. I like the general idea. I think it would also be good to show a breakdown by shard and segment in the API.
Ideally, users should also be able to trigger the analysis on demand for debugging.
Related Issues
#2243