Add option to normalize vector distances on query (#298)
This PR accomplishes two goals:
1. Add an option that returns a similarity value between 0 and 1, which
users may expect when comparing against other vector databases.
2. Fix the current bug that `distance_threshold` is validated to be
between 0 and 1 when in reality it can take values between 0 and 2.
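
To illustrate why the threshold range is 0 to 2, here is a minimal sketch that assumes cosine distance is computed as `1 - cosine_similarity`. The helper `cosine_distance` is hypothetical and written only for this example; it is not part of the library.

```python
import math

def cosine_distance(a, b):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Cosine similarity ranges over [-1, 1], so the distance ranges over [0, 2].
same = cosine_distance([1.0, 0.0], [1.0, 0.0])         # same direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])   # unrelated
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])    # opposite direction
```

Under that assumption, a threshold capped at 1 could never match vectors pointing in roughly opposite directions, which is the validation bug described above.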
> Note: after much careful thought I believe it is best that for `0.5.0`
we do **not** start enforcing all distance_thresholds between 0 and 1
and move to this option as default behavior. Ideally this metric would
be consistent throughout our code and I don't love supporting this flag
but I think it provides the value that is scoped for this ticket while
inflicting the least amount of pain and confusion.
Changes:
1. Adds the `normalize_vector_distance` flag to `VectorQuery` and
`VectorRangeQuery`.
Behavior:
- If set to `True` it normalizes values returned from redis to a value
between 0 and 1.
- For cosine distance, it applies `(2 - value)/2`.
- For L2 distance, it applies normalization `(1/(1+value))`.
- For IP, it does nothing and emits a warning, since IP over normalized
vectors is cosine by definition.
- For `VectorRangeQuery`, if `normalize_vector_distance=True`, the distance
threshold is validated to be between 0 and 1 and then denormalized before
execution so that it is consistent with the database's distance scale.
2. Relaxes validation for semantic caching and routing to allow values
between 0 and 2, fixing the current bug and aligning with how the database
actually functions.
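
The normalization formulas listed under Behavior can be sketched as follows. This is an illustrative sketch, not the code in this PR: the function names are hypothetical, and the inverse for cosine is derived algebraically from `(2 - value)/2`.

```python
def norm_cosine_distance(value: float) -> float:
    """Map a cosine distance in [0, 2] to a similarity in [0, 1]."""
    return (2 - value) / 2

def denorm_cosine_distance(value: float) -> float:
    """Inverse of norm_cosine_distance: map [0, 1] back to [0, 2]."""
    return 2 - 2 * value

def norm_l2_distance(value: float) -> float:
    """Map an L2 distance in [0, inf) to a value in (0, 1]."""
    return 1 / (1 + value)

# A normalized threshold supplied by the user is converted back to the
# native cosine distance scale before it would be sent to the database.
user_threshold = 0.75
native_threshold = denorm_cosine_distance(user_threshold)
```

With these definitions, a distance of 0 maps to 1 under both the cosine and L2 formulas, so larger normalized values always mean closer vectors.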