Currently, the min-max normalization technique uses the actual retrieved scores to find the minimum and maximum scores and then scales all scores relative to the interval [min score, max score]. The team is adding a lower_bound feature that lets users provide a minimum score value. Some scenarios require a similar feature for the maximum score.
In the following scenario, we use a knn/neural query and know that scores always fall within the interval [0.75, 1.0]. For one query, the resulting scores are [0.77, 0.77, 0.76, 0.75, 0.75]. Min-max normalizes these to [1.0, 1.0, 0.5, 0.0, 0.0], implying that the document with score 0.77 is highly relevant (normalized score 1.0). That is misleading, because 0.77 sits near the bottom of the possible range.
If we use an upper bound score of 1.0, the scores in this example are normalized to [0.08, 0.08, 0.04, 0.0, 0.0] (for example, (0.77 - 0.75) / (1.0 - 0.75) = 0.08). The same document now gets a score of 0.08, indicating low relevance.
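The arithmetic above can be reproduced with a few lines of code. This is a minimal sketch, not the plugin's implementation: `min_max_normalize` is a hypothetical helper, the minimum is always the observed minimum, and returning 1.0 when all scores are equal is an assumption made only to avoid division by zero. With a fixed upper bound, a score above that bound would normalize to more than 1.0; the modes proposed below address that case.

```python
from typing import List, Optional


def min_max_normalize(scores: List[float], upper_bound: Optional[float] = None) -> List[float]:
    """Min-max normalize scores (hypothetical helper).

    If upper_bound is given, it replaces the observed maximum as the top of
    the scaling interval; the observed minimum is always used as the bottom.
    """
    if not scores:
        return []
    low = min(scores)
    high = max(scores) if upper_bound is None else upper_bound
    span = high - low
    if span <= 0:
        # Degenerate interval (e.g. all scores equal): assumed to map to 1.0.
        return [1.0 for _ in scores]
    return [(s - low) / span for s in scores]


scores = [0.77, 0.77, 0.76, 0.75, 0.75]
print([round(x, 2) for x in min_max_normalize(scores)])                   # [1.0, 1.0, 0.5, 0.0, 0.0]
print([round(x, 2) for x in min_max_normalize(scores, upper_bound=1.0)])  # [0.08, 0.08, 0.04, 0.0, 0.0]
```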
Proposed Solution:
The upper bound score should be configurable at the sub-query level, with the ability to skip it for some sub-queries. Configuration can be similar to the lower_bound score feature:
Modes:
ignore: upper bound is ignored
clip: score is clipped to the upper bound if it exceeds it
apply: use the upper bound score as the maximum, but use the actual score value if it exceeds the upper bound score
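One way the three modes and the per-sub-query setting could fit together is sketched below. This is an illustration under stated assumptions, not an existing OpenSearch API: `UpperBoundMode`, `normalize_with_upper_bound`, and `normalize_sub_queries` are hypothetical names; "apply" is read as "the normalization maximum is the larger of the upper bound and the highest observed score"; a sub-query with no upper-bound setting (`None`) falls back to plain min-max, which is how skipping is modeled; and the minimum is the observed minimum (lower-bound handling is left out).

```python
from enum import Enum
from typing import List, Optional, Tuple


class UpperBoundMode(Enum):  # hypothetical name
    IGNORE = "ignore"  # upper bound is ignored; use the observed max score
    CLIP = "clip"      # scores above the upper bound are clipped to it
    APPLY = "apply"    # use the upper bound, unless an actual score exceeds it


def normalize_with_upper_bound(scores: List[float], mode: UpperBoundMode,
                               upper_bound: float) -> List[float]:
    """Min-max normalize one sub-query's scores using the given upper-bound mode."""
    if not scores:
        return []
    if mode is UpperBoundMode.CLIP:
        # Cap every score at the upper bound before scaling.
        scores = [min(s, upper_bound) for s in scores]
    low = min(scores)
    if mode is UpperBoundMode.IGNORE:
        high = max(scores)
    elif mode is UpperBoundMode.CLIP:
        high = upper_bound
    else:
        # APPLY (assumed reading): a score above the bound becomes the new max.
        high = max(upper_bound, max(scores))
    span = high - low
    if span <= 0:
        # Degenerate interval: assumed to map every score to 1.0.
        return [1.0 for _ in scores]
    return [(s - low) / span for s in scores]


def normalize_sub_queries(
    results: List[List[float]],
    configs: List[Optional[Tuple[UpperBoundMode, float]]],
) -> List[List[float]]:
    """Normalize each sub-query separately; None skips the upper bound for that sub-query."""
    normalized = []
    for scores, config in zip(results, configs):
        if config is None:
            normalized.append(normalize_with_upper_bound(scores, UpperBoundMode.IGNORE, 0.0))
        else:
            mode, bound = config
            normalized.append(normalize_with_upper_bound(scores, mode, bound))
    return normalized


# Sub-query 1: knn/neural scores with a known upper bound of 1.0.
# Sub-query 2: upper bound skipped, so plain min-max is used.
results = normalize_sub_queries(
    [[0.77, 0.77, 0.76, 0.75, 0.75], [12.0, 8.0, 4.0]],
    [(UpperBoundMode.APPLY, 1.0), None],
)
print([[round(x, 2) for x in r] for r in results])  # [[0.08, 0.08, 0.04, 0.0, 0.0], [1.0, 0.5, 0.0]]
```

Keeping the bound per sub-query means a lexical sub-query with unbounded scores can keep plain min-max while a vector sub-query with a known score range uses a fixed maximum.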