[FEATURE] Add upper_bound in min-max normalization #1210

Open
martin-gaievski opened this issue Mar 5, 2025 · 0 comments

Currently, the min-max normalization technique finds the minimum and maximum among the actual retrieved scores and then scales all scores over the interval [min score, max score]. The team is adding a lower_bound feature that lets users provide a minimum score value. Some scenarios need a similar feature for the maximum score.

In the following scenario, we use a knn/neural query and know that scores always fall in the interval [0.75, 1.0]. For some query, the resulting scores are [0.77, 0.77, 0.76, 0.75, 0.75]. Min-max normalization maps these to [1.0, 1.0, 0.5, 0.0, 0.0], which suggests that a document with a score of 0.77 is highly relevant (normalized score 1.0), even though 0.77 sits near the bottom of the possible range.
If we use an upper bound score of 1.0, the same scores are normalized to [0.08, 0.08, 0.04, 0.0, 0.0]: the document with a raw score of 0.77 now gets 0.08, indicating low relevance.
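
The arithmetic in the example above can be reproduced with a small sketch. This is only an illustration, not the plugin's implementation: the function name `min_max_normalize`, the `upper_bound` argument, the rounding to two decimals, and the handling of the case where all scores are equal are assumptions made for this example.

```python
def min_max_normalize(scores, upper_bound=None):
    """Scale scores with min-max normalization.

    If upper_bound is given, it replaces the observed maximum
    (hypothetical parameter, used here only for illustration).
    """
    lo = min(scores)
    hi = max(scores) if upper_bound is None else upper_bound
    if hi == lo:
        # Assumption: when the interval is empty, treat every score as 1.0.
        return [1.0 for _ in scores]
    # Rounded to two decimals to match the numbers quoted in the issue.
    return [round((s - lo) / (hi - lo), 2) for s in scores]


scores = [0.77, 0.77, 0.76, 0.75, 0.75]
print(min_max_normalize(scores))                   # observed max (0.77) is used
print(min_max_normalize(scores, upper_bound=1.0))  # fixed max of 1.0 is used
```

With the observed maximum, the top document is scaled as (0.77 - 0.75) / (0.77 - 0.75) = 1.0; with the upper bound it becomes (0.77 - 0.75) / (1.0 - 0.75) = 0.08. This basic sketch does not handle scores above the bound.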

Proposed Solution:
The upper bound score should be configurable at the sub-query level, with the ability to skip it for some sub-queries. Configuration can be similar to the lower_bound score feature:

Modes:

  • ignore: upper bound is ignored
  • clip: score is clipped to the upper bound if it exceeds it
  • apply: the actual score value is used if it exceeds the upper bound
{
  "description": "Normalization processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max",
          "parameters": {
            "upper_bounds": [
              {
                "mode": "apply",
                "max_score": 1.0
              },
              {
                "mode": "clip",
                "max_score": 5.0
              },
              {
                "mode": "ignore"
              }
            ]
          }
        },
        "combination": {
          "technique": "arithmetic_mean"
        }
      }
    }
  ]
}
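
One possible reading of the three modes listed above is sketched below. The issue does not give formulas for them, so the semantics here are assumptions: `ignore` uses the observed maximum, `clip` caps scores at `max_score` before scaling, and `apply` uses `max_score` as the maximum unless an actual score is higher. The function name `normalize_with_upper_bound` is hypothetical.

```python
def normalize_with_upper_bound(scores, max_score, mode="apply"):
    """Min-max normalization with an upper bound (assumed semantics)."""
    if mode == "ignore":
        # Bound is not used; fall back to the observed maximum.
        hi = max(scores)
    elif mode == "clip":
        # Assumption: scores above the bound are capped at the bound.
        scores = [min(s, max_score) for s in scores]
        hi = max_score
    elif mode == "apply":
        # Assumption: the bound is the maximum unless a score exceeds it.
        hi = max(max_score, max(scores))
    else:
        raise ValueError(f"unknown mode: {mode}")
    lo = min(scores)
    if hi == lo:
        # Assumption: when the interval is empty, treat every score as 1.0.
        return [1.0 for _ in scores]
    return [round((s - lo) / (hi - lo), 2) for s in scores]


for mode in ("ignore", "clip", "apply"):
    print(mode, normalize_with_upper_bound([1.2, 0.9, 0.5], 1.0, mode))
```

Under these assumptions, `clip` and `apply` differ only when a score exceeds the bound, and `apply` matches `ignore` in that case. The final feature may define the modes differently.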