[FEATURE] Add upper_bound in min-max normalization #1210

martin-gaievski · 2025-03-05T17:43:00Z

Currently, the min-max normalization technique finds the minimum and maximum among the actual retrieved scores and then scales all scores using the interval [min score, ..., max score]. The team is adding a lower_bound feature that allows users to provide a minimum score value. For some scenarios, a similar feature is required for the maximum score.

In the following scenario we are using a knn/neural query, and we know that scores always fall in the interval [0.75...1.0]. Suppose a query returns the scores [0.77, 0.77, 0.76, 0.75, 0.75]. With min-max these are normalized to [1.0, 1.0, 0.5, 0.0, 0.0], indicating that the document with score 0.77 has a normalized score of 1.0 and is very relevant, which is not entirely correct.
If we use an upper bound score of 1.0, then the scores in this example will be normalized to [0.08, 0.08, 0.04, 0.0, 0.0], since (0.77 - 0.75) / (1.0 - 0.75) = 0.08. The same document now has a score of 0.08, indicating low relevancy.
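
The arithmetic in this example can be checked with a short Python sketch. It is only an illustration, not the plugin's code: the function name `min_max_normalize` and its `upper_bound` parameter are hypothetical, and returning 1.0 when the interval has zero width is an arbitrary choice made for the sketch.

```python
def min_max_normalize(scores, upper_bound=None):
    """Scale scores into [0, 1] using the interval [min score, max score].

    If upper_bound is given (hypothetical parameter), it replaces the
    highest retrieved score as the top of the interval.
    """
    lo = min(scores)
    hi = max(scores) if upper_bound is None else upper_bound
    if hi == lo:
        # Zero-width interval; 1.0 is an arbitrary choice for this sketch.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


scores = [0.77, 0.77, 0.76, 0.75, 0.75]

# Plain min-max: the best retrieved score always becomes 1.0.
print([round(s, 2) for s in min_max_normalize(scores)])
# prints [1.0, 1.0, 0.5, 0.0, 0.0]

# With an upper bound of 1.0 the same scores stay close to 0.
print([round(s, 2) for s in min_max_normalize(scores, upper_bound=1.0)])
# prints [0.08, 0.08, 0.04, 0.0, 0.0]
```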

Proposed Solution:
The upper bound score should be configurable at the sub-query level, with the ability to skip it for some sub-queries. Configuration can be similar to the lower_bound score feature:

Modes:

ignore: the upper bound is ignored and the actual maximum retrieved score is used
clip: a score that exceeds the upper bound is clipped to the upper bound
apply: the upper bound is applied, but a score that exceeds it keeps its actual value

{
  "description": "Normalization processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max",
          "parameters": {
            "upper_bounds": [
              {
                "mode": "apply",
                "max_score": 1.0
              },
              {
                "mode": "clip",
                "max_score": 5.0
              },
              {
                "mode": "ignore"
              }
            ]
          }
        },
        "combination": {
          "technique": "arithmetic_mean"
        }
      }
    }
  ]
}
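
To make the three modes concrete, here is a minimal Python sketch of how one entry of the proposed upper_bounds array might be applied to the scores of a single sub-query. It is an illustration under stated assumptions, not the plugin's implementation: the function name `normalize_sub_query` and the sample scores are hypothetical, the reading of `apply` (the top of the interval becomes the larger of `max_score` and the highest retrieved score) is one possible interpretation of the mode description, and returning 1.0 for a zero-width interval is an arbitrary choice.

```python
def normalize_sub_query(scores, bound):
    """Min-max normalize one sub-query's scores using one upper_bounds entry.

    `bound` mirrors an entry of the proposed "upper_bounds" array, e.g.
    {"mode": "clip", "max_score": 5.0} or {"mode": "ignore"}.
    """
    mode = bound.get("mode", "ignore")
    if mode == "ignore":
        # Current behavior: the highest retrieved score is the maximum.
        hi = max(scores)
    elif mode == "clip":
        # Scores above the bound are pulled down to the bound.
        hi = bound["max_score"]
        scores = [min(s, hi) for s in scores]
    elif mode == "apply":
        # Assumed reading: the bound is the maximum unless a score exceeds it.
        hi = max(bound["max_score"], max(scores))
    else:
        raise ValueError(f"unknown mode: {mode}")

    lo = min(scores)
    if hi == lo:
        # Zero-width interval; 1.0 is an arbitrary choice for this sketch.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# One entry per sub-query, in the same order as the sub-queries.
upper_bounds = [
    {"mode": "apply", "max_score": 1.0},
    {"mode": "clip", "max_score": 5.0},
    {"mode": "ignore"},
]
# Hypothetical scores for three sub-queries.
sub_query_scores = [
    [0.77, 0.76, 0.75],
    [7.5, 4.0, 2.5],
    [0.77, 0.76, 0.75],
]
for scores, bound in zip(sub_query_scores, upper_bounds):
    normalized = normalize_sub_query(scores, bound)
    print(bound["mode"], [round(s, 2) for s in normalized])
```

Keeping one entry per sub-query, matched by position, is what lets a bound be skipped for an individual sub-query by setting its mode to ignore.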

martin-gaievski added enhancement untriaged labels Mar 5, 2025

martin-gaievski mentioned this issue Mar 5, 2025

[RFC] Lower bound for min-max normalization technique in Hybrid query #1189

Open

martin-gaievski removed the untriaged label Mar 18, 2025
