Commit 182cba3

[8.x] Backporting adding linear retriever to support weighted sums of sub-retrievers (#121076)
1 parent cc2bf06 commit 182cba3

30 files changed

+3136
-38
lines changed

docs/changelog/120222.yaml

+5
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
pr: 120222
2+
summary: Adding linear retriever to support weighted sums of sub-retrievers
3+
area: "Search"
4+
type: enhancement
5+
issues: []

docs/reference/rest-api/common-parms.asciidoc

+43-4
@@ -1328,7 +1328,7 @@ that lower ranked documents have more influence. This value must be greater than
13281328
equal to `1`. Defaults to `60`.
13291329
end::rrf-rank-constant[]
13301330

1331-
tag::rrf-rank-window-size[]
1331+
tag::compound-retriever-rank-window-size[]
13321332
`rank_window_size`::
13331333
(Optional, integer)
13341334
+
@@ -1337,15 +1337,54 @@ query. A higher value will improve result relevance at the cost of performance.
13371337
ranked result set is pruned down to the search request's <<search-size-param, size>>.
13381338
`rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`.
13391339
Defaults to the `size` parameter.
1340-
end::rrf-rank-window-size[]
1340+
end::compound-retriever-rank-window-size[]
13411341

1342-
tag::rrf-filter[]
1342+
tag::compound-retriever-filter[]
13431343
`filter`::
13441344
(Optional, <<query-dsl, query object or list of query objects>>)
13451345
+
13461346
Applies the specified <<query-dsl-bool-query, boolean query filter>> to all of the specified sub-retrievers,
13471347
according to each retriever's specifications.
1348-
end::rrf-filter[]
1348+
end::compound-retriever-filter[]
1349+
1350+
tag::linear-retriever-components[]
1351+
`retrievers`::
1352+
(Required, array of objects)
1353+
+
1354+
A list of the sub-retrievers' configuration, that we will take into account and whose result sets
1355+
we will merge through a weighted sum. Each configuration can have a different weight and normalization depending
1356+
on the specified retriever.
1357+
1358+
Each entry specifies the following parameters:
1359+
1360+
* `retriever`::
1361+
(Required, a <<retriever, retriever>> object)
1362+
+
1363+
Specifies the retriever for which we will compute the top documents. The retriever will produce `rank_window_size`
1364+
results, which will later be merged based on the specified `weight` and `normalizer`.
1365+
1366+
* `weight`::
1367+
(Optional, float)
1368+
+
1369+
The weight by which each score of this retriever's top docs will be multiplied. Must be greater than or equal to 0. Defaults to 1.0.
1370+
1371+
* `normalizer`::
1372+
(Optional, String)
1373+
+
1374+
Specifies how we will normalize the retriever's scores, before applying the specified `weight`.
1375+
Available values are `minmax` and `none`. Defaults to `none`.
1376+
1377+
** `none`
1378+
** `minmax` :
1379+
A `MinMaxScoreNormalizer` that normalizes scores based on the following formula
1380+
+
1381+
```
1382+
score = (score - min) / (max - min)
1383+
```
1384+
1385+
See also <<retrievers-examples-linear-retriever, this hybrid search example>> using a linear retriever on how to
1386+
independently configure and apply normalizers to retrievers.
1387+
end::linear-retriever-components[]
13491388

13501389
tag::knn-rescore-vector[]
13511390
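
The `linear-retriever-components` text added above describes a weighted sum over optionally normalized sub-retriever scores. The following Python sketch illustrates that arithmetic only and is not the Elasticsearch implementation. The function names, the `scores` input shape, the handling of `max == min`, the tie-breaking by document id, and the treatment of documents missing from a retriever (they contribute nothing) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def minmax_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Apply score = (score - min) / (max - min) to one retriever's results."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: the documented formula is undefined when max == min,
        # so this sketch maps every score to 0.0.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_combine(entries: List[dict], rank_window_size: int) -> List[Tuple[str, float]]:
    """Merge per-retriever scores through a weighted sum.

    Each entry is a hypothetical dict with keys:
      "scores"     -> {doc_id: raw_score} (the retriever's top docs)
      "weight"     -> optional float >= 0, defaults to 1.0
      "normalizer" -> optional "minmax" or "none", defaults to "none"
    """
    totals: Dict[str, float] = {}
    for entry in entries:
        weight = entry.get("weight", 1.0)
        if weight < 0:
            raise ValueError("weight must be greater than or equal to 0")
        normalizer = entry.get("normalizer", "none")
        scores = entry["scores"]
        if normalizer == "minmax":
            scores = minmax_normalize(scores)
        elif normalizer != "none":
            raise ValueError(f"unknown normalizer: {normalizer}")
        # Normalization happens before the weight is applied.
        for doc, score in scores.items():
            totals[doc] = totals.get(doc, 0.0) + weight * score
    # Highest combined score first; ties broken by doc id (an assumption).
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:rank_window_size]


entries = [
    {"scores": {"a": 10.0, "b": 5.0, "c": 0.0}, "weight": 2.0, "normalizer": "minmax"},
    {"scores": {"b": 0.5, "c": 0.25}},  # weight 1.0, normalizer "none"
]
print(linear_combine(entries, rank_window_size=2))
```

In this example the first retriever's scores are rescaled to the range 0 to 1 before being doubled, while the second retriever's raw scores are added unchanged.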

docs/reference/search/retriever.asciidoc

+27-2
@@ -28,6 +28,9 @@ A <<standard-retriever, retriever>> that replaces the functionality of a traditi
2828
`knn`::
2929
A <<knn-retriever, retriever>> that replaces the functionality of a <<search-api-knn, knn search>>.
3030

31+
`linear`::
32+
A <<linear-retriever, retriever>> that linearly combines the scores of other retrievers for the top documents.
33+
3134
`rescorer`::
3235
A <<rescorer-retriever, retriever>> that replaces the functionality of the <<rescore, query rescorer>>.
3336

@@ -45,6 +48,8 @@ A <<rule-retriever, retriever>> that applies contextual <<query-rules>> to pin o
4548

4649
A standard retriever returns top documents from a traditional <<query-dsl, query>>.
4750

51+
[discrete]
52+
[[standard-retriever-parameters]]
4853
===== Parameters:
4954

5055
`query`::
@@ -195,6 +200,8 @@ Documents matching these conditions will have increased relevancy scores.
195200

196201
A kNN retriever returns top documents from a <<knn-search, k-nearest neighbor search (kNN)>>.
197202

203+
[discrete]
204+
[[knn-retriever-parameters]]
198205
===== Parameters
199206

200207
`field`::
@@ -265,21 +272,37 @@ GET /restaurants/_search
265272
This value must be fewer than or equal to `num_candidates`.
266273
<5> The size of the initial candidate set from which the final `k` nearest neighbors are selected.
267274

275+
[[linear-retriever]]
276+
==== Linear Retriever
277+
A retriever that normalizes and linearly combines the scores of other retrievers.
278+
279+
[discrete]
280+
[[linear-retriever-parameters]]
281+
===== Parameters
282+
283+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=linear-retriever-components]
284+
285+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-rank-window-size]
286+
287+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-filter]
288+
268289
[[rrf-retriever]]
269290
==== RRF Retriever
270291

271292
An <<rrf, RRF>> retriever returns top documents based on the RRF formula, equally weighting two or more child retrievers.
272293
Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set.
273294

295+
[discrete]
296+
[[rrf-retriever-parameters]]
274297
===== Parameters
275298

276299
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-retrievers]
277300

278301
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-rank-constant]
279302

280-
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-rank-window-size]
303+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-rank-window-size]
281304

282-
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-filter]
305+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-filter]
283306

284307
[discrete]
285308
[[rrf-retriever-example-hybrid]]
@@ -540,6 +563,8 @@ score = ln(score), if score < 0
540563
----
541564
====
542565

566+
[discrete]
567+
[[text-similarity-reranker-retriever-parameters]]
543568
===== Parameters
544569

545570
`retriever`::
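
The parameters documented for the new `linear` retriever in this file (`retrievers` entries with `retriever`, `weight`, and `normalizer`, plus `rank_window_size`) suggest a request body along the following lines. This is a hedged sketch that only builds and checks a Python dict against the constraints stated in the diff. The helper name `linear_retriever_body`, the field names `title` and `embedding`, and the sample vector are hypothetical, and the exact request accepted by a given Elasticsearch version should be confirmed against its reference documentation.

```python
import json
from typing import Any, Dict, List, Optional

ALLOWED_NORMALIZERS = {"minmax", "none"}


def linear_retriever_body(
    entries: List[Dict[str, Any]],
    size: int = 10,
    rank_window_size: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a search body with a `linear` retriever (illustrative helper)."""
    if rank_window_size is None:
        # Per the docs in this diff, rank_window_size defaults to `size`.
        rank_window_size = size
    if rank_window_size < 1 or rank_window_size < size:
        raise ValueError("rank_window_size must be >= size and >= 1")

    components = []
    for entry in entries:
        weight = entry.get("weight", 1.0)             # documented default
        normalizer = entry.get("normalizer", "none")  # documented default
        if weight < 0:
            raise ValueError("weight must be greater than or equal to 0")
        if normalizer not in ALLOWED_NORMALIZERS:
            raise ValueError(f"unknown normalizer: {normalizer}")
        components.append(
            {"retriever": entry["retriever"], "weight": weight, "normalizer": normalizer}
        )

    return {
        "size": size,
        "retriever": {
            "linear": {
                "retrievers": components,
                "rank_window_size": rank_window_size,
            }
        },
    }


body = linear_retriever_body(
    [
        {
            # Hypothetical lexical sub-retriever on a `title` field.
            "retriever": {"standard": {"query": {"match": {"title": "pizza"}}}},
            "weight": 2.0,
            "normalizer": "minmax",
        },
        {
            # Hypothetical kNN sub-retriever on an `embedding` field.
            "retriever": {
                "knn": {
                    "field": "embedding",
                    "query_vector": [0.1, 0.2, 0.3],
                    "k": 10,
                    "num_candidates": 50,
                }
            },
        },
    ],
    size=5,
)
print(json.dumps(body, indent=2))
```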

docs/reference/search/rrf.asciidoc

+6-6
@@ -45,7 +45,7 @@ include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-retrievers]
4545

4646
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-rank-constant]
4747

48-
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-rank-window-size]
48+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-rank-window-size]
4949

5050
An example request using RRF:
5151

@@ -791,11 +791,11 @@ A more specific example of highlighting in RRF can also be found in the <<retrie
791791

792792
==== Inner hits in RRF
793793

794-
The `rrf` retriever supports <<inner-hits,inner hits>> functionality, allowing you to retrieve
795-
related nested or parent/child documents alongside your main search results. Inner hits can be
796-
specified as part of any nested sub-retriever and will be propagated to the top-level parent
797-
retriever. Note that the inner hit computation will take place only at end of `rrf` retriever's
798-
evaluation on the top matching documents, and not as part of the query execution of the nested
794+
The `rrf` retriever supports <<inner-hits,inner hits>> functionality, allowing you to retrieve
795+
related nested or parent/child documents alongside your main search results. Inner hits can be
796+
specified as part of any nested sub-retriever and will be propagated to the top-level parent
797+
retriever. Note that the inner hit computation will take place only at end of `rrf` retriever's
798+
evaluation on the top matching documents, and not as part of the query execution of the nested
799799
sub-retrievers.
800800

801801
[IMPORTANT]
