
[FEATURE][RFC] Introduce MultiVector Field Type For Late-interaction Score #2706

@luyuncheng

Statement

The ColPali paper shows that multi-vector representations scored with late interaction improve nDCG for large-document search.

We tried implementing multi-vector search with nested documents, running one KNN query per query vector and computing the ColPali score from the responses. This does not work, because each KNN query returns a different top-K set of documents:

[Q1->Res Document:1,2,3]
[Q2->Res Document:4,5,6]
....

The score function is max(Q1, D) + max(Q2, D) + ..., so it needs every query vector's score for all of documents 1 through 6. Because each KNN response covers only its own top-K, we cannot compute the complete score for a paper (parent document).

📌 So I want to introduce late-interaction capability with a new field type: multiVector
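
The gap described above can be reproduced with a toy example. The sketch below is plain Python rather than plugin code, and the document names, vectors, and the `top_k_parents` helper are all hypothetical. It simulates one KNN query per query vector with K=1 and compares the sum of the returned scores with the full late-interaction score.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def max_sim(query_vectors, doc_vectors):
    """Late-interaction score: for each query vector take the best dot product, then sum."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)

# Hypothetical parent documents, each holding several child vectors.
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 0.2]],
    "doc_b": [[0.1, 0.0], [0.0, 1.0]],
    "doc_c": [[0.7, 0.0], [0.0, 0.7]],
}
query = [[1.0, 0.0], [0.0, 1.0]]

def top_k_parents(q, k):
    """Simulate one KNN query: best child score per parent, keeping only the top-k parents."""
    best = {name: max(dot(q, d) for d in vecs) for name, vecs in docs.items()}
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

# Sum only the scores that each per-vector KNN query actually returned.
partial = {}
for q in query:
    for name, score in top_k_parents(q, k=1).items():
        partial[name] = partial.get(name, 0.0) + score

# Score every document against every query vector.
full = {name: max_sim(query, vecs) for name, vecs in docs.items()}

print(partial)  # doc_c never appears in any top-1 result
print(full)     # yet doc_c has the highest complete score
```

In this toy data `doc_c` is second-best for both query vectors, so it is never returned, even though its complete score is the highest of the three.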

Proposal

1st: introduce a new field type, multi_knn_vector

"mappings": {
  "properties": {
    "my_multi_knn_vector_field": {
      "type": "multi_knn_vector",
      "dimension": 3
    }
  }
}
 
PUT xx_index/_doc/1
{ "my_multi_knn_vector_field": [ [1,2,3], [4,5,6], ...]}

For multi_knn_vector we store the vectors only in binary doc values, which can be combined with derived source.
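
To make the idea of one binary value per document concrete, here is a minimal sketch of how a list of vectors could be packed into a single byte string and read back. The layout (concatenated little-endian float32 values, with the vector count inferred from the length) and the function names are assumptions for illustration only. The RFC does not specify the on-disk encoding.

```python
import struct

FLOAT_BYTES = 4  # float32

def encode_multi_vector(vectors, dimension):
    """Pack all vectors of one document into a single binary value (assumed layout)."""
    for v in vectors:
        if len(v) != dimension:
            raise ValueError(f"expected dimension {dimension}, got {len(v)}")
    flat = [x for v in vectors for x in v]
    return struct.pack(f"<{len(flat)}f", *flat)

def decode_multi_vector(blob, dimension):
    """Split the binary value back into vectors of the mapped dimension."""
    if len(blob) % (FLOAT_BYTES * dimension) != 0:
        raise ValueError("binary value length is not a multiple of the vector size")
    count = len(blob) // FLOAT_BYTES
    flat = struct.unpack(f"<{count}f", blob)
    return [list(flat[i:i + dimension]) for i in range(0, count, dimension)]

blob = encode_multi_vector([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dimension=3)
vectors = decode_multi_vector(blob, dimension=3)
```

Because the mapping fixes `dimension`, the number of vectors per document does not need to be stored separately in this layout.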

2nd: introduce a multi KNN query

We can reuse the existing dot-product calculation to implement the maximum dot-product score described in the paper.

POST xx_index/_search
{
   "query": {
         "multi_knn_query" : {
             "my_multi_knn_vector_field": {
                 "score_mode": "max_sim_dot",
                 "vectors": [ [1,2,3], [5,6,7] ...]
             }
         }
   }
}
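
As a worked example of what `max_sim_dot` is expected to return, the sketch below scores the document from the indexing example against the query vectors above. It uses only the vectors that are written out (the elided `...` entries are ignored), and the function is an illustrative Python stand-in, not the plugin implementation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def max_sim_dot(query_vectors, doc_vectors):
    """Sum over query vectors of the maximum dot product with any document vector."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)

doc_vectors = [[1, 2, 3], [4, 5, 6]]    # document 1 from the indexing example
query_vectors = [[1, 2, 3], [5, 6, 7]]  # vectors from the query example

# Best match per query vector: [1,2,3] -> 32 (against [4,5,6]), [5,6,7] -> 92 (against [4,5,6])
per_query_best = [max(dot(q, d) for d in doc_vectors) for q in query_vectors]
score = max_sim_dot(query_vectors, doc_vectors)
print(per_query_best, score)  # [32, 92] 124
```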

3rd: introduce a rescore query that applies the late-interaction score

POST xx_index/_search
{
   "query": {
       "match_all": { }
   },
   "rescore" : {
      "window_size" : 50,
      "query" : {
         "rescore_query" : {
            "multi_knn_query" : {
                "my_multi_knn_vector_field": {
                   "score_mode": "max_sim_dot",
                   "vectors": [ [1,2,3], [5,6,7] ...]
                }
            }
         }
      }
   }
}
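
The effect of `window_size` can be sketched in a few lines of Python. This is a simulation under stated assumptions, not the engine's code: only the top `window_size` first-pass hits are rescored, the rescore score is added to the first-pass score (mirroring the default "total" combination with both weights equal to 1), and the document ids and vectors are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def max_sim_dot(query_vectors, doc_vectors):
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)

def rescore(hits, doc_vectors, query_vectors, window_size):
    """hits: list of (doc_id, first_pass_score), already sorted by score descending."""
    window, rest = hits[:window_size], hits[window_size:]
    rescored = [
        (doc_id, score + max_sim_dot(query_vectors, doc_vectors[doc_id]))
        for doc_id, score in window
    ]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    # Hits outside the window keep their first-pass score and stay below the window.
    return rescored + rest

doc_vectors = {
    "1": [[1, 2, 3], [4, 5, 6]],
    "2": [[1, 0, 0]],
    "3": [[9, 9, 9]],
}
first_pass = [("1", 1.0), ("2", 1.0), ("3", 1.0)]  # match_all gives every hit score 1.0
query_vectors = [[1, 2, 3], [5, 6, 7]]

result = rescore(first_pass, doc_vectors, query_vectors, window_size=2)
print(result)  # [('1', 125.0), ('2', 7.0), ('3', 1.0)]
```

Document "3" would have the highest late-interaction score but falls outside the window, so the quality of the first-pass query and the choice of `window_size` both matter.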

Future Plan

We can apply binary quantization to the multi-vector field to improve performance.
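
One possible reading of binary quantization here is sign-bit quantization with a Hamming-based similarity in place of the dot product. The sketch below illustrates that idea only. The quantization rule, the similarity definition, and all function names are assumptions, since the RFC does not say which scheme is planned.

```python
def binarize(vector):
    """Keep one bit per component: 1 if the component is positive, else 0."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming_similarity(a, b, dimension):
    """Number of positions where the two bit patterns agree."""
    return dimension - bin(a ^ b).count("1")

def max_sim_binary(query_vectors, doc_vectors, dimension):
    """MaxSim over binarized vectors, using Hamming similarity instead of dot product."""
    q_bits = [binarize(q) for q in query_vectors]
    d_bits = [binarize(d) for d in doc_vectors]
    return sum(
        max(hamming_similarity(q, d, dimension) for d in d_bits)
        for q in q_bits
    )

query_vectors = [[1, -2, 3], [-1, -1, 1]]
doc_vectors = [[0.5, -0.1, 2], [-3, 4, 1]]
score = max_sim_binary(query_vectors, doc_vectors, dimension=3)
print(score)  # 5
```

Each vector shrinks from `dimension` floats to `dimension` bits and the inner loop becomes an XOR plus a popcount, at the cost of a coarser score.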

WIP PR: #2707

Metadata

Labels: Features, RFC, enhancement, v3.3.0
