Background
Maximal Marginal Relevance (MMR) is a greedy re-ranking algorithm designed to select documents that are both highly relevant to a query and diverse with respect to each other. It works by iteratively choosing the next document that maximizes:
$$\text{MMR}(D_i) = \lambda \cdot \text{Similarity}(D_i, Q) - (1 - \lambda) \cdot \max_{D_j \in S} \text{Similarity}(D_i, D_j)$$
Where:
- $\lambda$ (diversity parameter) controls the trade-off between relevance and novelty.
- $\text{Similarity}(D_i, Q)$ measures the relevance of document $D_i$ to the query $Q$.
- $\text{Similarity}(D_i, D_j)$ measures the similarity between candidate documents, to avoid redundancy.
- $S$ is the set of documents already selected.
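For reference, here is a minimal sketch of the greedy selection loop in plain Java. It is illustrative only (the class name MMRSelector is not an existing class), is not tied to any OpenSearch API, and assumes the caller supplies both similarity functions:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.BiFunction;
    import java.util.function.ToDoubleFunction;

    public final class MMRSelector {

        /**
         * Greedy MMR: repeatedly pick the candidate with the highest
         * lambda * Similarity(Di, Q) - (1 - lambda) * max over selected Dj of Similarity(Di, Dj).
         */
        public static List<float[]> select(
            List<float[]> candidates,
            ToDoubleFunction<float[]> querySimilarity,           // Similarity(Di, Q)
            BiFunction<float[], float[], Double> docSimilarity,  // Similarity(Di, Dj)
            double lambda,
            int topK
        ) {
            List<float[]> remaining = new ArrayList<>(candidates);
            List<float[]> selected = new ArrayList<>();
            while (selected.size() < topK && remaining.isEmpty() == false) {
                float[] best = null;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (float[] candidate : remaining) {
                    // Redundancy term: highest similarity to anything already selected.
                    double redundancy = 0.0;
                    for (float[] picked : selected) {
                        redundancy = Math.max(redundancy, docSimilarity.apply(candidate, picked));
                    }
                    double mmr = lambda * querySimilarity.applyAsDouble(candidate) - (1 - lambda) * redundancy;
                    if (mmr > bestScore) {
                        bestScore = mmr;
                        best = candidate;
                    }
                }
                selected.add(best);
                remaining.remove(best);
            }
            return selected;
        }
    }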
In vector search contexts such as knn or neural queries, MMR can improve retrieval results by preventing the top‑k from being dominated by near-duplicate embeddings. Without MMR, results tend to cluster closely in the embedding space, reducing information diversity.
Currently, MMR can only be implemented externally or via custom pipelines, which adds complexity, increases latency, and requires manual request/response manipulation.
The goal is to provide native MMR support in OpenSearch for single knn and neural queries using knn_vector.
Related GitHub requests:
#2804
opensearch-project/neural-search#1481
Requirements
In Scope
- Handle a single knn query.
- Handle a single neural query using knn_vector.
Out of Scope
- Embeddings in nested objects (unclear similarity calculation for multiple embeddings per doc).
- MMR for queries nested inside hybrid queries (undecided whether simply reranking the response would be sufficient).
- MMR for queries nested inside bool queries (undecided whether simply reranking the response would be sufficient).
Solutions
Flow:
- Search Request Processor: modifies the original query's k and size to oversample candidates. It also collects the information the response processor needs and stores it in the PipelineContext so the response processor can use it later.
- Search Response Processor: reranks the candidates using MMR and selects the top results based on the original query size.
Note: We will rely on the System Generated Search Pipeline to support this feature so that users don't need to manually set up a search pipeline with these processors.
How should the MMR parameters be provided?
Option 1: New Query Type (mmr)
Example Request:
{
  "query": {
    "mmr": {
      "query": {
        "knn": {
          "vector_field": [...],
          "k": 10
        }
      },
      "candidates": 100,
      "diversity": 0.5 // 0 - 1, higher value means higher weight for diversity in the rerank
    }
  },
  "_source": {
    "excludes": [
      "vector_field"
    ]
  }
}
Processor Transformation:
{
  "query": {
    "knn": {
      "vector_field": [...],
      "k": 100
    }
  },
  "size": 100
}
Pros:
- Self-contained — all MMR parameters and the inner query are together.
- Clear discoverability — users see mmr as a distinct query type.
Cons:
- Increases API surface — more parsing, serialization, and query builder code.
- Potential user confusion — may be mistaken for a normal query rather than a transformation wrapper.
Implementation Note:
- Must be used only at the top level; its doRewrite and doToQuery methods should never be invoked.
Option 2: Query Extension (ext.mmr) (recommended)
Example Request:
{
  "query": {
    "knn": {
      "vector_field": [...],
      "k": 10
    }
  },
  "ext": {
    "mmr": {
      "candidates": 100,
      "diversity": 0.5
    }
  }
}
Processor Transformation:
{
  "query": {
    "knn": {
      "vector_field": [...],
      "k": 100
    }
  },
  "size": 100,
  "ext": {
    "mmr": {
      "candidates": 100,
      "diversity": 0.5
    }
  }
}
Pros:
- Minimal API change — fits naturally into existing ext mechanism.
- Easier for users to toggle MMR on/off without changing the query structure.
Cons:
- Less discoverable — hidden under ext.
Recommendation on Query Type vs Extension
- If API clarity and discoverability are priorities → Option 1.
- If minimal disruption and composability are priorities → Option 2.
The implementation effort is similar for both options, and conceptually MMR acts more like an extension that enhances query behavior than a query type of its own, so we recommend Option 2.
Where to introduce the change?
Where to Introduce the New Query/Extension
- Add to the KNN plugin, since both Neural Search and KNN plugins rely on vectors.
- Do not add to core — MMR is vector-specific and not generic enough.
Where to Introduce the Processors
Search Request Processor
The search request processor needs access to the original query to modify the k parameter:
- KNN query: belongs to the KNN plugin.
- Neural query: belongs to the Neural plugin, which depends on the KNN plugin.
Because the Neural plugin depends on the KNN plugin:
- The Neural plugin can access the KNN query, but the KNN plugin cannot access the Neural query.
- We also want MMR to work when only the KNN plugin is installed.
Proposed solution:
- Implement the search request processor in the KNN plugin.
- Introduce a transformer registry that allows other plugins (e.g., Neural plugin) to register a transformer for their query type.
public interface MMRQueryTransformer<T extends QueryBuilder> {
    /**
     * Transforms the query builder to oversample candidates for MMR. Implementations should also
     * resolve the vector field path and space type and record them in the transform context so the
     * response processor can consume them.
     *
     * @param queryBuilder        the query to transform
     * @param listener            notified when the transformation completes
     * @param mmrTransformContext shared context populated for the response processor
     */
    void transform(T queryBuilder, ActionListener<Void> listener, MMRTransformContext mmrTransformContext);
}
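A registry could be as simple as a map keyed by the query name, which each plugin populates at startup. The sketch below uses illustrative names (MMRQueryTransformerRegistry, KNNQueryMMRTransformer, NeuralQueryMMRTransformer do not exist yet), and import paths assume a recent OpenSearch core:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.opensearch.core.action.ActionListener;
    import org.opensearch.index.query.QueryBuilder;

    public final class MMRQueryTransformerRegistry {

        // Maps a query name (e.g. "knn", "neural") to the transformer that knows how to
        // oversample that query type and extract the vector field path / space type.
        private static final Map<String, MMRQueryTransformer<? extends QueryBuilder>> TRANSFORMERS = new ConcurrentHashMap<>();

        public static void register(String queryName, MMRQueryTransformer<? extends QueryBuilder> transformer) {
            TRANSFORMERS.put(queryName, transformer);
        }

        @SuppressWarnings("unchecked")
        public static void transform(QueryBuilder query, ActionListener<Void> listener, MMRTransformContext context) {
            MMRQueryTransformer<QueryBuilder> transformer = (MMRQueryTransformer<QueryBuilder>) TRANSFORMERS.get(query.getName());
            if (transformer == null) {
                listener.onFailure(new IllegalArgumentException("MMR is not supported for query type: " + query.getName()));
                return;
            }
            transformer.transform(query, listener, context);
        }
    }

    // Registration (illustrative):
    //   KNN plugin:    MMRQueryTransformerRegistry.register("knn", new KNNQueryMMRTransformer());
    //   Neural plugin: MMRQueryTransformerRegistry.register("neural", new NeuralQueryMMRTransformer());

The search request processor would then look up the transformer by the top-level query's name and delegate the oversampling to it, failing fast for unsupported query types.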
This allows:
- KNN queries to be handled natively in the KNN plugin.
- Neural queries to have dedicated logic in the Neural plugin.
Search Response Processor
The search response processor only needs to perform MMR reranking, which depends on:
- The vector in each hit.
- The vector field path, space type, and data type to compute similarity.
It does not depend on the original query type.
- Collecting the vector field info and space type for different query types requires dedicated logic, but this can be done during the search request processing using the transformers.
Advantages:
- Only one search response processor is needed in the KNN plugin.
- Transformers populate the MMRRerankContext with all necessary info for reranking.
- Neural plugin can inject custom logic without duplicating the response processing logic.
Processor Implementation Detail
Search Request Processor
- Modify the query size
- Set the query size to the number of MMR candidates to oversample.
- Modify k
- Set k to the number of MMR candidates to oversample.
- If min_score or max_distance is used instead of k, no change is needed.
- Ensure the embedding is in the source
- Fetch all fields from the source so the vector is available for MMR. Reliably modifying the _source excludes so they don't drop the embedding is tricky, so we simply include all fields and re-apply the original source filtering after the MMR rerank.
- Collect info
- Collect the information the search response processor needs (from the index mapping or the model metadata) and set it in the pipeline context, for example (a sketch of the whole request-side transformation follows this list):
{
  "mmr_rerank_context": {
    "original_query_size": 10, // used to decide how many hits to return
    "diversity": 0.5,
    "vectorFieldPath": "vector_field", // used to access the vector in the source
    "spaceType": "l2", // used to decide the similarity function
    "vectorDataType": "float", // used to parse the vector from the source
    "originalFetchSourceContext": {"excludes": ["vector_field"]} // used to filter the source after the rerank
  }
}
- Validation:
- Ensure the vector field has the same space type and data type across all target indices so that a single similarity function can be used.
- Order
- This processor modifies the query size and k, which only needs to happen before the query phase. Since other processors may modify the search request to inject a knn query, we recommend executing it after the user-defined request processors.
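Putting the request-side steps together, a rough sketch (with illustrative names; MMRTransformContext is assumed to carry the candidate count and the rerank context that the response processor later reads, and the registry is the one sketched above):

    import org.opensearch.action.search.SearchRequest;
    import org.opensearch.core.action.ActionListener;
    import org.opensearch.search.builder.SearchSourceBuilder;
    import org.opensearch.search.fetch.subphase.FetchSourceContext;

    // Hypothetical helper invoked by the MMR search request processor.
    final class MMROversampler {

        void oversample(SearchRequest request, int candidates, MMRTransformContext transformContext, ActionListener<Void> listener) {
            SearchSourceBuilder source = request.source();

            // 1. Oversample: raise the request size to the candidate count. The original size
            //    and FetchSourceContext are assumed to already be captured in the rerank
            //    context so the response processor can restore them.
            source.size(candidates);

            // 2. Fetch the full _source so the vectors are available for reranking; the user's
            //    original includes/excludes are re-applied after the rerank.
            source.fetchSource(FetchSourceContext.FETCH_SOURCE);

            // 3. Delegate the query-specific change (e.g. bumping k on a knn query, unless
            //    min_score/max_distance is used) to the registered transformer, which also
            //    records the vector field path, space type, and data type.
            MMRQueryTransformerRegistry.transform(source.query(), listener, transformContext);
        }
    }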
Search Response Processor
- Extract vectors from hits
- Use the vector field path to extract the vector from each hit, and use the vector data type to decide whether it is a float or byte vector; this affects how similarity is computed later.
- Similarity computation
- Option 1: reuse the existing doc score (recommended)
- Pros:
- Fast; leverages existing KNN/neural scores.
- Cons:
- Possible slight bias if score transformations (e.g., boost) are applied.
- Option 2: recalculate similarity
- Pros:
- Accurate, independent of score transformations.
- Cons:
- Slower; requires raw vectors. For a neural query, a search phase results processor would be needed to access the vector in the rewritten query.
- Recommendation: most KNN and neural searches already produce high-quality similarity scores, so we recommend reusing the doc score for now; recomputing similarity can be added later as an optional enhancement.
- Similarity Function
- Determined by the vector field's space type (e.g., L2, cosine). For now we only support the knn_vector field, which always has a space type value.
- Ensures MMR reranking uses the correct similarity function. The space type to similarity mapping is already defined in the knn plugin (see the sketch after this list).
- Order
- Since MMR reranking reduces the hits to the original query size, this processor should run before the user-defined response processors.
- Validation
- Nested vectors are not supported.
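For the document-to-document similarity term, a minimal sketch of picking a similarity function from the space type (illustrative and simplified; in practice the knn plugin's existing space type to score translation should be reused, including its exact handling of l2 and innerproduct):

    import java.util.function.BiFunction;

    final class MMRSimilarity {

        // Returns a "higher means more similar" function keyed by the vector field's space type.
        static BiFunction<float[], float[], Double> forSpaceType(String spaceType) {
            switch (spaceType) {
                case "l2":
                    // Flip the distance into a similarity (the exact translation should follow the knn plugin).
                    return (a, b) -> 1.0 / (1.0 + l2Squared(a, b));
                case "cosinesimil":
                    return MMRSimilarity::cosine;
                case "innerproduct":
                    return MMRSimilarity::dot;
                default:
                    throw new IllegalArgumentException("Unsupported space type for MMR: " + spaceType);
            }
        }

        private static double l2Squared(float[] a, float[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return sum;
        }

        private static double dot(float[] a, float[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) {
                sum += a[i] * b[i];
            }
            return sum;
        }

        private static double cosine(float[] a, float[] b) {
            return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
        }
    }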
Extendability
To support MMR reranking for bool and hybrid queries, we could allow users to specify:
- The vector field to use for MMR.
- The query embedding vector, if they don't want to use the doc score directly.
- A model ID, if the embedding should be generated from plain text.
With that information we should be able to perform the MMR rerank.