bzhangam
Contributor

@bzhangam bzhangam commented Sep 8, 2025

Description

Adds native support for Maximal Marginal Relevance (MMR). It relies on the system-generated search pipeline to automatically create two processors:

  • MMR oversample search request processor. It modifies the search request to oversample candidates for MMR and resolves the space type for the response processor to use.
  • MMR rerank search response processor. It reranks the hits based on Maximal Marginal Relevance.
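
To make the rerank step concrete, here is a minimal, self-contained sketch of greedy MMR selection. It is not the code in this PR: the class and method names are hypothetical, cosine similarity stands in for whatever space type the request processor resolves, and the weighting `(1 - diversity) * relevance - diversity * maxSimilarityToSelected` is an assumption about how `diversity` is applied.

```java
import java.util.ArrayList;
import java.util.List;

public class MmrSketch {

    // Cosine similarity; a stand-in for the space type resolved for the index (assumption).
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /**
     * Greedy MMR: repeatedly pick the candidate with the best trade-off between
     * its relevance to the query and its similarity to hits already selected.
     * Returns the indices of the chosen candidates in selection order.
     */
    static List<Integer> rerank(double[] relevance, double[][] vectors, double diversity, int size) {
        List<Integer> selected = new ArrayList<>();
        boolean[] used = new boolean[vectors.length];
        int limit = Math.min(size, vectors.length);
        while (selected.size() < limit) {
            int best = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < vectors.length; i++) {
                if (used[i]) {
                    continue;
                }
                // No redundancy penalty for the first pick.
                double maxSim = selected.isEmpty() ? 0.0 : Double.NEGATIVE_INFINITY;
                for (int j : selected) {
                    maxSim = Math.max(maxSim, cosine(vectors[i], vectors[j]));
                }
                // Assumed weighting: higher diversity penalizes redundancy more.
                double score = (1 - diversity) * relevance[i] - diversity * maxSim;
                if (score > bestScore) {
                    bestScore = score;
                    best = i;
                }
            }
            used[best] = true;
            selected.add(best);
        }
        return selected;
    }

    public static void main(String[] args) {
        // Candidates 0 and 1 are near-duplicates; candidate 2 is orthogonal to both.
        double[][] vectors = { { 1, 0 }, { 1, 0.01 }, { 0, 1 } };
        double[] relevance = { 0.9, 0.89, 0.5 };
        System.out.println(rerank(relevance, vectors, 0.0, 2)); // prints [0, 1]
        System.out.println(rerank(relevance, vectors, 0.5, 2)); // prints [0, 2]
    }
}
```

With `diversity` at 0 the order is purely by relevance; at 0.5 the near-duplicate is pushed out in favor of the less relevant but dissimilar candidate.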

We also introduce the MMRQueryTransformer to support different transform logic for different query builders. For now, only the transformer for the knn query is implemented here. The transformer for the neural query will be implemented in the neural plugin so that the neural query is handled properly.

Checks will fail until this PR is merged in core.

Example query with MMR:

{
  "_source": {
    "excludes": ["passage_embedding"]
  },
  "query": {
    "knn": {
      "passage_embedding": {
        "vector": [1.0, 2.0, 3.0],
        "k": 10
      }
    }
  },
  "ext": {
    "mmr": {
      "diversity": 0.5,
      "candidates": 20
    }
  }
}
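
As a rough illustration of how the two `ext.mmr` parameters relate to the oversampling step, the sketch below computes how many hits to fetch and how many to return. The class, the `plan` method, the validation range for `diversity`, and the rule that the fetch size is never smaller than the requested size are all assumptions made for illustration, not behavior confirmed by this PR.

```java
public class MmrOversampleSketch {

    /** Sizes for the two stages: how many hits to fetch and how many to return. */
    static final class Plan {
        final int fetchSize;
        final int returnSize;

        Plan(int fetchSize, int returnSize) {
            this.fetchSize = fetchSize;
            this.returnSize = returnSize;
        }
    }

    // Hypothetical helper: derive the oversampled fetch size from the MMR parameters.
    static Plan plan(int requestedSize, int candidates, double diversity) {
        // Assumed validation: diversity is treated as a weight in [0, 1].
        if (diversity < 0.0 || diversity > 1.0) {
            throw new IllegalArgumentException("diversity must be in [0, 1]");
        }
        // Assumption: never fetch fewer hits than the caller asked for.
        int fetchSize = Math.max(candidates, requestedSize);
        return new Plan(fetchSize, requestedSize);
    }

    public static void main(String[] args) {
        // Values from the example query, assuming the caller wants 10 final hits.
        Plan p = plan(10, 20, 0.5);
        System.out.println("fetch " + p.fetchSize + ", return " + p.returnSize); // prints fetch 20, return 10
    }
}
```

Under these assumptions the request stage retrieves 20 candidates, and the rerank stage trims the result back to 10 hits.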

Check benchmark

Related Issues

Resolves #2804

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

SpaceType methodSpaceType = getSpaceTypeFromMethodContext(knnMethodContext);
SpaceType topLevelSpaceType = getSpaceTypeFromString(topLevelSpaceTypeString);

// If we failed to find space type from both method context and top level
Contributor Author

We cleaned up this logic in this PR but did not update the comments, which can cause confusion. I cleaned them up here since this PR also needs to resolve the space type.

@bzhangam bzhangam force-pushed the mmr branch 6 times, most recently from fbe5747 to 4ce8074 Compare September 11, 2025 21:01
@bzhangam
Contributor Author

@navneet1v @Vikasht34 Could you help review this PR?

@Vikasht34
Collaborator

@navneet1v @Vikasht34 Could you help review this PR?

I will start. Please make sure all integration tests are passing.

@bzhangam
Contributor Author

@Vikasht34 I have added some benchmark data.

Vikasht34
Vikasht34 previously approved these changes Sep 24, 2025
Collaborator

@Vikasht34 Vikasht34 left a comment

Looks good to me, though one test is failing. Please make sure to fix that.

Vikasht34
Vikasht34 previously approved these changes Sep 26, 2025
heemin32
heemin32 previously approved these changes Sep 26, 2025
@bzhangam bzhangam dismissed stale reviews from heemin32 and Vikasht34 via dad21c9 September 26, 2025 15:12
@Vikasht34 Vikasht34 merged commit 1743bcb into opensearch-project:main Sep 26, 2025
37 checks passed

Development

Successfully merging this pull request may close these issues.

[FEATURE]Native Support for Maximal Marginal Relevance (MMR) in Vector Search

3 participants