Improve performance for approximated match_all sort queries #18206

Closed
prudhvigodithi opened this issue May 5, 2025 · 0 comments · Fixed by #18189 or #18220

prudhvigodithi commented May 5, 2025

Describe the bug

  • Update the shortcutTotalHitCount logic so that it recognizes the query as a MatchAllDocsQuery.

  • Today, with approximation, a match_all query is converted to a range query. As a result, the totalHitsThreshold coming from the Lucene TopFieldCollector changes to 10k.

  • For match_all, the threshold should be 10 (the numHits value), which comes from TopDocsCollectorContext in OpenSearch.

  • With totalHitsThreshold at 10k, the updateCompetitiveIterator step in the Lucene NumericComparator is delayed, which forces all 10k docs to be compared.

  • With the default of 10, the competitive iterator would be updated early and could eliminate some of those 10k docs.

  • This also fixes the total hit count inconsistency: the count now correctly includes all documents that would match a true match_all query, even when the query has been optimized into a range query on the sort field.

  • Should fix the [AUTOCUT] Gradle Check Flaky Test Report for SimpleSearchIT #16851
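
To make the threshold effect above concrete, here is a toy Java model. It is not Lucene or OpenSearch code: the class ThresholdSketch and the methods countVisited and sampleValues are hypothetical names, and the skipping rule is a simplified stand-in for what updateCompetitiveIterator does. The sketch assumes an ascending sort and counts how many docs still have to be visited before and after the threshold is reached.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Random;

public class ThresholdSketch {

    // Toy model (assumption, not the Lucene implementation): ascending sort,
    // keeping the numHits smallest values. Until totalHitsThreshold docs have
    // been visited, every doc is visited. After that, docs that cannot enter
    // the top numHits are skipped, mimicking a competitive iterator.
    static int countVisited(long[] values, int numHits, int totalHitsThreshold) {
        // Max-heap, so peek() is the worst value currently in the top numHits.
        PriorityQueue<Long> top = new PriorityQueue<>(Comparator.reverseOrder());
        int visited = 0;
        for (long value : values) {
            boolean thresholdReached = visited >= totalHitsThreshold && top.size() == numHits;
            if (thresholdReached && value >= top.peek()) {
                continue; // non-competitive doc, skipped without being collected
            }
            visited++;
            if (top.size() < numHits) {
                top.add(value);
            } else if (value < top.peek()) {
                top.poll();
                top.add(value);
            }
        }
        return visited;
    }

    // Deterministic pseudo-random sort values for the demo.
    static long[] sampleValues(int count, long seed) {
        Random random = new Random(seed);
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextLong();
        }
        return values;
    }

    public static void main(String[] args) {
        long[] values = sampleValues(20_000, 42L);
        System.out.println("visited with threshold 10:    " + countVisited(values, 10, 10));
        System.out.println("visited with threshold 10000: " + countVisited(values, 10, 10_000));
    }
}
```

In this model the run with a threshold of 10k always visits at least the first 10k docs, while the run with a threshold of 10 starts skipping as soon as the top 10 are filled. The real gain depends on the data and on how Lucene skips, so the benchmark linked from the pull request is the number to trust.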

Related component

Search:Performance

To Reproduce

N/A

Expected behavior

This should improve performance for match_all queries that go through approximation, because the Lucene competitive iterator would trigger early. Benchmark results: #18189 (comment).

This change should also bring the behavior in line with what users expect when running a match_all query with sorting, so that the results include the documents that are missing the sort field.
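
The hit count difference can be sketched with a small Java example. This is an illustration only, not the shortcutTotalHitCount code: HitCountSketch, rangeCount, and matchAllCount are hypothetical names, and a null entry stands for a document that has no value for the sort field.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class HitCountSketch {

    // A range query on the sort field can only match docs that have a value
    // for that field, so docs with a missing value (null here) are not counted.
    static long rangeCount(List<Long> sortValues) {
        return sortValues.stream().filter(Objects::nonNull).count();
    }

    // A true match_all counts every doc, whether or not the sort field exists.
    static long matchAllCount(List<Long> sortValues) {
        return sortValues.size();
    }

    public static void main(String[] args) {
        List<Long> docs = Arrays.asList(5L, null, 7L, null, 1L);
        System.out.println("range query count: " + rangeCount(docs));
        System.out.println("match_all count:   " + matchAllCount(docs));
    }
}
```

With two of the five docs missing the field, the range count is 3 and the match_all count is 5, which is the gap the change is meant to close.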

@prudhvigodithi prudhvigodithi added the bug (Something isn't working) and untriaged labels May 5, 2025
@prudhvigodithi prudhvigodithi changed the title from "[BUG] Retain the default totalHitsThreshold for approximated match_all queries" to "Retain the default totalHitsThreshold for approximated match_all queries" May 5, 2025
@prudhvigodithi prudhvigodithi added the feature (New feature or request) label and removed the bug (Something isn't working) and untriaged labels May 5, 2025
@prudhvigodithi prudhvigodithi self-assigned this May 5, 2025
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board May 6, 2025
@github-project-automation github-project-automation bot moved this from Todo to Done in Performance Roadmap May 6, 2025
@prudhvigodithi prudhvigodithi changed the title from "Retain the default totalHitsThreshold for approximated match_all queries" to "Improve performance for approximated match_all sort queries" May 6, 2025