-
Notifications
You must be signed in to change notification settings - Fork 24
Optimize query storage by storing source field as a string in local index #424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Optimize query storage by storing source field as a string in local index #424
Conversation
Does this actually change the |
241dd3e
to
85c9170
Compare
00fa3ca
to
3a576e5
Compare
c736543
to
d8d9eff
Compare
db58a0e
to
b1a7366
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #424 +/- ##
============================================
- Coverage 78.37% 77.05% -1.33%
- Complexity 613 622 +9
============================================
Files 52 52
Lines 2340 2432 +92
Branches 225 252 +27
============================================
+ Hits 1834 1874 +40
- Misses 383 423 +40
- Partials 123 135 +12 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
b1a7366
to
e70c4cc
Compare
e70c4cc
to
c8d8276
Compare
src/main/java/org/opensearch/plugin/insights/core/exporter/LocalIndexExporter.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/plugin/insights/core/exporter/LocalIndexExporter.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/plugin/insights/core/listener/QueryInsightsListener.java
Outdated
Show resolved
Hide resolved
We should do more tests especially for compatibilities in this PR, e.g.
|
4b92419
to
0464372
Compare
…ndex Signed-off-by: Emily Guo <[email protected]>
0464372
to
17e0ada
Compare
Description
This PR addresses some bugs that we're having with how the source field is stored in the local index. The source field can be really long and/or really nested. The solution is to store the source field as a string within the local index and in-memory so no exceptions related to field depth or number of fields is triggered and to maintain consistency.
For the top_queries index template that contains the local index mapping, the template is updated daily during index creation. When changing the index mapping, upgrades should not be affected because reindex automatically upgrades data format from object to string.
Additionally, for strings that are too long, they are truncated and a boolean to mark that the string has been truncated is set. These source strings are then skipped when parsing.
Issues Resolved
Closes [/issues/352] and [/issues/411]
Dashboards PR to handle source field change: opensearch-project/query-insights-dashboards#357
Check that Local index properly created:
Scrolling down we can see that source is in properties in the local index.
Test Results (1000 query records)
I performed a test with 1000 query records to compare the storage usage of storing source as a string compared to storing source as a SearchSourceBuilder and got the following results:
We can see that there is a significant reduction in memory storage for source when it's stored as string, especially in larger, more complex queries.
However, it's important to note that source was stored as a SearchSourceBuilder for a reason, and after it's stored into local index, it's needed as a SearchSourceBuilder only in SearchQueryCategorizer, which only affects deployments with search query metrics enabled. This extra cost would be 4-15 ms per query parsing overhead.
The memory saving benefits and stability (avoiding failures from attempts to store source as a SearchSourceBuilder) take precedence over the extra time it takes to parse queries in certain conditions.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.