-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Use new source loader when lower docId is accessed #128320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use new source loader when lower docId is accessed #128320
Conversation
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
Hi @parkertimmins, I've created a changelog YAML for you. |
@jimczi Added you as reviewer on this, since if modifies code that you recently added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left two comments, LGTM otherwise.
I think we should also backport to the 8.18 branch?
String indexName = "test-foo"; | ||
createIndex(indexName, Settings.builder().put("index.mode", "logsdb").build(), mappings); | ||
|
||
int numDocs = 100_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The amount of data that this test generates / indexes, makes this is an expensive test. Does the problem re-occur if we lower the number of documents here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh yep, 100k is a lot. Your original test example actually triggers the issue with 10 docs. Reverting the test to be closer to your original version, and decreasing to 10 docs.
assertThat(shardsHeader.get("failed"), equalTo(0)); | ||
assertThat(shardsHeader.get("successful"), equalTo(1)); | ||
assertThat(shardsHeader.get("skipped"), equalTo(0)); | ||
logger.info("searchResponse: {}", searchResponseBody); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also assert total hits here?
@@ -44,6 +46,11 @@ public Source getSource(LeafReaderContext ctx, int doc) throws IOException { | |||
leaf = new Leaf(sourceLoader.leaf(ctx.reader(), null), storedFieldLoader.getLoader(ctx, null)); | |||
var existing = leaves.put(id, leaf); | |||
assert existing == null : "unexpected source provider [" + existing + "]"; | |||
} else if (isStoredSource == false && doc < leaf.doc) { | |||
// For synthetic source, if a runtime field is used more than once, a new source loader must be used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also incorporate the following into this comment: with both synthetic and stored source the docid can go backwards. This isn't a problem for stored fields (somehow it validate when docid goes backwards and doesn't run into EOF like errors), but it is for doc values (which gets used with synthetic source).
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here seems to be when multiple clauses in a query reference runtime fields on the same request, no? So it's not only about using the same runtime field twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here seems to be when multiple clauses in a query reference runtime fields on the same request
Yes I think that is a more accurate way to describe the issue. I confirmed that, unsurprisingly, aggregating twice on the same runtime field does indeed work fine as is. So this is specific to there being 2+ query clauses referencing the same runtime field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
@@ -44,6 +46,11 @@ public Source getSource(LeafReaderContext ctx, int doc) throws IOException { | |||
leaf = new Leaf(sourceLoader.leaf(ctx.reader(), null), storedFieldLoader.getLoader(ctx, null)); | |||
var existing = leaves.put(id, leaf); | |||
assert existing == null : "unexpected source provider [" + existing + "]"; | |||
} else if (isStoredSource == false && doc < leaf.doc) { | |||
// For synthetic source, if a runtime field is used more than once, a new source loader must be used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here seems to be when multiple clauses in a query reference runtime fields on the same request, no? So it's not only about using the same runtime field twice.
💔 Backport failed
You can use sqren/backport to manually backport by running |
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId. (cherry picked from commit 51e87cb) # Conflicts: # server/src/main/java/org/elasticsearch/search/lookup/ConcurrentSegmentSourceProvider.java
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId. (cherry picked from commit 51e87cb) # Conflicts: # server/src/main/java/org/elasticsearch/search/lookup/ConcurrentSegmentSourceProvider.java
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId. (cherry picked from commit 51e87cb) # Conflicts: # server/src/main/java/org/elasticsearch/search/lookup/ConcurrentSegmentSourceProvider.java
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId. (cherry picked from commit 51e87cb) # Conflicts: # server/src/main/java/org/elasticsearch/search/lookup/ConcurrentSegmentSourceProvider.java
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.