Skip to content

Use new source loader when lower docId is accessed #128320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

parkertimmins
Copy link
Contributor

When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.

When using synthetic source, runtime fields data may come from
doc values. Doc values iterators can only be read once,
and in increasing docId order. But if a runtime field is referenced
multiple times in a query, currently the same doc value iterator
will be used. This causes an error, as the second field reference
will attempt to read the same iterator from a lower docId than was
previously used. The fix is to create a new source loader, and
thus a new doc value iterator, if the requested docId is lower than
the last seen docId.
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @parkertimmins, I've created a changelog YAML for you.

@parkertimmins parkertimmins requested review from martijnvg and jimczi and removed request for martijnvg May 22, 2025 15:27
@parkertimmins
Copy link
Contributor Author

@jimczi Added you as reviewer on this, since if modifies code that you recently added

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two comments, LGTM otherwise.

I think we should also backport to the 8.18 branch?

String indexName = "test-foo";
createIndex(indexName, Settings.builder().put("index.mode", "logsdb").build(), mappings);

int numDocs = 100_000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The amount of data that this test generates / indexes, makes this is an expensive test. Does the problem re-occur if we lower the number of documents here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yep, 100k is a lot. Your original test example actually triggers the issue with 10 docs. Reverting the test to be closer to your original version, and decreasing to 10 docs.

assertThat(shardsHeader.get("failed"), equalTo(0));
assertThat(shardsHeader.get("successful"), equalTo(1));
assertThat(shardsHeader.get("skipped"), equalTo(0));
logger.info("searchResponse: {}", searchResponseBody);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also assert total hits here?

@@ -44,6 +46,11 @@ public Source getSource(LeafReaderContext ctx, int doc) throws IOException {
leaf = new Leaf(sourceLoader.leaf(ctx.reader(), null), storedFieldLoader.getLoader(ctx, null));
var existing = leaves.put(id, leaf);
assert existing == null : "unexpected source provider [" + existing + "]";
} else if (isStoredSource == false && doc < leaf.doc) {
// For synthetic source, if a runtime field is used more than once, a new source loader must be used
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also incorporate the following into this comment: with both synthetic and stored source the docid can go backwards. This isn't a problem for stored fields (somehow it validate when docid goes backwards and doesn't run into EOF like errors), but it is for doc values (which gets used with synthetic source). ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here seems to be when multiple clauses in a query reference runtime fields on the same request, no? So it's not only about using the same runtime field twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here seems to be when multiple clauses in a query reference runtime fields on the same request

Yes I think that is a more accurate way to describe the issue. I confirmed that, unsurprisingly, aggregating twice on the same runtime field does indeed work fine as is. So this is specific to there being 2+ query clauses referencing the same runtime field.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@@ -44,6 +46,11 @@ public Source getSource(LeafReaderContext ctx, int doc) throws IOException {
leaf = new Leaf(sourceLoader.leaf(ctx.reader(), null), storedFieldLoader.getLoader(ctx, null));
var existing = leaves.put(id, leaf);
assert existing == null : "unexpected source provider [" + existing + "]";
} else if (isStoredSource == false && doc < leaf.doc) {
// For synthetic source, if a runtime field is used more than once, a new source loader must be used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here seems to be when multiple clauses in a query reference runtime fields on the same request, no? So it's not only about using the same runtime field twice.

@parkertimmins parkertimmins merged commit 51e87cb into elastic:main May 23, 2025
18 checks passed
@parkertimmins parkertimmins deleted the parker/synthetic-source-runtime-doc-value-reset branch May 23, 2025 20:57
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19
9.0 Commit could not be cherrypicked due to conflicts
8.17 Commit could not be cherrypicked due to conflicts
8.18 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 128320

parkertimmins added a commit to parkertimmins/elasticsearch that referenced this pull request May 23, 2025
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.
parkertimmins added a commit to parkertimmins/elasticsearch that referenced this pull request May 23, 2025
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.

(cherry picked from commit 51e87cb)

# Conflicts:
#	server/src/main/java/org/elasticsearch/search/lookup/ConcurrentSegmentSourceProvider.java
parkertimmins added a commit to parkertimmins/elasticsearch that referenced this pull request May 23, 2025
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.

(cherry picked from commit 51e87cb)

# Conflicts:
#	server/src/main/java/org/elasticsearch/search/lookup/ConcurrentSegmentSourceProvider.java
@parkertimmins
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
9.0
8.18
8.17

Questions ?

Please refer to the Backport tool documentation

parkertimmins added a commit to parkertimmins/elasticsearch that referenced this pull request May 23, 2025
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.
elasticsearchmachine pushed a commit that referenced this pull request May 23, 2025
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.
parkertimmins added a commit that referenced this pull request May 27, 2025
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.

(cherry picked from commit 51e87cb)

# Conflicts:
#	server/src/main/java/org/elasticsearch/search/lookup/ConcurrentSegmentSourceProvider.java
parkertimmins added a commit that referenced this pull request May 27, 2025
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.

(cherry picked from commit 51e87cb)

# Conflicts:
#	server/src/main/java/org/elasticsearch/search/lookup/ConcurrentSegmentSourceProvider.java
parkertimmins added a commit that referenced this pull request May 27, 2025
When using synthetic source, runtime fields data may come from doc values. Doc values iterators can only be read once, and in increasing docId order. But if a runtime field is referenced multiple times in a query, currently the same doc value iterator will be used. This causes an error, as the second field reference will attempt to read the same iterator from a lower docId than was previously used. The fix is to create a new source loader, and thus a new doc value iterator, if the requested docId is lower than the last seen docId.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants