Skip to content

Conversation

@luigidellaquila
Copy link
Contributor

Fixing queries where INLINE STATS count(*) is calculated on results with no columns, eg.

FROM employees
| KEEP emp_no
| DROP emp_no
| INLINE STATS count(*)

Fixes: #136851

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 23, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @luigidellaquila, I've created a changelog YAML for you.


private int blockIndex() {
return countAll ? 0 : channels.get(0);
return countAll ? -1 : channels.get(0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this was wrong: countAll does not rely on blocks, it only relies on page.positionCount

Copy link

@cimequinox cimequinox Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intent here to return an illegal value when countAll is true given that addRawInput does not call blockIndex in that case? E.g. in case a future modification leads to calling page.getBlock(-1) at line 100 ?

Also curious if the intent is to trigger the assert in addIntermediateInput in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the assert should better protect us now, in case of empty pages.

page.getBlock(-1) will never happen with current implementation, and in case of modifications it will throw a clear error (IMHO better than silently returning a wrong value, because we are calculating results based on the wrong block)

if (mask.getBoolean(0) == false) {
return;
}
count = countAll ? block.getPositionCount() : block.getTotalValueCount();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't countAll always be false in this path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, it can be simplified, thanks @cimequinox!

Copy link

@cimequinox cimequinox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change relies on properties of the mask when encountering blocks with no values. The calculations appear correct.

Two suggestions:

  • It would be good to add a comment describing the intent of the -1 constant returned by blockIndex.
  • The count calculation on line 107 could be simplified to count = block.getTotalValueCount();

@luigidellaquila
Copy link
Contributor Author

Thanks @cimequinox, I pushed a fix for the suggested changes

@luigidellaquila luigidellaquila enabled auto-merge (squash) October 23, 2025 13:51
from employees
| keep emp_no
| drop emp_no
| inline stats c = count(*)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix doesn't seem to be inline stats specific. If so (please, correct me if I'm wrong), there should be use cases where a regular stats should reveal this error, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original query had a STATS as last command.
Unfortunately it doesn't run anymore, because of the limitations introduced by #136981, and without the LIMIT the query works just fine.
(Disabling that limitation, and with this fix, the original query doesn't fail anymore btw)

I suspect that the problem could still happen with STATS, but I couldn't find a way to reproduce it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that the problem could still happen with STATS

I mean, it could have happened, but this fix should also cover normal STATS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to debug the two queries

ROW a = 12 | DROP a | STATS count(*)

ROW a = 12 | DROP a | INLINE STATS b = count(*)

The interesting bit is that with INLINE STATS the page in the aggregator has no blocks, while with STATS it still has the original block for a.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am looking at this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've looked at this and your point @luigidellaquila about aggregator having a different behavior for stats vs. inline stats showed that what we do with inline stats has a "manual" step that I did question in code several times so far: marking a plan as optimized without pushing it through another round of optimizations, only because the parts it's made of were already pushed through the logical optimizer.

What happens is that with inline stats we get to execute a plan like this one below where the EsqlProject with empty projection list as a ProjectOperator on the execution layer gets to create a Page with no blocks (because there is nothing to project)

Aggregate[[],[COUNT(*[KEYWORD],true[BOOLEAN]) AS count(*)#11]]
\_EsqlProject[[]]
  \_LocalRelation[[a{r}#9],org.elasticsearch.xpack.esql.plan.logical.local.CopyingLocalSupplier@7cc]

While with regular stats (the same aggregation as above), the optimizer "merges" the two plans:

Rule logical.CombineProjections applied with change
Limit[1000[INTEGER],false,false]                                  = Limit[1000[INTEGER],false,false]
\_Aggregate[[],[COUNT(*[KEYWORD],true[BOOLEAN]) AS count(*)#117]] = \_Aggregate[[],[COUNT(*[KEYWORD],true[BOOLEAN]) AS count(*)#117]]
  \_EsqlProject[[]]                                               !   \_Row[[12[INTEGER] AS a#115]]
    \_Row[[12[INTEGER] AS a#115]]                                 !

And there is no ProjectOperator creating an empty Page for which AggregateOperator not to know what to do with it.

I am not saying the fix in this PR is not necessary @luigidellaquila (you know better than me the Page with no rows story), but the fact that the code reached a point with inline stats that it wouldn't reach with regular stats made me wonder. I have opened a Draft PR to explore an idea that I didn't explore so far because there were no concrete reasons to do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the details and for the in-depth analysis @astefan!
As long as we have zero-blocks pages, I think it makes sense to have this PR merged, but I agree with you that it would be better to be consistent between STATS and INLINE STATS.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luigidellaquila luigidellaquila merged commit b05bf64 into elastic:main Oct 24, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ES|QL: ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0

4 participants