-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Updated the Composite and Cardinality aggregators #20173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feature/datafusion
Are you sure you want to change the base?
Updated the Composite and Cardinality aggregators #20173
Conversation
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
|
❌ Gradle check result for e2e7b4e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
...r/src/main/java/org/opensearch/search/aggregations/bucket/composite/CompositeAggregator.java
Show resolved
Hide resolved
| docCount = ((InternalValueCount) subAgg).getValue(); | ||
| } | ||
| if (aggregator instanceof CardinalityAggregator) { | ||
| docCount = ((InternalCardinality) subAgg).getValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems incorrect. Cardinality is an approximation and can't replace doc count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I havent dived deep on this much, just did whats needed to fix the error for the query.
this was added for queries like this were failing ie where we group by some string
query="source=hits | where MobilePhoneModel != '' | stats dc(UserID) as u by MobilePhoneModel | sort - u | head 10"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it we should figure out why agg_for_doc_count column is not coming in the response from datafusion instead of replacing with distinct count of a different column.
This can also cause issues with merging results at Coordinator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is group by for distinct count, in this case we are not adding agg_for_doc_count column in the calcite plan
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
|
❌ Gradle check result for a51bd10: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
| @Override | ||
| public InternalAggregation convertRow(Map<String, Object[]> shardResult, int row, SearchContext searchContext) { | ||
| Object[] hlls = shardResult.get(name); | ||
| Object[] hlls = shardResult.get(name + "[hll_registers]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you try queries with Alias on distinct cardinality and filtering on the alias ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this also required without your SQL changes ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this was required after my SQL changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dint we already cover this type of queries in Integ test?
partial_agg_optimizer.rs, to take the function name instead of just name, bcz it can be alias depending on the Physical plan. Soe.fun().name()will get the correct nameCompositeAggregator.javato handle when we havesourceConfig.fieldTypeis script filedTypeCardinalityAggregator.javato recognize names likedc(UserID)[hll_registers]SearchEngineResultConversionUtils.javato handle aggregator of typeCardinalityAggregatorDescription
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.