-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Enforce explicit opt-in for WITHIN GROUP syntax in aggregate UDAFs
#18607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
kosiew
wants to merge
15
commits into
apache:main
Choose a base branch
from
kosiew:within-group-18109
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+343
−37
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Ensure SQL WITHIN GROUP(...) is only allowed for UDAFs that opt into ordered-set semantics. Cached the support check locally and updated the condition to return an error when the clause is incorrectly used. Added a unit test to verify this rejection for non-ordered-set aggregate functions.
…UDAFs and enhance upgrade documentation for stricter planner enforcement
…DAFs and remove obsolete test file
…n for non-ordered-set aggregates
Refactor within_group_rejected_for_non_ordered_set_udaf to extract the planner error using .expect_err(...).to_string() to avoid brittle Debug formatting issues. Add within_group_allowed_for_ordered_set_udaf to test percentile_cont_udaf() registration with the planner, ensuring the planning of WITHIN GROUP invocation succeeds and verifying that the resulting plan mentions percentile_cont.
… WITHIN GROUP clause
…_integration tests
Add a concise comment for explicit opt-in in the planner. Extract argument-prepending logic into a helper method to centralize the protocol. Add tests to validate rejection of multiple ORDER BY items inside WITHIN GROUP and the combination of USING WITHIN GROUP with OVER.
…rn types and improve readability
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
development-process
Related to development process of DataFusion
documentation
Improvements or additions to documentation
sql
SQL Planner
sqllogictest
SQL Logic Tests (.slt)
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #18109.
Rationale for this change
Previously, the SQL planner accepted
WITHIN GROUPclauses for all aggregate UDAFs, even those that did not explicitly support ordered-set semantics. This behavior was too permissive and inconsistent with PostgreSQL. For example, queries such asSUM(x) WITHIN GROUP (ORDER BY x)were allowed, even thoughSUMis not an ordered-set aggregate.This PR enforces stricter validation so that only UDAFs that explicitly return
truefromsupports_within_group_clause()may useWITHIN GROUP. All other aggregates now produce a clear planner error when this syntax is used.What changes are included in this PR?
Added type alias
WithinGroupExtractionto simplify complex tuple return types used by helper functions.Introduced a new helper method
extract_and_prepend_within_group_argsto centralize logic for handlingWITHIN GROUPargument rewriting.Updated the planner to:
supports_within_group_clause()can acceptWITHIN GROUP.WITHIN GROUPordering expressions to function arguments only for supported ordered-set aggregates.WITHIN GROUPis used incorrectly.Added comprehensive unit tests verifying correct behavior and failure cases:
WITHIN GROUPrejected for non-ordered-set aggregates (MIN,SUM, etc.).WITHIN GROUPaccepted for ordered-set aggregates such aspercentile_cont.OVERclauses.Updated SQL logic tests (
aggregate.slt) to reflect new rejection behavior.Updated documentation:
aggregate_functions.mdand developer docs to clarify when and howWITHIN GROUPcan be used.upgrading.mdto inform users of this stricter enforcement and migration guidance.Are these changes tested?
✅ Yes.
sql_integration.rsvalidate acceptance, rejection, and argument behavior ofWITHIN GROUPfor both valid and invalid cases.aggregate.slt) include negative test cases confirming planner rejections.Are there any user-facing changes?
✅ Yes.
Users attempting to use
WITHIN GROUPwith regular aggregates (e.g.SUM,AVG,MIN,MAX) will now see a planner error:Documentation has been updated to clearly describe
WITHIN GROUPsemantics and provide examples of valid and invalid usage.No API-breaking changes were introduced; only stricter planner validation and improved error messaging.