@kosiew (Contributor) commented Nov 11, 2025

Which issue does this PR close?

Closes #18109.

Rationale for this change

Previously, the SQL planner accepted WITHIN GROUP clauses for all UDAFs (user-defined aggregate functions), even those that did not explicitly support ordered-set semantics. This behavior was too permissive and inconsistent with PostgreSQL. For example, queries such as SUM(x) WITHIN GROUP (ORDER BY x) were allowed, even though SUM is not an ordered-set aggregate.

This PR enforces stricter validation so that only UDAFs that explicitly return true from supports_within_group_clause() may use WITHIN GROUP. All other aggregates now produce a clear planner error when this syntax is used.
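A minimal sketch of the opt-in check described above follows. It uses a simplified, hypothetical stand-in trait rather than DataFusion's actual UDAF trait: AggregateUdf, Sum, PercentileCont, and check_within_group are illustrative names only. The method name supports_within_group_clause() and the error text come from this PR; the false default is an assumption that models the explicit opt-in. The support flag is read once into a local, mirroring the "cached the support check locally" note in the commit history.

```rust
/// Hypothetical stand-in for a UDAF trait. Only the opt-in method is
/// modeled; the `false` default is an assumption for this sketch.
trait AggregateUdf {
    fn name(&self) -> &str;

    /// Ordered-set aggregates override this to return true.
    fn supports_within_group_clause(&self) -> bool {
        false
    }
}

/// A regular aggregate: keeps the default and never opts in.
struct Sum;

impl AggregateUdf for Sum {
    fn name(&self) -> &str {
        "sum"
    }
}

/// An ordered-set aggregate: opts in explicitly.
struct PercentileCont;

impl AggregateUdf for PercentileCont {
    fn name(&self) -> &str {
        "percentile_cont"
    }

    fn supports_within_group_clause(&self) -> bool {
        true
    }
}

/// Hypothetical planner check. `has_within_group` is true when the SQL
/// call carried a WITHIN GROUP (...) clause.
fn check_within_group(udaf: &dyn AggregateUdf, has_within_group: bool) -> Result<(), String> {
    // Query the UDAF once and reuse the answer.
    let supports_within_group = udaf.supports_within_group_clause();
    if has_within_group && !supports_within_group {
        return Err("WITHIN GROUP is only supported for ordered-set aggregate functions".to_string());
    }
    Ok(())
}

fn main() {
    // percentile_cont opted in, so WITHIN GROUP is accepted.
    assert!(check_within_group(&PercentileCont, true).is_ok());
    // sum without WITHIN GROUP is unaffected.
    assert!(check_within_group(&Sum, false).is_ok());
    // sum with WITHIN GROUP is rejected.
    match check_within_group(&Sum, true) {
        Ok(()) => println!("{} accepted WITHIN GROUP", Sum.name()),
        Err(e) => println!("{} rejected: {e}", Sum.name()),
    }
}
```

Making the default false means a new aggregate is rejected until its author deliberately opts in, which is the stricter behavior this PR aims for.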

What changes are included in this PR?

  • Added a type alias, WithinGroupExtraction, to simplify the complex tuple return types used by helper functions.

  • Introduced a new helper method extract_and_prepend_within_group_args to centralize logic for handling WITHIN GROUP argument rewriting.

  • Updated the planner to:

    • Validate that only UDAFs whose supports_within_group_clause() returns true can accept WITHIN GROUP.
    • Prepend WITHIN GROUP ordering expressions to function arguments only for supported ordered-set aggregates.
    • Produce clear error messages when WITHIN GROUP is used incorrectly.
  • Added unit tests covering both accepted and rejected cases:

    • WITHIN GROUP rejected for non-ordered-set aggregates (MIN, SUM, etc.).
    • WITHIN GROUP accepted for ordered-set aggregates such as percentile_cont.
    • Validation for named arguments, multiple ordering expressions, and semantic conflicts with OVER clauses.
  • Updated SQL logic tests (aggregate.slt) to reflect new rejection behavior.

  • Updated documentation:

    • aggregate_functions.md and developer docs to clarify when and how WITHIN GROUP can be used.
    • upgrading.md to inform users of the stricter enforcement and provide migration guidance.
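The argument rewriting can be sketched as follows. Only the names WithinGroupExtraction and extract_and_prepend_within_group_args come from this PR; the Expr and SortKey types, the contents of the tuple, and the helper's signature are assumptions for illustration, not the planner's real definitions. The sketch assumes the caller has already passed the supports_within_group_clause() check, so it never runs for regular aggregates.

```rust
/// Hypothetical expression type; the real planner uses DataFusion's Expr.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(f64),
}

/// Hypothetical sort key taken from WITHIN GROUP (ORDER BY ...).
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    expr: Expr,
    ascending: bool,
}

/// Hypothetical tuple shape: (rewritten arguments, ordering to keep).
type WithinGroupExtraction = (Vec<Expr>, Vec<SortKey>);

/// Put the WITHIN GROUP ordering expressions in front of the call's own
/// arguments. Assumes the caller already passed the opt-in check.
fn extract_and_prepend_within_group_args(
    args: Vec<Expr>,
    within_group: Vec<SortKey>,
) -> WithinGroupExtraction {
    // Ordering expressions first, in the order they were written...
    let mut new_args: Vec<Expr> = within_group.iter().map(|key| key.expr.clone()).collect();
    // ...followed by the original arguments (e.g. the percentile).
    new_args.extend(args);
    (new_args, within_group)
}

fn main() {
    // Models: percentile_cont(0.5) WITHIN GROUP (ORDER BY x)
    let (args, ordering) = extract_and_prepend_within_group_args(
        vec![Expr::Literal(0.5)],
        vec![SortKey { expr: Expr::Column("x".to_string()), ascending: true }],
    );
    println!("args = {args:?}, ascending = {}", ordering[0].ascending);
}
```

In this sketch, percentile_cont(0.5) WITHIN GROUP (ORDER BY x) ends up with the argument list [x, 0.5], while the sort keys are returned alongside so the aggregate can still see the requested direction. Keeping the rewrite in one helper means there is a single place that defines the argument order.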

Are these changes tested?

✅ Yes.

  • New tests in sql_integration.rs validate acceptance, rejection, and argument handling of WITHIN GROUP for both valid and invalid cases.
  • SQL logic tests (aggregate.slt) include negative test cases confirming planner rejections.

Are there any user-facing changes?

✅ Yes.

  • Users attempting to use WITHIN GROUP with regular aggregates (e.g. SUM, AVG, MIN, MAX) will now see a planner error:

    WITHIN GROUP is only supported for ordered-set aggregate functions

  • Documentation has been updated to clearly describe WITHIN GROUP semantics and provide examples of valid and invalid usage.

No API-breaking changes were introduced; the changes are limited to stricter planner validation and improved error messages.

Ensure SQL WITHIN GROUP(...) is only allowed for UDAFs
that opt into ordered-set semantics. Cached the support
check locally and updated the condition to return an error
when the clause is incorrectly used. Added a unit test to
verify this rejection for non-ordered-set aggregate functions.
…UDAFs and enhance upgrade documentation for stricter planner enforcement
Refactor within_group_rejected_for_non_ordered_set_udaf to
extract the planner error using .expect_err(...).to_string()
to avoid brittle Debug formatting issues.
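The test refactor above relies on a common Rust pattern: take the error out of the Result with expect_err and assert on its Display text rather than its Debug output. Below is a self-contained sketch. PlanError and plan_sql are hypothetical stand-ins for the real planner and its error type, and the string matching inside plan_sql is a toy, not how the planner actually decides.

```rust
use std::fmt;

/// Hypothetical planner error. Debug prints the variant name and a
/// quoted string; Display prints the user-facing message.
#[derive(Debug)]
enum PlanError {
    Plan(String),
}

impl fmt::Display for PlanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PlanError::Plan(msg) => write!(f, "planning error: {msg}"),
        }
    }
}

/// Hypothetical stand-in for planning a SQL string; it only recognizes
/// the single pattern this test needs.
fn plan_sql(sql: &str) -> Result<String, PlanError> {
    let upper = sql.to_uppercase();
    if upper.contains("WITHIN GROUP") && !upper.contains("PERCENTILE_CONT") {
        return Err(PlanError::Plan(
            "WITHIN GROUP is only supported for ordered-set aggregate functions".to_string(),
        ));
    }
    Ok(format!("Projection: {sql}"))
}

fn main() {
    // expect_err panics with this message if planning unexpectedly succeeds.
    let err = plan_sql("SELECT SUM(x) WITHIN GROUP (ORDER BY x) FROM t")
        .expect_err("SUM should reject WITHIN GROUP")
        .to_string();
    // Match on the Display text, not on the Debug representation.
    assert!(err.contains("WITHIN GROUP is only supported for ordered-set aggregate functions"));
    println!("{err}");
}
```

Asserting on the Display text ties the test to the message users actually see, so it keeps passing if the error enum's variant names or wrapping change.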

Add within_group_allowed_for_ordered_set_udaf, which registers
percentile_cont_udaf() with the planner, checks that planning a
WITHIN GROUP invocation succeeds, and verifies that the
resulting plan mentions percentile_cont.
Add a concise comment in the planner documenting the explicit opt-in.
Extract argument-prepending logic into a helper method
to centralize the protocol.

Add tests to validate rejection of multiple ORDER BY items
inside WITHIN GROUP and of combining WITHIN GROUP with
OVER.
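The two rejection cases covered by these tests can be sketched as a small validation step. This assumes the planner sees the WITHIN GROUP ordering as a list of expressions plus a flag for an OVER clause. AggregateCall, validate_within_group, and both error strings are hypothetical; the PR does not state the exact messages or the order in which these checks run.

```rust
/// Hypothetical summary of one aggregate call as the planner sees it.
struct AggregateCall {
    /// Expressions from WITHIN GROUP (ORDER BY ...); empty if there is no clause.
    within_group_order_by: Vec<String>,
    /// Whether the call also has an OVER (...) clause.
    has_over: bool,
}

/// Reject the two misuse patterns exercised by the tests.
/// Check order and error messages are assumptions.
fn validate_within_group(call: &AggregateCall) -> Result<(), String> {
    if call.within_group_order_by.is_empty() {
        // No WITHIN GROUP clause: nothing to validate here.
        return Ok(());
    }
    if call.has_over {
        return Err("WITHIN GROUP cannot be combined with OVER".to_string());
    }
    if call.within_group_order_by.len() > 1 {
        return Err("WITHIN GROUP accepts exactly one ORDER BY expression".to_string());
    }
    Ok(())
}

fn main() {
    let cases = [
        ("ORDER BY x", AggregateCall { within_group_order_by: vec!["x".to_string()], has_over: false }),
        ("ORDER BY x, y", AggregateCall { within_group_order_by: vec!["x".to_string(), "y".to_string()], has_over: false }),
        ("ORDER BY x plus OVER ()", AggregateCall { within_group_order_by: vec!["x".to_string()], has_over: true }),
    ];
    for (label, call) in &cases {
        println!("{label}: {:?}", validate_within_group(call));
    }
}
```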
github-actions bot added the documentation, sql, development-process, and sqllogictest labels on Nov 11, 2025.

Development

Successfully merging this pull request may close these issues.

WITHIN GROUP needs to be more strict
