Skip to content

GROUP BY ROLLUP returns 0 rows with empty input (should return grand total row) #21570

@jverhoeks

Description

@jverhoeks

Describe the bug

When GROUP BY ROLLUP is used with a query that produces 0 input rows, DataFusion returns 0 output rows. The SQL standard requires ROLLUP to always produce the grand total row (with NULL grouping keys and aggregate results computed over the empty set).

To Reproduce

CREATE TABLE test (category VARCHAR, value INT) AS VALUES ('a', 1);

-- With data: works correctly (returns 2 rows: group + grand total)
SELECT category, count(*) FROM test GROUP BY ROLLUP(category);
-- Returns: [('a', 1), (NULL, 1)]  ✅ correct

-- With empty input: returns 0 rows instead of 1
SELECT category, count(*) FROM test WHERE value < 0 GROUP BY ROLLUP(category);
-- Returns: []  ❌ should return [(NULL, 0)]

-- Without GROUP BY, empty input correctly returns 1 row:
SELECT count(*) FROM test WHERE value < 0;
-- Returns: [(0)]  ✅ correct

Expected behavior

Per SQL standard (and PostgreSQL, Trino, Oracle behavior):

SELECT category, count(*) FROM test WHERE value < 0 GROUP BY ROLLUP(category);
-- Should return: [(NULL, 0)]

The grand total grouping set (the empty grouping set that ROLLUP always includes) should produce a row even when there are zero input rows, because aggregate functions like count(*) have well-defined behavior on empty sets (returns 0).

Additional context

  • Tested on DataFusion 52.3.0
  • Same behavior with CUBE and explicit GROUPING SETS((), (category))
  • Regular GROUP BY category correctly returns 0 rows with empty input (no grand total expected)
  • This affects 6 TPC-DS queries (q18, q27, q36, q67, q70, q86) at small scale factors where certain joins produce 0 matching rows

Root cause (likely)

DataFusion's grouped aggregation implementation skips producing output batches when the hash table is empty (no input rows). For regular GROUP BY, this is correct. For ROLLUP/CUBE/GROUPING SETS that include the empty grouping set (), a row should always be produced for that set.

The fix would need to check if any grouping set is the empty set, and if so, produce a row with NULL grouping keys and empty-set aggregate values (count=0, sum=NULL, etc.) even when the hash table has no entries.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions