Describe the bug
When GROUP BY ROLLUP is used with a query that produces 0 input rows, DataFusion returns 0 output rows. The SQL standard requires ROLLUP to always produce the grand total row (with NULL grouping keys and aggregate results computed over the empty set).
To Reproduce
CREATE TABLE test (category VARCHAR, value INT) AS VALUES ('a', 1);
-- With data: works correctly (returns 2 rows: group + grand total)
SELECT category, count(*) FROM test GROUP BY ROLLUP(category);
-- Returns: [('a', 1), (NULL, 1)] ✅ correct
-- With empty input: returns 0 rows instead of 1
SELECT category, count(*) FROM test WHERE value < 0 GROUP BY ROLLUP(category);
-- Returns: [] ❌ should return [(NULL, 0)]
-- Without GROUP BY, empty input correctly returns 1 row:
SELECT count(*) FROM test WHERE value < 0;
-- Returns: [(0)] ✅ correct
Expected behavior
Per SQL standard (and PostgreSQL, Trino, Oracle behavior):
SELECT category, count(*) FROM test WHERE value < 0 GROUP BY ROLLUP(category);
-- Should return: [(NULL, 0)]
The grand total grouping set (the empty grouping set that ROLLUP always includes) should produce a row even when there are zero input rows, because aggregate functions like count(*) have well-defined behavior on empty sets (returns 0).
Additional context
- Tested on DataFusion 52.3.0
- Same behavior with
CUBE and explicit GROUPING SETS((), (category))
- Regular
GROUP BY category correctly returns 0 rows with empty input (no grand total expected)
- This affects 6 TPC-DS queries (q18, q27, q36, q67, q70, q86) at small scale factors where certain joins produce 0 matching rows
Root cause (likely)
DataFusion's grouped aggregation implementation skips producing output batches when the hash table is empty (no input rows). For regular GROUP BY, this is correct. For ROLLUP/CUBE/GROUPING SETS that include the empty grouping set (), a row should always be produced for that set.
The fix would need to check if any grouping set is the empty set, and if so, produce a row with NULL grouping keys and empty-set aggregate values (count=0, sum=NULL, etc.) even when the hash table has no entries.
Describe the bug
When
GROUP BY ROLLUPis used with a query that produces 0 input rows, DataFusion returns 0 output rows. The SQL standard requires ROLLUP to always produce the grand total row (with NULL grouping keys and aggregate results computed over the empty set).To Reproduce
Expected behavior
Per SQL standard (and PostgreSQL, Trino, Oracle behavior):
The grand total grouping set (the empty grouping set that ROLLUP always includes) should produce a row even when there are zero input rows, because aggregate functions like
count(*)have well-defined behavior on empty sets (returns 0).Additional context
CUBEand explicitGROUPING SETS((), (category))GROUP BY categorycorrectly returns 0 rows with empty input (no grand total expected)Root cause (likely)
DataFusion's grouped aggregation implementation skips producing output batches when the hash table is empty (no input rows). For regular
GROUP BY, this is correct. ForROLLUP/CUBE/GROUPING SETSthat include the empty grouping set(), a row should always be produced for that set.The fix would need to check if any grouping set is the empty set, and if so, produce a row with NULL grouping keys and empty-set aggregate values (count=0, sum=NULL, etc.) even when the hash table has no entries.