Some aggregate functions return 0.0 instead of NaN in some cases #1038

andygrove · 2024-10-25T14:45:10Z

Describe the bug

SQL

SELECT c79, c54, stddev_pop(c73) FROM test1 GROUP BY c79,c54 ORDER BY c79, c54;

c79 is Byte, c54 is either Float or Double

Spark Plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(3) Sort [c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST], true, 0
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 1
         +- Exchange rangepartitioning(c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=20613]
            +- *(2) HashAggregate(keys=[c79#279, c54#254], functions=[stddev_pop(c73#273)], output=[c79#279, c54#254, stddev_pop(c73)#28050])
               +- AQEShuffleRead coalesced
                  +- ShuffleQueryStage 0
                     +- Exchange hashpartitioning(c79#279, c54#254, 200), ENSURE_REQUIREMENTS, [plan_id=20585]
                        +- *(1) HashAggregate(keys=[c79#279, knownfloatingpointnormalized(normalizenanandzero(c54#254)) AS c54#254], functions=[partial_stddev_pop(c73#273)], output=[c79#279, c54#254, n#28038, avg#28039, m2#28040])
                           +- *(1) ColumnarToRow
                              +- FileScan parquet [c54#254,c73#273,c79#279] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c54:float,c73:double,c79:tinyint>
+- == Initial Plan ==
   Sort [c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=20567]
      +- HashAggregate(keys=[c79#279, c54#254], functions=[stddev_pop(c73#273)], output=[c79#279, c54#254, stddev_pop(c73)#28050])
         +- Exchange hashpartitioning(c79#279, c54#254, 200), ENSURE_REQUIREMENTS, [plan_id=20564]
            +- HashAggregate(keys=[c79#279, knownfloatingpointnormalized(normalizenanandzero(c54#254)) AS c54#254], functions=[partial_stddev_pop(c73#273)], output=[c79#279, c54#254, n#28038, avg#28039, m2#28040])
               +- FileScan parquet [c54#254,c73#273,c79#279] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c54:float,c73:double,c79:tinyint>

Comet Plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(1) ColumnarToRow
   +- CometSort [c79#279, c54#254, stddev_pop(c73)#28131], [c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST]
      +- AQEShuffleRead coalesced
         +- ShuffleQueryStage 1
            +- CometColumnarExchange rangepartitioning(c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=20746]
               +- !CometHashAggregate [c79#279, c54#254, n#28119, avg#28120, m2#28121], Final, [c79#279, c54#254], [stddev_pop(c73#273)]
                  +- AQEShuffleRead coalesced
                     +- ShuffleQueryStage 0
                        +- CometExchange hashpartitioning(c79#279, c54#254, 200), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=20701]
                           +- !CometHashAggregate [c54#254, c73#273, c79#279], Partial, [c79#279, knownfloatingpointnormalized(normalizenanandzero(c54#254)) AS c54#254], [partial_stddev_pop(c73#273)]
                              +- CometScan parquet [c54#254,c73#273,c79#279] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c54:float,c73:double,c79:tinyint>
+- == Initial Plan ==
   CometSort [c79#279, c54#254, stddev_pop(c73)#28131], [c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST]
   +- CometColumnarExchange rangepartitioning(c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=20682]
      +- !CometHashAggregate [c79#279, c54#254, n#28119, avg#28120, m2#28121], Final, [c79#279, c54#254], [stddev_pop(c73#273)]
         +- CometExchange hashpartitioning(c79#279, c54#254, 200), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=20680]
            +- !CometHashAggregate [c54#254, c73#273, c79#279], Partial, [c79#279, knownfloatingpointnormalized(normalizenanandzero(c54#254)) AS c54#254], [partial_stddev_pop(c73#273)]
               +- CometScan parquet [c54#254,c73#273,c79#279] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c54:float,c73:double,c79:tinyint>

First difference at row 4:
Spark: -127,0.31308997,NaN
Comet: -127,0.31308997,0.0

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

The text was updated successfully, but these errors were encountered:

andygrove added the bug Something isn't working label Oct 25, 2024

andygrove added this to the 0.4.0 milestone Oct 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some aggregate functions return 0.0 instead of NaN in some cases #1038

Some aggregate functions return 0.0 instead of NaN in some cases #1038

andygrove commented Oct 25, 2024

Some aggregate functions return 0.0 instead of NaN in some cases #1038

Some aggregate functions return 0.0 instead of NaN in some cases #1038

Comments

andygrove commented Oct 25, 2024

Describe the bug

SQL

Spark Plan

Comet Plan

Steps to reproduce

Expected behavior

Additional context