Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some aggregate functions return 0.0 instead of NaN in some cases #1038

Open
andygrove opened this issue Oct 25, 2024 · 0 comments
Open

Some aggregate functions return 0.0 instead of NaN in some cases #1038

andygrove opened this issue Oct 25, 2024 · 0 comments
Labels
bug Something isn't working
Milestone

Comments

@andygrove
Copy link
Member

Describe the bug

SQL

SELECT c79, c54, stddev_pop(c73) FROM test1 GROUP BY c79,c54 ORDER BY c79, c54;

c79 is Byte, c54 is either Float or Double

Spark Plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(3) Sort [c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST], true, 0
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 1
         +- Exchange rangepartitioning(c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=20613]
            +- *(2) HashAggregate(keys=[c79#279, c54#254], functions=[stddev_pop(c73#273)], output=[c79#279, c54#254, stddev_pop(c73)#28050])
               +- AQEShuffleRead coalesced
                  +- ShuffleQueryStage 0
                     +- Exchange hashpartitioning(c79#279, c54#254, 200), ENSURE_REQUIREMENTS, [plan_id=20585]
                        +- *(1) HashAggregate(keys=[c79#279, knownfloatingpointnormalized(normalizenanandzero(c54#254)) AS c54#254], functions=[partial_stddev_pop(c73#273)], output=[c79#279, c54#254, n#28038, avg#28039, m2#28040])
                           +- *(1) ColumnarToRow
                              +- FileScan parquet [c54#254,c73#273,c79#279] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c54:float,c73:double,c79:tinyint>
+- == Initial Plan ==
   Sort [c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=20567]
      +- HashAggregate(keys=[c79#279, c54#254], functions=[stddev_pop(c73#273)], output=[c79#279, c54#254, stddev_pop(c73)#28050])
         +- Exchange hashpartitioning(c79#279, c54#254, 200), ENSURE_REQUIREMENTS, [plan_id=20564]
            +- HashAggregate(keys=[c79#279, knownfloatingpointnormalized(normalizenanandzero(c54#254)) AS c54#254], functions=[partial_stddev_pop(c73#273)], output=[c79#279, c54#254, n#28038, avg#28039, m2#28040])
               +- FileScan parquet [c54#254,c73#273,c79#279] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c54:float,c73:double,c79:tinyint>

Comet Plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(1) ColumnarToRow
   +- CometSort [c79#279, c54#254, stddev_pop(c73)#28131], [c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST]
      +- AQEShuffleRead coalesced
         +- ShuffleQueryStage 1
            +- CometColumnarExchange rangepartitioning(c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=20746]
               +- !CometHashAggregate [c79#279, c54#254, n#28119, avg#28120, m2#28121], Final, [c79#279, c54#254], [stddev_pop(c73#273)]
                  +- AQEShuffleRead coalesced
                     +- ShuffleQueryStage 0
                        +- CometExchange hashpartitioning(c79#279, c54#254, 200), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=20701]
                           +- !CometHashAggregate [c54#254, c73#273, c79#279], Partial, [c79#279, knownfloatingpointnormalized(normalizenanandzero(c54#254)) AS c54#254], [partial_stddev_pop(c73#273)]
                              +- CometScan parquet [c54#254,c73#273,c79#279] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c54:float,c73:double,c79:tinyint>
+- == Initial Plan ==
   CometSort [c79#279, c54#254, stddev_pop(c73)#28131], [c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST]
   +- CometColumnarExchange rangepartitioning(c79#279 ASC NULLS FIRST, c54#254 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=20682]
      +- !CometHashAggregate [c79#279, c54#254, n#28119, avg#28120, m2#28121], Final, [c79#279, c54#254], [stddev_pop(c73#273)]
         +- CometExchange hashpartitioning(c79#279, c54#254, 200), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=20680]
            +- !CometHashAggregate [c54#254, c73#273, c79#279], Partial, [c79#279, knownfloatingpointnormalized(normalizenanandzero(c54#254)) AS c54#254], [partial_stddev_pop(c73#273)]
               +- CometScan parquet [c54#254,c73#273,c79#279] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c54:float,c73:double,c79:tinyint>

First difference at row 4:
Spark: -127,0.31308997,NaN
Comet: -127,0.31308997,0.0

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

@andygrove andygrove added the bug Something isn't working label Oct 25, 2024
@andygrove andygrove added this to the 0.4.0 milestone Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant