
fix: AQE creating a non-supported Final HashAggregate post-shuffle #1390

Open
wants to merge 6 commits into base: main

Conversation

EmilyMatt
Contributor

What issue does this close?

Closes #1389.

Rationale for this change

As described in the issue, we want to prevent situations where the Partial aggregate is supported and converted, and the shuffle is supported and converted, but the Final aggregate is not converted because its result expressions are unsupported.
This leads to an unrecoverable state: Spark expects an aggregate buffer to have been created by the Partial HashAggregate, and it doesn't exist.

What changes are included in this PR?

I've moved the conversion of the hash aggregate into a separate function (I believe everything should be separated, honestly; it's very hard to manage right now), which also reports whether the result expressions were converted. When they are not, we create a new ProjectExec with those result expressions, convert the HashAggregate without them, and place a conversion between the two. That way we can ensure a valid state at all times.
This behavior can be bypassed by enforcing result conversion, using "spark.comet.exec.aggregate.enforceResults=true";
result enforcement is disabled by default.
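A rough sketch of the idea, for illustration only (the helpers isExprSupported, toCometHashAggregate, and columnarToRow below are hypothetical stand-ins for Comet's real serde and transition logic, not the actual functions in this PR):

import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.execution.{ProjectExec, SparkPlan}
import org.apache.spark.sql.execution.aggregate.HashAggregateExec

// Hypothetical placeholders for the real conversion/transition logic.
def isExprSupported(e: Expression): Boolean = ???
def toCometHashAggregate(agg: HashAggregateExec): SparkPlan = ???
def columnarToRow(plan: SparkPlan): SparkPlan = ???

def convertFinalHashAggregate(agg: HashAggregateExec): SparkPlan = {
  if (agg.resultExpressions.forall(isExprSupported)) {
    // Everything is supported: convert the whole aggregate, result expressions included.
    toCometHashAggregate(agg)
  } else {
    // Convert the aggregate without its result expressions, emitting the grouping keys
    // and aggregate attributes directly...
    val passThrough = agg.groupingExpressions.map(_.toAttribute) ++ agg.aggregateAttributes
    val nativeAgg = toCometHashAggregate(agg.copy(resultExpressions = passThrough))
    // ...then evaluate the unsupported result expressions in a Spark ProjectExec,
    // with a columnar-to-row transition between the native aggregate and the project.
    ProjectExec(agg.resultExpressions, columnarToRow(nativeAgg))
  }
}

And the escape hatch, if the previous behavior is preferred, is the config named above, set like any other Spark conf:

import org.apache.spark.sql.SparkSession

// Enforce result-expression conversion again (disabled by default in this PR).
val spark = SparkSession.builder()
  .config("spark.comet.exec.aggregate.enforceResults", "true")
  .getOrCreate()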

How are these changes tested?

Essentially, a lot of the stability tests will have a new plan where the aggregate is completed natively and the ProjectExec runs in Spark, instead of the current situation where the final stage of the HashAggregate is done entirely in Spark.
Those tests currently fail because I am unable to run them with SPARK_GENERATE_GOLDEN_FILES; might be a skill issue.

@EmilyMatt changed the title from "Ok push for now until I understand how to regenerate golden files" to "fix: AQE creating a non-supported Final HashAggregate post-shuffle" on Feb 12, 2025
@andygrove
Member

Thanks @EmilyMatt. Would it be possible to add a test to reproduce the issue?

@codecov-commenter

codecov-commenter commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 70.29703% with 30 lines in your changes missing coverage. Please review.

Project coverage is 38.82%. Comparing base (f09f8af) to head (f9c133c).
Report is 25 commits behind head on main.

Files with missing lines Patch % Lines
.../scala/org/apache/comet/serde/QueryPlanSerde.scala 61.19% 18 Missing and 8 partials ⚠️
...org/apache/comet/CometSparkSessionExtensions.scala 86.66% 0 Missing and 4 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##               main    #1390       +/-   ##
=============================================
- Coverage     56.12%   38.82%   -17.31%     
- Complexity      976     2005     +1029     
=============================================
  Files           119      262      +143     
  Lines         11743    60643    +48900     
  Branches       2251    12897    +10646     
=============================================
+ Hits           6591    23544    +16953     
- Misses         4012    32598    +28586     
- Partials       1140     4501     +3361     

☔ View full report in Codecov by Sentry.

@EmilyMatt
Contributor Author

@andygrove I've updated "test final min/max/count with result expressions" in the Aggregate suite to verify this more deeply, and I think many of the queries in the stability tests also have this issue.
The current implementation does not crash; it just converts the intermediate results back to Spark so they can be used in the Spark HashAggregate.
With this PR the aggregate will be fully native, so hopefully this will be a performance boost.
I think a crash will be encountered in any aggregate that uses an aggregate buffer, as the Partial CometHashAggregate will not generate the IntermediateAggBuffer representation Spark expects, and possibly not even the same data type.
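For context, a minimal query of the shape that test exercises might look like the following (illustrative only; the exact expressions that trigger the fallback depend on what Comet supports, and spark is assumed to be an active SparkSession):

import org.apache.spark.sql.functions._

val df = spark.range(0, 1000)
  .selectExpr("id % 10 AS k", "id AS v")
  .groupBy("k")
  // min/max/count populate the aggregate buffer in the Partial stage; the arithmetic
  // combining them becomes a result expression of the Final-mode HashAggregate.
  .agg((min(col("v")) + max(col("v"))).as("span"), count(col("v")).as("cnt"))

df.collect()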

Contributor

@kazuyukitanimura left a comment


Still reviewing

Comment on lines +221 to +226
.doc("Whether to enforce converting results in the Final stage of a HashAggregate, " +
"When enabled, Final-mode hashAggregates will not be converted to Comet, this can cause " +
"issues when native shuffle is enabled. " +
"If this is disabled, unsupported result expressions will be " +
"separated into a ProjectExec to allow HashAggregate to complete natively. " +
"This is disabled by default.")
Contributor


I would like to understand this more.
"Final-mode hashAggregates will not be converted to Comet": so final aggregation falls back to Spark?
"this can cause issues when native shuffle is enabled": why is that the case?
And when should we enable this option?

Contributor


I guess when the result expression of a Final-mode HashAggregate is unsupported, the entire HashAggregate will fall back to Spark with this option enabled?

Contributor Author


@kazuyukitanimura
Apologies, I believe I intended to say when native shuffle is disabled (or more correctly, when columnar shuffle is used).
Essentially, whenever the schema would come from Spark and not from the data/batch itself, we will start having problems.

Looking at the tpcds tests, we can see an example:
HashAggregate [cp_catalog_page_id,sum,sum,isEmpty,sum,isEmpty] [sum(UnscaledValue(cs_ext_sales_price)),sum(coalesce(cast(cr_return_amount as decimal(12,2)), 0.00)),sum((cs_net_profit - coalesce(cast(cr_net_loss as decimal(12,2)), 0.00))),sales,returns,profit,channel,id,sum,sum,isEmpty,sum,isEmpty]
  CometColumnarToRow
    InputAdapter
      CometExchange [cp_catalog_page_id] #10
        CometHashAggregate

See how the Partial HA is a Comet one, while the Final one is a Spark HashAggregate with a conversion in between.
This is the current implementation, and it works for both shuffles, as Comet still produces the data Spark expects.
However, the moment we output something Spark does not expect (like the partial results of a CollectSet), the columnar shuffle will crash due to the mismatch in data types.

The issue with the shuffle can probably be circumvented by shuffling the aggregate buffer as a binary column regardless of its data type, then reforming it in the Final aggregate; that way both shuffles would function.
However, as discussed in the issue, this will only delay the inevitable in cases such as an unsupported result expression, as a CometColumnarToRow will not recreate the data type a regular HashAggregate expects, and there is no other path forward but to let the Comet aggregate expressions run their course.
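To make the CollectSet case above concrete, a query of roughly this shape is where the intermediate buffer differs from the final output type (illustrative only; spark is an assumed active SparkSession):

import org.apache.spark.sql.functions._

// The Partial stage produces collect_set's intermediate buffer, whose layout is not the
// final ArrayType column; a shuffle that takes its schema from the Spark plan rather than
// from the incoming batches can therefore see a type it did not expect.
val grouped = spark.range(0, 100)
  .selectExpr("id % 5 AS k", "id % 7 AS v")
  .groupBy("k")
  .agg(collect_set(col("v")).as("vs"))

grouped.collect()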

Contributor Author


Will update the documentation for the configuration option; hopefully I'll be able to regenerate the tpcds plans.

@kazuyukitanimura
Contributor

#1389 mentioned

// When Comet shuffle is disabled, we don't want to transform the HashAggregate
// to CometHashAggregate. Otherwise, we probably get partial Comet aggregation
// and final Spark aggregation.
isCometShuffleEnabled(conf) =>

I believe I've seen a few tests that are ignored because of this.
I don't think this is a valid situation; we should not crash based on previously Comet-run operators if they were successful.

Are we planning to remove this comet shuffle requirement with this PR?

@EmilyMatt
Contributor Author

#1389 mentioned

// When Comet shuffle is disabled, we don't want to transform the HashAggregate
// to CometHashAggregate. Otherwise, we probably get partial Comet aggregation
// and final Spark aggregation.
isCometShuffleEnabled(conf) =>

I believe I've seen a few tests that are ignored because of this.
I don't think this is a valid situation; we should not crash based on previously Comet-run operators if they were successful.

Are we planning to remove this comet shuffle requirement with this PR?

I don't really know why this restriction was placed here, so I can't really offer an opinion on a direction; I only saw the comment and figured it is relevant, as it speaks to this specific issue.
Despite the comment and restriction, this still occurs many times in our tests, so I am unsure of its importance.

Successfully merging this pull request may close these issues.

AQE may materialize a non-supported Final-mode HashAggregate