feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. #1021

Kontinuation · 2024-10-16T03:51:04Z

Which issue does this PR close?

This issue relates to #996 and #1004

Rationale for this change

This is for investigating various approaches to simplify memory-related configuration and reduce the memory required to run large queries. @andygrove

What changes are included in this PR?

This PR adds a spark.comet.exec.memoryPool configuration for easily running queries using various memory pool setups.

greedy: Each operator has its own GreedyMemoryPool, which is the same as the current situation.
fair_spill: Each operator has its own FairSpillPool
greedy_task_shared (default): All operators for the same task attempt share the same GreedyMemoryPool.
fair_spill_task_shared: All operators for the same task attempt share the same FairSpillPool
greedy_global: All operators in the same executor instance share the same GreedyMemoryPool
fair_spill_global: All operators in the same executor instance share the same FairSpillPool

How are these changes tested?

TODO: add tests running in native memory management mode.

…plemented skeleton in JVM to support task-level memory pool

…ult, add an environment variable for enabling offheap memory when running tests.

…ll though

…oes not matter (it is managed by Arc)

…memory pool, which does not work well.

… memory_limit applies to the entire instance.

codecov-commenter · 2024-10-16T05:35:10Z

Codecov Report

Attention: Patch coverage is 77.27273% with 5 lines in your changes missing coverage. Please review.

Project coverage is 34.27%. Comparing base (591f45a) to head (bd5a0c7).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
...ain/scala/org/apache/comet/CometExecIterator.scala	73.68%	2 Missing and 3 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1021      +/-   ##
============================================
- Coverage     34.30%   34.27%   -0.03%     
- Complexity      887      889       +2     
============================================
  Files           112      112              
  Lines         43429    43502      +73     
  Branches       9623     9615       -8     
============================================
+ Hits          14897    14912      +15     
- Misses        25473    25542      +69     
+ Partials       3059     3048      -11

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

… pool

andygrove · 2024-10-16T21:17:20Z

Thanks for filing this @Kontinuation. I will close my PRs.

Also, there is a suggestion in #1017 for always using unified memory.

Kontinuation · 2024-10-17T14:31:32Z

Thanks for filing this @Kontinuation. I will close my PRs.

Also, there is a suggestion in #1017 for always using unified memory.

I agree that using the unified memory manager is a better approach. Vanilla Spark operators and comet operators are governed by the same memory manager and they are all using offheap memory. Vanilla Spark operators can free some memory when comet operators are under pressure. I'll also put more work into improving the unified memory management.

I think the native memory management approach may still be relevant when users don't want vanilla Spark to use off-heap memory. We can set the default value of spark.comet.exec.memoryPool as greedy_task_shared to make comet memory overhead configuration more intuitive. If users want memory-intensive operators to spill properly, they can try out fair_spill_task_shared.

Kontinuation added 9 commits October 16, 2024 10:48

An experimental change to use FairSpillPool for each operator, and im…

903d2f3

…plemented skeleton in JVM to support task-level memory pool

Implemented task-level shared fair spill pool for comet operators

901baca

Remove println statements, do not use offheap memory in tests by defa…

dc2af74

…ult, add an environment variable for enabling offheap memory when running tests.

Implemented globally shared fair spill pool, it does not work very we…

8e34087

…ll though

Remove CometNativePlanWrapper since releasing the memory pool early d…

e3a77cf

…oes not matter (it is managed by Arc)

Add greedy_global mode for experimenting with globally shared greedy …

94e27fd

…memory pool, which does not work well.

Derive the per-task memory pool size from total memory_limit, so that…

97a0c48

… memory_limit applies to the entire instance.

Rename memory pool configuration and change default value to greedy

5dc8ce0

Revert changes for enabling native memory management

4607e8d

Remove the 2 JNI interfaces added for managing per-task shared memory…

33f0424

… pool

Kontinuation force-pushed the configurable-native-mempool branch from 2db1e2a to 33f0424 Compare October 16, 2024 12:58

This was referenced Oct 16, 2024

feat: Use fair-spill pool when spark.memory.offHeap.enabled=false #1004

Closed

feat: Implement shared memory pool for case where spark.memory.offHeap.enabled=false #1002

Closed

Support greedy_task_shared

7f68c4f

Kontinuation marked this pull request as ready for review October 18, 2024 04:30

Change default native memory management method to greedy_task_shared

bd5a0c7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. #1021

feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. #1021

Kontinuation commented Oct 16, 2024 •

edited

Loading

codecov-commenter commented Oct 16, 2024 •

edited

Loading

andygrove commented Oct 16, 2024

Kontinuation commented Oct 17, 2024

feat: Add a spark.comet.exec.memoryPool configuration for experimenting with various datafusion memory pool setups. #1021

Are you sure you want to change the base?

feat: Add a spark.comet.exec.memoryPool configuration for experimenting with various datafusion memory pool setups. #1021

Conversation

Kontinuation commented Oct 16, 2024 • edited Loading

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

How are these changes tested?

codecov-commenter commented Oct 16, 2024 • edited Loading

Codecov Report

andygrove commented Oct 16, 2024

Kontinuation commented Oct 17, 2024

feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. #1021

feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. #1021

Kontinuation commented Oct 16, 2024 •

edited

Loading

codecov-commenter commented Oct 16, 2024 •

edited

Loading