Coalesce batches inside hash-repartition #18572
Conversation
Force-pushed from 872b975 to de2def6
Pull Request Overview
This PR adds batch coalescing to hash repartitioning to improve performance by reducing the number of small batches sent across partitions. The implementation uses BatchCoalescer from Arrow to buffer and combine small batches into larger ones (target size of 4096 rows) before sending them to output channels.
Key Changes
- Adds BatchCoalescer usage specifically for hash partitioning operations
- Implements buffering logic that accumulates batches until reaching the target size
- Includes flush logic at the end of stream processing to send remaining buffered data
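The buffer-and-flush flow summarized above can be illustrated with a small standalone sketch. This is not the PR's code: the column and batch sizes are invented, and the `push_batch` / `next_completed_batch` / `finish_buffered_batch` method names follow my understanding of arrow-rs's `BatchCoalescer` (re-exported from `arrow::compute` in recent versions), so treat them as assumptions.

```rust
use std::sync::Arc;

use arrow::array::Int32Array;
use arrow::compute::BatchCoalescer;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = Arc::new(Schema::new(vec![Field::new("v", DataType::Int32, false)]));
    // The PR keeps one coalescer per output partition; 4096 is its target row count.
    let mut coalescer = BatchCoalescer::new(Arc::clone(&schema), 4096);

    // Hash repartitioning tends to produce many small per-partition batches.
    for start in (0..20_000).step_by(500) {
        let col = Int32Array::from_iter_values(start..start + 500);
        let small = RecordBatch::try_new(Arc::clone(&schema), vec![Arc::new(col)])?;
        coalescer.push_batch(small)?;

        // Only forward data once a full target-sized batch has accumulated.
        while let Some(big) = coalescer.next_completed_batch() {
            println!("completed batch with {} rows", big.num_rows());
        }
    }

    // End of stream: flush whatever is still buffered so no rows are lost.
    coalescer.finish_buffered_batch()?;
    while let Some(rest) = coalescer.next_completed_batch() {
        println!("flushed remaining {} rows", rest.num_rows());
    }
    Ok(())
}
```

In the PR, the `println!` steps correspond to sending the completed batch to the partition's output channel.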
```rust
if is_hash_partitioning {
    for _ in 0..partitioner.num_partitions() {
        coalesce_batches.push(BatchCoalescer::new(stream.schema(), 4096));
```
Copilot AI (Nov 9, 2025)
The hardcoded batch size of 4096 should use the configured batch_size from the session config. Other uses of BatchCoalescer::new in the codebase use context.session_config().batch_size() or config.execution.batch_size. Consider passing the batch_size as a parameter to pull_from_input from the caller (consume_input_streams) which has access to the context: Arc<TaskContext>.
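A rough sketch of that suggestion follows. `make_coalescers` is a hypothetical helper (not code from this PR), and the import paths assume the top-level `datafusion` crate; inside the physical-plan crate the context type would come from `datafusion_execution` instead.

```rust
use std::sync::Arc;

use arrow::compute::BatchCoalescer;
use arrow::datatypes::SchemaRef;
use datafusion::execution::TaskContext;

/// Hypothetical helper: build the per-partition coalescers from the configured
/// batch size instead of hardcoding 4096. The caller (consume_input_streams)
/// holds the TaskContext and could pass the size down to pull_from_input.
fn make_coalescers(
    context: &Arc<TaskContext>,
    schema: SchemaRef,
    num_partitions: usize,
) -> Vec<BatchCoalescer> {
    let batch_size = context.session_config().batch_size();
    (0..num_partitions)
        .map(|_| BatchCoalescer::new(Arc::clone(&schema), batch_size))
        .collect()
}
```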
```rust
    }
};

if channel.sender.send(Some(Ok(batch_to_send))).await.is_err() {
```
Copilot AI (Nov 9, 2025)
The send timer metric (metrics.send_time[partition]) is not being tracked for these final flush batches, unlike the main sending logic at line 1245. This will result in inaccurate metrics for hash partitioning operations as the time spent sending flushed batches won't be recorded.
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
```rust
let (batch_to_send, is_memory_batch) =
    match channel.reservation.lock().try_grow(size) {
        Ok(_) => {
            // Memory available - send in-memory batch
            (RepartitionBatch::Memory(batch), true)
        }
        Err(_) => {
            // We're memory limited - spill to SpillPool
            // SpillPool handles file handle reuse and rotation
            channel.spill_writer.push_batch(&batch)?;
            // Send marker indicating batch was spilled
            (RepartitionBatch::Spilled, false)
        }
    };

if channel.sender.send(Some(Ok(batch_to_send))).await.is_err() {
    // If the other end has hung up, it was an early shutdown (e.g. LIMIT)
    // Only shrink memory if it was a memory batch
    if is_memory_batch {
        channel.reservation.lock().shrink(size);
    }
    output_channels.remove(&partition);
}
```
Copilot AI (Nov 9, 2025)
The batch sending logic in this flush section duplicates the logic from lines 1244-1272. This code duplication makes maintenance harder and increases the risk of inconsistencies. Consider extracting this logic into a helper function that both the main loop and flush section can use. The function could be named something like send_batch_to_channel and accept parameters for the batch, partition, channel, and metrics.
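A self-contained sketch of that extraction is below. The types here (`RepartitionBatch`, `Channel`, `FakeReservation`) are simplified stand-ins for the PR's internal types, the spill-writer and metrics handling are elided, and `send_batch_to_channel` is the hypothetical name from this comment, so this shows the shape of the refactor rather than the actual code.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use arrow::record_batch::RecordBatch;
use tokio::sync::mpsc::Sender;

// Simplified stand-ins for the PR's internal types.
enum RepartitionBatch {
    Memory(RecordBatch),
    Spilled,
}

struct FakeReservation {
    used: usize,
    limit: usize,
}

impl FakeReservation {
    fn try_grow(&mut self, size: usize) -> Result<(), ()> {
        if self.used + size <= self.limit {
            self.used += size;
            Ok(())
        } else {
            Err(())
        }
    }

    fn shrink(&mut self, size: usize) {
        self.used -= size;
    }
}

struct Channel {
    sender: Sender<Option<Result<RepartitionBatch, String>>>,
    reservation: Arc<Mutex<FakeReservation>>,
    // spill_writer elided: on try_grow failure the real code writes the batch
    // to the SpillPool before sending RepartitionBatch::Spilled.
}

/// Hypothetical helper shared by the main send loop and the end-of-stream flush:
/// account for memory, send the batch, and drop the channel if the receiver has
/// hung up (early shutdown, e.g. a LIMIT upstream).
async fn send_batch_to_channel(
    partition: usize,
    batch: RecordBatch,
    output_channels: &mut HashMap<usize, Channel>,
) {
    let Some(channel) = output_channels.get(&partition) else {
        return;
    };
    let size = batch.get_array_memory_size();
    let (batch_to_send, is_memory_batch) =
        match channel.reservation.lock().unwrap().try_grow(size) {
            Ok(_) => (RepartitionBatch::Memory(batch), true),
            Err(_) => (RepartitionBatch::Spilled, false), // real code spills the batch first
        };
    if channel.sender.send(Some(Ok(batch_to_send))).await.is_err() {
        // Receiver is gone: undo the reservation if it was grown, stop sending here.
        if is_memory_batch {
            channel.reservation.lock().unwrap().shrink(size);
        }
        output_channels.remove(&partition);
    }
}
```

Both the main loop and the flush section would then reduce to starting the send timer and calling this helper, which would also keep the metrics handling in one place.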
(Same flush-section code context as shown above.)
Copilot AI (Nov 9, 2025)
The flush section is missing the send_time timing metrics that are recorded in the main loop (line 1246). For consistency and proper performance monitoring, consider wrapping the sending logic with let timer = metrics.send_time[partition].timer(); ... timer.done(); similar to the main loop.
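For illustration, a minimal sketch of that wrapping (`Time::new()` is used here as a stand-in for the real `metrics.send_time[partition]` entry; the actual send logic is elided):

```rust
use datafusion::physical_plan::metrics::Time;

fn main() {
    // Stand-in for `metrics.send_time[partition]` from the PR.
    let send_time = Time::new();

    let timer = send_time.timer();
    // ... send the flushed batch to the partition's output channel here ...
    timer.done();
}
```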
Hmm, looks like this is not always faster. Perhaps for skewed data it can make up for it by pushing the coalesce into another thread 🤔
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?