-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Coalesce batches inside FilterExec #18604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@alamb could you run some benchmarks? |
|
🤖 |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very nice 👌 I kicked off some benchmarks so let's see how that goes
| continue; | ||
| } | ||
| None => { | ||
| self.batch_coalescer.finish_buffered_batch().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably check for the error here rather than unwrapping
|
🤖: Benchmark completed Details
|
|
🤖 |
|
The lack of consistency is really bugging me -- I am going to see if I can find some way to have more consistent benchmarks |
|
🤖: Benchmark completed Details
|
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very nice 👌 I kicked off some benchmarks
I don't think they show super meaningful improvements, but it doesn't hurt, so 👍 from me
Thanks @Dandandan
|
Hm interesting. I realize now But there is some more potential, and we can remove the |
Which issue does this PR close?
Rationale for this change
Moe coalesce batches inside filter exec. We can use
BatchCoalescer ::push_batch_with_filterwhich should give a speed up compared to filtering individual batches + concatenating afterwards.What changes are included in this PR?
Changes the FilterExec to coalesce batches inside.
I did not make a change to remove CoalesceBatchesExec from all the plans, I plan to create an issue and a PR after this is merged to do so.
Now it should be mostly a no-op with limited overhead as input batches are already well-sized.
Are these changes tested?
Are there any user-facing changes?