About Velox's Spiller #5021
waitinfuture started this conversation in Ideas
Replies: 1 comment
-
@xiaoxmeng Meng, would you take a look?
-
Some behavior of Velox's Spiller seems inefficient:

When `Spiller::spill` is called, `fillSpillRuns` is called to calculate the hash value for every row in the RowContainer, but the spill runs that are not selected (along with the partitionId for each row) are cleared. This can cause the hash values to be recalculated in later calls to `Spiller::spill`.

I'm wondering whether it's possible to encode the hash value for each row inside the row layout when the type is `kAggregate` or `kHashJoinBuild`; with some bookkeeping we could avoid these problems. The encoded hash values can still be calculated in batch. And since the memory arbitrator plans to do compaction in the future, the hash value could also be used to compact rows of the same partition together.

cc @xiaoxmeng @mbasmanova
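To make the idea concrete, here is a minimal standalone sketch of caching the hash inside each row. It does not use Velox's real classes: `Row`, `ToySpiller`, `addRow`, `fillHashes`, `cachedHash`, `hashValid` and `hashComputations` are hypothetical names invented for illustration, and `std::hash` stands in for whatever hash function the real Spiller uses. Only the names `fillSpillRuns` and `spill` mirror the functions mentioned above, and their signatures here are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical row; a real RowContainer stores rows as raw byte buffers.
struct Row {
  std::string key;          // stands in for the key columns
  uint64_t cachedHash = 0;  // hash slot kept inside the row layout
  bool hashValid = false;   // bookkeeping: has the slot been filled yet?
};

// Toy model of the proposal, not Velox's Spiller.
struct ToySpiller {
  std::vector<Row> rows;
  size_t hashComputations = 0;  // how many times a key was actually hashed

  void addRow(const std::string& key) {
    Row row;
    row.key = key;
    rows.push_back(row);
  }

  // Fill the hash slot, in one batch, for rows that do not have one yet.
  void fillHashes() {
    for (auto& row : rows) {
      if (!row.hashValid) {
        row.cachedHash = std::hash<std::string>{}(row.key);
        row.hashValid = true;
        ++hashComputations;
      }
    }
  }

  // Build one run (a list of row indexes) per partition from cached hashes.
  std::vector<std::vector<size_t>> fillSpillRuns(size_t numPartitions) {
    assert(numPartitions > 0);
    fillHashes();
    std::vector<std::vector<size_t>> runs(numPartitions);
    for (size_t i = 0; i < rows.size(); ++i) {
      runs[rows[i].cachedHash % numPartitions].push_back(i);
    }
    return runs;
  }

  // Spill one partition: its rows are removed and every other run is
  // discarded, but the cached hashes of the remaining rows survive.
  size_t spill(size_t numPartitions, size_t partition) {
    assert(partition < numPartitions);
    const auto runs = fillSpillRuns(numPartitions);
    const auto& selected = runs[partition];
    // Indexes are ascending, so erase from the back to keep them valid.
    for (auto it = selected.rbegin(); it != selected.rend(); ++it) {
      rows.erase(rows.begin() + static_cast<std::ptrdiff_t>(*it));
    }
    return selected.size();
  }
};
```

In this toy model, spilling the partitions one at a time hashes each row exactly once, because later `spill` calls find the slot already filled. Without the cached slot, every call would hash all remaining rows again. The cost is the extra bytes per row plus the validity bookkeeping.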