Support string type key in prefix sort #11527
base: main
cc @jinchengchenghh, @skadilover, @xiaoxmeng, could you help review this PR? Thanks.
We can do further optimization after adding RowContainer stats, which may record the maxSize of each row: if encodeSize is greater than maxSize, the string key can be fully encoded.
If maxSize < `prefixsort_string_prefix_length`, the encoded size should be maxSize + 1, indicating that the column is fully encoded. This allows the following keys to be included in the prefix and reduces the memory allocated for the prefix.
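The sizing rule proposed in these two comments can be sketched as a small C++ function. This is a minimal illustration under assumptions, not Velox code: `chooseStringEncoding` and `StringKeyEncoding` are hypothetical names, `maxSize` is assumed to be the largest string length in bytes for the column (from the RowContainer stats that do not exist yet), and the extra byte is assumed to be a per-key null indicator.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical result type: how many prefix bytes a string key gets.
struct StringKeyEncoding {
  uint32_t encodedSize;  // bytes reserved for this key in the prefix
  bool fullyEncoded;     // true if no string is longer than the bytes reserved for it
};

// Hypothetical helper. prefixLength plays the role of the
// prefixsort_string_prefix_length config; maxSize is an assumed column
// statistic that may be unavailable (std::nullopt).
StringKeyEncoding chooseStringEncoding(
    uint32_t prefixLength, std::optional<uint32_t> maxSize) {
  assert(prefixLength > 0);
  // encodedSize (prefixLength + 1) > maxSize  <=>  maxSize <= prefixLength:
  // every string fits, so only maxSize bytes (+1 assumed null byte) are needed.
  if (maxSize.has_value() && *maxSize <= prefixLength) {
    return {*maxSize + 1, true};
  }
  // No stats, or strings may be longer: truncate to prefixLength bytes.
  return {prefixLength + 1, false};
}
```

For example, with a prefix length of 12 and a column whose longest string is 5 bytes, the key would take 6 bytes and be marked fully encoded, leaving room for the following sort keys; without stats it falls back to 13 bytes and a truncated prefix.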
Looks good, just some minor comments.
Much appreciated, thanks for your contribution.
Can you rerun the benchmark to measure the effect of avoiding the string copy?
Cases where strings are stored separately in different blocks should be rare, so the results won’t differ much.
Support string type keys in PrefixSort.
Use the value of config `prefixsort_string_prefix_length` + 1 as the encoding size for string keys.
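To illustrate what an encoding size of prefix length + 1 could look like, here is a minimal C++ sketch, not the PR's encoder. It assumes byte 0 is a null indicator (nulls first), ascending order, and that each string is truncated to the prefix length and zero-padded; `encodeStringPrefix` and `comparePrefixes` are hypothetical helpers, not Velox's `PrefixSort` API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical: encode a nullable string key into (prefixLength + 1) bytes
// that compare with memcmp in ascending order, nulls first.
std::vector<uint8_t> encodeStringPrefix(
    std::optional<std::string_view> key, size_t prefixLength) {
  std::vector<uint8_t> out(prefixLength + 1, 0);  // zero-filled = padding
  if (!key.has_value()) {
    return out;  // null: indicator byte 0 sorts before any non-null key
  }
  out[0] = 1;  // assumed non-null indicator
  const size_t n = std::min(key->size(), prefixLength);
  // Copy at most prefixLength bytes; longer strings are truncated.
  std::copy_n(key->begin(), n, out.begin() + 1);
  return out;
}

// Returns <0, 0, >0 like memcmp (bytes compared as unsigned char).
// 0 does not prove equality: truncated keys sharing the first prefixLength
// bytes tie and must be resolved by comparing the full strings.
int comparePrefixes(const std::vector<uint8_t>& a,
                    const std::vector<uint8_t>& b) {
  assert(a.size() == b.size());
  return std::memcmp(a.data(), b.data(), a.size());
}
```

The design trade-off is the usual one for prefix sort: a fixed-width prefix keeps comparisons to a single `memcmp`, while a longer prefix length costs more memory per row but produces fewer ties that need the slower full-key comparison.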
The default value of `prefixsort_string_prefix_length` is 12.

Perf result: