Support string type key in prefix sort #11527

Open: wants to merge 4 commits into base: main
@zhli1142015 (Contributor) commented Nov 13, 2024

Support string-type keys in PrefixSort.
The encoded size of a string key is the value of the config prefixsort_string_prefix_length plus 1.
The default value of prefixsort_string_prefix_length is 12.
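
To illustrate the "prefix length + 1" sizing, here is a minimal sketch of a fixed-width string prefix encoding. It is not the Velox implementation: the function names `encodeStringPrefix` and `comparePrefix` are hypothetical, and the sketch assumes the extra byte is a null flag, the remaining bytes hold the first bytes of the string padded with zeros, and the order is ascending with nulls first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper (not the Velox API). Encodes an optional string key
// into prefixLength + 1 bytes: byte 0 is an assumed null flag, the rest is
// the first prefixLength bytes of the string, zero-padded.
std::vector<uint8_t> encodeStringPrefix(
    const std::optional<std::string_view>& value,
    size_t prefixLength) {
  std::vector<uint8_t> out(prefixLength + 1, 0);
  if (!value.has_value()) {
    return out; // Null key: flag 0, body all zeros, so it sorts first.
  }
  out[0] = 1; // Non-null flag.
  const size_t n = std::min(value->size(), prefixLength);
  if (n > 0) {
    std::memcpy(out.data() + 1, value->data(), n);
  }
  return out;
}

// Compares two encodings of equal width byte by byte. A result of 0 is only
// a tie on the prefix: the full strings may still differ beyond the prefix.
int comparePrefix(
    const std::vector<uint8_t>& a,
    const std::vector<uint8_t>& b) {
  assert(a.size() == b.size());
  return std::memcmp(a.data(), b.data(), a.size());
}
```

With the default prefix length of 12, each encoded key in this sketch occupies 13 bytes. Keys that share their first 12 bytes compare equal on the prefix, so a sort built this way would still need a fallback comparison on the full strings for such ties.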

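Perf result (columns: benchmark name, speed relative to the preceding StdSort baseline, time per iteration, iterations per second):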

StdSort_no-payloads_1_varchar_1k                          147.81ns     6.77M
PrefixSort                                      296.39%    49.87ns    20.05M
StdSort_no-payloads_2_varchar_1k                          157.00ns     6.37M
PrefixSort                                      217.29%    72.25ns    13.84M
StdSort_no-payloads_3_varchar_1k                          156.32ns     6.40M
PrefixSort                                      198.98%    78.56ns    12.73M
StdSort_no-payloads_4_varchar_1k                          159.35ns     6.28M
PrefixSort                                      198.43%    80.31ns    12.45M
StdSort_no-payloads_1_varchar_10k                         246.72ns     4.05M
PrefixSort                                      252.32%    97.78ns    10.23M
StdSort_no-payloads_2_varchar_10k                         271.68ns     3.68M
PrefixSort                                      232.28%   116.96ns     8.55M
StdSort_no-payloads_3_varchar_10k                         281.83ns     3.55M
PrefixSort                                      222.81%   126.49ns     7.91M
StdSort_no-payloads_4_varchar_10k                         281.21ns     3.56M
PrefixSort                                      214.54%   131.07ns     7.63M
StdSort_no-payloads_1_varchar_100k                        323.74ns     3.09M
PrefixSort                                      305.59%   105.94ns     9.44M
StdSort_no-payloads_2_varchar_100k                        370.44ns     2.70M
PrefixSort                                      227.32%   162.96ns     6.14M
StdSort_no-payloads_3_varchar_100k                        443.54ns     2.25M
PrefixSort                                      202.02%   219.56ns     4.55M
StdSort_no-payloads_4_varchar_100k                        476.67ns     2.10M
PrefixSort                                      189.52%   251.52ns     3.98M
StdSort_no-payloads_1_varchar_1000k                       798.69ns     1.25M
PrefixSort                                      636.51%   125.48ns     7.97M
StdSort_no-payloads_2_varchar_1000k                       858.31ns     1.17M
PrefixSort                                      191.15%   449.01ns     2.23M
StdSort_no-payloads_3_varchar_1000k                       992.40ns     1.01M
PrefixSort                                      131.49%   754.71ns     1.33M
StdSort_no-payloads_4_varchar_1000k                       985.11ns     1.02M
PrefixSort                                      158.50%   621.53ns     1.61M
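
The relative column above is consistent with dividing the StdSort time by the PrefixSort time. A small sketch of that arithmetic follows; the function name `relativePercent` is hypothetical, and because the printed times are rounded, recomputed values can differ from the table in the last digit.

```cpp
#include <cassert>

// Relative speed in percent: 100 means the same speed as the baseline,
// 200 means the candidate takes half the time per iteration.
double relativePercent(double baselineNs, double candidateNs) {
  assert(baselineNs > 0 && candidateNs > 0);
  return baselineNs / candidateNs * 100.0;
}
```

For the first row pair, `relativePercent(147.81, 49.87)` gives roughly 296.4, matching the reported 296.39%.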

@facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Nov 13, 2024

netlify bot commented Nov 13, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 8bd3b02
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6735b963615c8b00082ac2cc

@zhli1142015
Contributor Author

cc @jinchengchenghh, @skadilover, @xiaoxmeng, could you help review this PR? Thanks.

@Yuhta Yuhta requested a review from xiaoxmeng November 13, 2024 16:29
Review threads:
velox/exec/PrefixSort.cpp (outdated, resolved)
velox/exec/PrefixSort.cpp (outdated, resolved)
velox/exec/benchmarks/PrefixSortBenchmark.cpp (outdated, resolved)
velox/exec/prefixsort/PrefixSortEncoder.h (resolved)
velox/exec/prefixsort/PrefixSortEncoder.h (outdated, resolved)
velox/docs/configs.rst (outdated, resolved)
velox/exec/tests/PrefixSortTest.cpp (resolved)

@jinchengchenghh
Contributor

We can optimize this further after we add the RowContainer stats, which may record the maxSize of each row: if encodeSize is greater than maxSize, the key can be fully encoded.

@zhli1142015
Contributor Author

> We can optimize this further after we add the RowContainer stats, which may record the maxSize of each row: if encodeSize is greater than maxSize, the key can be fully encoded.

If maxSize < prefixsort_string_prefix_length, the encoded size should be maxSize + 1, which indicates that the column is fully encoded. This allows the following keys to be included in the prefix and reduces the memory allocated for the prefix.
I plan to implement this in the next PR.
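
The sizing rule proposed above can be sketched as follows. This is only an illustration of the idea, not code from this PR or the follow-up: `StringKeyEncoding` and `planStringKeyEncoding` are hypothetical names, `maxSize` stands for the longest string observed for the key column, and treating `maxSize == prefixLength` as fully encoded is an assumption of the sketch (the comment above states the condition with `<`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical result type: how many bytes a string key takes in the prefix
// and whether every value of the column fits entirely.
struct StringKeyEncoding {
  uint32_t encodedSize;
  bool fullyEncoded;
};

// Hypothetical planner. Uses maxSize + 1 bytes when all strings are shorter
// than the configured prefix length, otherwise prefixLength + 1 bytes.
StringKeyEncoding planStringKeyEncoding(
    uint32_t maxSize,
    uint32_t prefixLength) {
  assert(prefixLength > 0);
  const uint32_t body = std::min(maxSize, prefixLength);
  // Assumption: a column whose longest string is exactly prefixLength bytes
  // also counts as fully encoded, because the whole string still fits.
  return {body + 1, maxSize <= prefixLength};
}
```

With the default prefix length of 12, a column whose longest string has 5 bytes would take 6 bytes per key instead of 13, leaving more room in the prefix for the keys that follow.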

@jinchengchenghh (Contributor) left a comment

Looks good, just some minor comments.

Review threads:
velox/exec/prefixsort/PrefixSortEncoder.h (outdated, resolved)
velox/exec/prefixsort/PrefixSortEncoder.h (outdated, resolved)
velox/exec/prefixsort/PrefixSortEncoder.h (outdated, resolved)
velox/exec/tests/PrefixSortTest.cpp (outdated, resolved)

@jinchengchenghh (Contributor) left a comment

Much appreciated, thanks for your contribution.

@jinchengchenghh
Contributor

Can you rerun the benchmark to measure the effect of avoiding the string copy?

@zhli1142015
Contributor Author

Cases where a string is stored split across different blocks should be rare, so the results don't differ much:

StdSort_no-payloads_1_varchar_1k                          154.76ns     6.46M
PrefixSort                                      293.38%    52.75ns    18.96M
StdSort_no-payloads_2_varchar_1k                          161.67ns     6.19M
PrefixSort                                      209.76%    77.07ns    12.98M
StdSort_no-payloads_3_varchar_1k                          161.56ns     6.19M
PrefixSort                                      194.60%    83.02ns    12.04M
StdSort_no-payloads_4_varchar_1k                          164.47ns     6.08M
PrefixSort                                      191.84%    85.73ns    11.66M
StdSort_no-payloads_1_varchar_10k                         241.94ns     4.13M
PrefixSort                                      278.68%    86.81ns    11.52M
StdSort_no-payloads_2_varchar_10k                         239.05ns     4.18M
PrefixSort                                      234.25%   102.05ns     9.80M
StdSort_no-payloads_3_varchar_10k                         245.33ns     4.08M
PrefixSort                                      218.46%   112.30ns     8.90M
StdSort_no-payloads_4_varchar_10k                         247.89ns     4.03M
PrefixSort                                      217.00%   114.24ns     8.75M
StdSort_no-payloads_1_varchar_100k                        345.14ns     2.90M
PrefixSort                                      323.30%   106.76ns     9.37M
StdSort_no-payloads_2_varchar_100k                        340.79ns     2.93M
PrefixSort                                      219.84%   155.01ns     6.45M
StdSort_no-payloads_3_varchar_100k                        388.60ns     2.57M
PrefixSort                                      200.12%   194.18ns     5.15M
StdSort_no-payloads_4_varchar_100k                        421.94ns     2.37M
PrefixSort                                      184.94%   228.16ns     4.38M
StdSort_no-payloads_1_varchar_1000k                       709.13ns     1.41M
PrefixSort                                      571.14%   124.16ns     8.05M
StdSort_no-payloads_2_varchar_1000k                       808.04ns     1.24M
PrefixSort                                      191.03%   423.00ns     2.36M
StdSort_no-payloads_3_varchar_1000k                       937.25ns     1.07M
PrefixSort                                      174.17%   538.12ns     1.86M
StdSort_no-payloads_4_varchar_1000k                         1.07us   933.94K
PrefixSort                                      183.02%   585.04ns     1.71M

3 participants