[CUB] Fix mask types in block_radix_rank.cuh #6189
base: main
miscco left a comment:
Thanks a lot! We can now drop some of the internal casts.
/ok to test 4887eb0
We should probably review the generated SASS once this PR is ready.
/ok to test fb741b1
🥳 CI Workflow Results 🟩 Finished in 3h 24m: Pass: 100%/75 | Total: 2d 18h | Max: 3h 22m | Hits: 85%/73707
Branch updated from fefe4a3 to ec8926d.
Branch updated from 0071054 to 9c228a8.
using ::cuda::std::uint32_t;
uint32_t warp_id = linear_tid >> LOG_WARP_THREADS;
uint32_t lane_mask_lt = ::cuda::ptx::get_sreg_lanemask_lt();

_CCCL_PRAGMA_UNROLL_FULL()
for (int ITEM = 0; ITEM < KEYS_PER_THREAD; ++ITEM)
{
  // My digit
  ::cuda::std::uint32_t digit = digit_extractor.Digit(keys[ITEM]);
  uint32_t digit = digit_extractor.Digit(keys[ITEM]);
There is only one occurrence; please use the fully qualified name there.
::cuda::std::uint32_t bin_mask = *p_match_mask;
int leader = ::cuda::std::__bit_log2(bin_mask);
int warp_offset = 0;
int popc = __popc(bin_mask & ::cuda::ptx::get_sreg_lanemask_le());
We need to use ::cuda::std::popcount here instead of the __popc intrinsic.
int lane_mask = 1 << lane;
int* warp_offsets = &s.warp_offsets[warp][0];
int* match_masks = &s.match_masks[warp][0];
::cuda::std::uint32_t lane_mask = 1 << lane;
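A better alternative is to replace
::cuda::std::uint32_t lane_mask = 1 << lane;
with
auto lane_mask = 1u << lane;
because 1 << lane shifts a signed int, and signed left shifts that overflow are undefined behavior before C++20. Shifting the unsigned literal 1u avoids this.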
Use:
- uint32_t for bitmasks in block_radix_rank.cuh.
- cuda::std functions

Fixes #6106