refactor(gpu): match_value to backend #2989
base: main
agnesLeroy
left a comment
Thanks a lot for this PR @enzodimaria! I've left some comments. The main one is that it would be better performance-wise to operate on the possible values in parallel. You could:
- take the other review comments into account
- make the change to compute_equality_selectors in a separate commit and ask for my review again, so we can go through it step by step.
    uint32_t num_chunks = (num_input_ciphertexts + chunk_size - 1) / chunk_size;

    for (uint32_t chunk_idx = 0; chunk_idx < num_chunks - 1; chunk_idx++) {
These two loops can also be executed in parallel (see tfhe/src/integer/server_key/radix_parallel/vector_find.rs:1285). It would be good to use different streams for this.
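To illustrate the suggestion, here is a minimal host-side sketch in plain C++, not the backend code. It uses the same ceiling division as the quoted line to build the chunk ranges, then runs two independent per-chunk loops concurrently. `std::async` stands in for separate GPU streams, and every name here (`make_chunks`, `chunk_sums`, `chunk_maxes`, `run_both`) is hypothetical. The sums and maxima are placeholders for whatever the two real loops compute. The sketch also iterates up to `num_chunks` rather than `num_chunks - 1`, because with `uint32_t` the expression `num_chunks - 1` wraps around when there are no inputs. Whether the caller already rules out that case is not visible in this excerpt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <future>
#include <utility>
#include <vector>

// Hypothetical helper (not from the PR): split [0, n) into half-open ranges
// of at most chunk_size elements, using the same ceiling division as the diff.
std::vector<std::pair<uint32_t, uint32_t>> make_chunks(uint32_t n,
                                                       uint32_t chunk_size) {
  assert(chunk_size > 0);
  uint32_t num_chunks = (n + chunk_size - 1) / chunk_size;
  std::vector<std::pair<uint32_t, uint32_t>> chunks;
  // Looping to num_chunks avoids unsigned wrap-around when n == 0;
  // the last range may be shorter than chunk_size.
  for (uint32_t chunk_idx = 0; chunk_idx < num_chunks; chunk_idx++) {
    uint32_t begin = chunk_idx * chunk_size;
    uint32_t end = std::min(begin + chunk_size, n);
    chunks.emplace_back(begin, end);
  }
  return chunks;
}

// Placeholder for the first loop: one sum per chunk.
std::vector<uint64_t> chunk_sums(const std::vector<uint64_t> &in,
                                 uint32_t chunk_size) {
  std::vector<uint64_t> out;
  for (const auto &c : make_chunks(static_cast<uint32_t>(in.size()), chunk_size)) {
    uint64_t sum = 0;
    for (uint32_t i = c.first; i < c.second; i++) sum += in[i];
    out.push_back(sum);
  }
  return out;
}

// Placeholder for the second loop: one maximum per chunk.
std::vector<uint64_t> chunk_maxes(const std::vector<uint64_t> &in,
                                  uint32_t chunk_size) {
  std::vector<uint64_t> out;
  for (const auto &c : make_chunks(static_cast<uint32_t>(in.size()), chunk_size)) {
    uint64_t best = 0;
    for (uint32_t i = c.first; i < c.second; i++) best = std::max(best, in[i]);
    out.push_back(best);
  }
  return out;
}

struct LoopResults {
  std::vector<uint64_t> sums;
  std::vector<uint64_t> maxes;
};

// Both loops only read `in` and write to their own output, so they can run
// at the same time; get() is the join point, much like a stream synchronize.
LoopResults run_both(const std::vector<uint64_t> &in, uint32_t chunk_size) {
  auto sums = std::async(std::launch::async,
                         [&in, chunk_size] { return chunk_sums(in, chunk_size); });
  auto maxes = std::async(std::launch::async,
                          [&in, chunk_size] { return chunk_maxes(in, chunk_size); });
  return LoopResults{sums.get(), maxes.get()};
}
```

With ten inputs `1..10` and `chunk_size = 4`, `make_chunks` yields three ranges, the last one covering only indices 8 and 9, and `run_both` returns sums `{10, 26, 19}` and maxima `{4, 8, 10}`. On the GPU the same shape would presumably map each loop to its own stream, with a synchronization where `get()` is called here.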
closes: please link all relevant issues
PR content/description
Check-list: