@jcosborn
Contributor

This covers all files except the dslashes. Those will be merged later to keep the complexity of this PR down.

@jcosborn jcosborn requested review from a team as code owners June 20, 2025 23:19
@jcosborn
Contributor Author

jcosborn commented Jul 3, 2025

@maddyscientist As for simplifying the calls to dslash.template operator() in dslash_helper.cuh: only the dslashes that needed the allthreads handling have been updated to take the extra parameters, so the others can't currently be called that way. We could simplify the calling code if we made all dslashes have the same interface, but I haven't done that, mainly to reduce the number of changes. I can easily implement that if you prefer.
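To illustrate the trade-off, here is a minimal host-side C++17 sketch of how a single call site becomes possible once every functor accepts the same template parameter and extra argument. It is not QUDA code: `LegacyFunctor`, `AllThreadsFunctor`, `Uniform`, and `call_all` are hypothetical names, the serial loop stands in for the threads of a launch, and the adapter is only one assumed way of giving an unmodified functor the extended signature.

```cpp
#include <cassert>
#include <vector>

// Hypothetical functor with the original interface: it may only be
// called for in-bounds indices.
struct LegacyFunctor {
  std::vector<int> *out;
  void operator()(int idx) const { (*out)[idx] += 1; }
};

// Hypothetical functor with the extended interface (assumed shape).
struct AllThreadsFunctor {
  std::vector<int> *out;
  template <bool allthreads = false>
  void operator()(int idx, bool active = true) const {
    // Inactive threads must not touch memory.
    if (!allthreads || active) (*out)[idx] += 1;
  }
};

// Assumed adapter: gives a legacy functor the extended signature by
// skipping the call for inactive threads.
template <typename F>
struct Uniform {
  F f;
  template <bool allthreads = false>
  void operator()(int idx, bool active = true) const {
    if (active) f(idx);
  }
};

// One call site for every functor. The loop is a serial stand-in for
// the threads of a launch; n_valid is the in-bounds extent.
template <bool allthreads, typename F>
void call_all(const F &f, int n_threads, int n_valid) {
  for (int idx = 0; idx < n_threads; ++idx) {
    const bool active = idx < n_valid;
    if (!allthreads && !active) continue;  // classic early exit
    f.template operator()<allthreads>(idx, active);
  }
}
```

With this shape, `call_all<true>(Uniform<LegacyFunctor>{LegacyFunctor{&v}}, 6, 4)` and `call_all<true>(AllThreadsFunctor{&v}, 6, 4)` go through the same code path, at the cost of touching (or wrapping) every functor.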

@maddyscientist
Member

@jcosborn For those QUDA developers who are reviewing this PR but are not on the portability calls, can you describe why these changes are needed?

@jcosborn
Contributor Author

jcosborn commented Jul 8, 2025

The early thread exit handling changes are to support programming models (like SYCL) that only support block collectives when all threads in a block are active (non-exited). The changes let a target have all threads enter a kernel functor so that they can all participate in any block collectives the kernel uses. For these kernels, instead of exiting when a thread is determined to be out-of-bounds, all threads enter the kernel, and the out-of-bounds ones are marked inactive with an extra argument. Only kernels that need this handling need any changes.

The kernel functors that need this handling have an extra template parameter, allthreads, which is true when the functor is called with all threads entering (some possibly out-of-bounds); when it is false, the functor should behave exactly as before. There is also an extra functor argument, active, which specifies whether the thread is in-bounds (active) or not. When allthreads is true, the modified kernels need to ensure that threads that aren't active make no out-of-bounds memory accesses, and that all threads (active or not) participate in the collectives.
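A rough host-side C++17 sketch of the pattern described above, assuming a functor of this general shape. Only the names allthreads and active come from the description; `BlockSum`, `PartialSum`, and `launch_block` are hypothetical, and the serial loop plus participant counter merely stand in for a thread block and a real block collective.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial stand-in for a block-wide sum. It records how many threads
// took part, mimicking a collective that needs every thread.
struct BlockSum {
  double total = 0.0;
  int participants = 0;
  void contribute(double v) {
    total += v;
    ++participants;
  }
};

// Hypothetical kernel functor following the described convention.
struct PartialSum {
  const double *in;  // valid only for in-bounds indices

  template <bool allthreads = false>
  void operator()(BlockSum &block, std::size_t idx, bool active = true) const {
    double x = 0.0;
    // With allthreads == false the caller guarantees idx is in bounds,
    // so the functor behaves as before. Otherwise only active threads
    // may read memory.
    if (!allthreads || active) x = in[idx];
    // Every thread that entered takes part in the collective; inactive
    // threads contribute the identity value.
    block.contribute(x);
  }
};

// Hypothetical launcher for a single block of block_size threads over
// n valid elements.
template <bool allthreads, typename Functor>
BlockSum launch_block(const Functor &f, std::size_t n, std::size_t block_size) {
  BlockSum block;
  for (std::size_t tid = 0; tid < block_size; ++tid) {
    const bool in_bounds = tid < n;
    if constexpr (allthreads) {
      // All threads enter; out-of-bounds ones are marked inactive.
      f.template operator()<true>(block, tid, in_bounds);
    } else {
      if (!in_bounds) continue;  // classic early exit
      f.template operator()<false>(block, tid);
    }
  }
  return block;
}
```

For five elements and a block of eight threads, both modes produce the same sum, but only the allthreads == true launch has all eight threads reaching the collective, which is the property a model like SYCL requires.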
