-
Notifications
You must be signed in to change notification settings - Fork 283
Replace inline PTX by cuda::ptx in cuda::barrier<thread_scope_block> #6250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks technically correct, but the formatting is atrocious. Could we add an else
to the conditions, as all early branches are returning?
6a63546
to
81939ba
Compare
if (!::cuda::device::is_object_from(__barrier, ::cuda::device::address_space::cluster_shared)) | ||
{ | ||
return __barrier.arrive(__update); | ||
} | ||
if (!::cuda::device::is_object_from(__barrier, ::cuda::device::address_space::shared)) | ||
{ | ||
::__trap(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is strange, because the first condition takes anything but cluster_shared
, so the second one seems wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything that is shared
is also cluster_shared
. Because the shared memory address space is part of the cluster shared memory space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we trap for any barrier that is in cluster shared memory, but not in the shared memory of the current CTA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we turn that into an else if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could not be a precondition instead?
😬 CI Workflow Results🟥 Finished in 4h 41m: Pass: 42%/84 | Total: 7h 57m | Max: 39m 16s | Hits: 99%/25847See results here. |
::__cvta_generic_to_shared(&__barrier))) | ||
: "memory"); | ||
})) | ||
(if (::cuda::device::is_object_from(__barrier, ::cuda::device::address_space::cluster_shared)) { ::__trap(); })) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cuda::std::terminate()
?
} | ||
if (!::cuda::device::is_object_from(__barrier, ::cuda::device::address_space::shared)) | ||
{ | ||
::__trap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be a _CCCL_ASSERT
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we should probably assert here. And check the documentation whether we make it clear that barriers ought not live in cluster shared memory.
unsigned int __activeA = ::__match_any_sync(__mask, __update); | ||
unsigned int __activeB = ::__match_any_sync(__mask, reinterpret_cast<::cuda::std::uintptr_t>(&__barrier)); | ||
unsigned int __active = __activeA & __activeB; | ||
int __inc = ::__popc(__active) * __update; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not worth to move to their C++ versions
if (!::cuda::device::is_object_from(__barrier, ::cuda::device::address_space::cluster_shared)) | ||
{ | ||
return __barrier.arrive(__update); | ||
} | ||
if (!::cuda::device::is_object_from(__barrier, ::cuda::device::address_space::shared)) | ||
{ | ||
::__trap(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could not be a precondition instead?
Due to lack of a good example for a SASS test, I used this simple example:
Compiled for sm100, it does differ in SASS a little bit, but the compiler just flipped a branch:
