-
Notifications
You must be signed in to change notification settings - Fork 283
Fixing cudax::execution CUDA stream scheduler #6175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing cudax::execution CUDA stream scheduler #6175
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
/ok to test d4ca1d7 |
This comment has been minimized.
This comment has been minimized.
4393729
to
04afd01
Compare
/ok to test 4393729 |
This comment has been minimized.
This comment has been minimized.
04afd01
to
0b25974
Compare
/ok to test 0b25974 |
This comment has been minimized.
This comment has been minimized.
/ok to test 0b25974 |
/ok to test 4ed1940 |
This comment has been minimized.
This comment has been minimized.
b21f145
to
95bf8fb
Compare
/ok to test 95bf8fb |
This comment has been minimized.
This comment has been minimized.
/ok to test 0879990 |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
cudax/include/cuda/experimental/__execution/stream/starts_on.cuh
Outdated
Show resolved
Hide resolved
570ab90
to
a6c97a8
Compare
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reviewed this and made comments about changes which I don't quite understand.
I am approving this per @ericniebler 's request so that it can run through the nightlies.
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
// BUGBUG BUGBUG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the bug is that i forgot to remove this comment. :-P
execution::set_stopped(static_cast<_Rcvr&&>(__state_->__rcvr_)); | ||
} | ||
|
||
[[nodiscard]] _CCCL_NODEBUG_API constexpr auto get_env() const noexcept -> __fwd_env_t<env_of_t<_Rcvr>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand the consequences of this macro change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It changes the functions' attributes so that debuggers won't step over them. Otherwise it has no effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericniebler You mean, it stops inlining. We do need to restore this at some point : - )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would like to keep these changes to the function attributes. i learned many things while stepping through the experimental execution
code in cuda-gdb. it turns out i rarely want "nodebug" apis. i very occasionally want "trivial" (force-inline nodebug) apis, but most apis should just be _CCCL_API
.
} | ||
|
||
_CCCL_NO_UNIQUE_ADDRESS continues_on_t __tag_; | ||
/*_CCCL_NO_UNIQUE_ADDRESS*/ continues_on_t __tag_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eric explained that tests were failing without removing this macro. I'm treating this removal as a temporary phenomenon. There are other ways to get this effect, e.g., the "compressed pair" pattern that the reference implementation of mdspan classically used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i need to track down the compiler bug, file it, and then selectively disable _CCCL_NO_UNIQUE_ADDRESS
acrosss all of CCCL wherever the bug can potentially manifest. then i can start using _CCCL_NO_UNIQUE_ADDRESS
again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericniebler The other option is to use a different programming technique that guarantees that empty members occupy zero bytes. A classic one is a "compressed pair" or "compressed tuple" for storing the members, that does not store empty members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's true. it will be important when this code is no longer experimental to have all the space optimizations before we lock in the ABI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to note that I am on the verge of completely dropping any use of _CCCL_NO_UNIQUE_ADDRESS
Its just a complete trainwreck in the waiting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miscco wrote:
Its just a complete trainwreck in the waiting
100% agree; as a language feature it means well, but in practice for us it's nothing but trouble.
{ | ||
template <__disposition _OtherDisposition> | ||
_CCCL_NODEBUG_API constexpr auto operator==(__completion_tag<_OtherDisposition>) const noexcept -> bool | ||
_CCCL_TRIVIAL_API constexpr auto operator==(__completion_tag<_OtherDisposition>) const noexcept -> bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand the consequences of this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a "trivial" api is one that is force-inline
-ed and annotated so that debuggers step over them. execution::set_stopped(move(rcvr))
just calls move(rcvr).set_stopped()
for example. it is never interesting to step into execution::set_stopped
; you'd rather step directly into move(rcvr).set_stopped()
.
these annotations make debugging easier and shorten stack traces to just the good parts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericniebler wrote:
these annotations make debugging easier and shorten stack traces to just the good parts.
We're gonna put these macros back like they were after the nightlies run, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that isn't my intention. these changes are intentional. they improve the debugability of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, got it. Thanks for explaining!
// BUGBUG BUGBUG | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// BUGBUG BUGBUG |
This comment has been minimized.
This comment has been minimized.
🥳 CI Workflow Results🟩 Finished in 20m 20s: Pass: 100%/42 | Total: 2h 47m | Max: 10m 32s | Hits: 99%/21364See results here. |
this pr ports cudax's stream scheduler to the new sender algorithm customization framework.
fixes #5564