Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Runtime Error
Component
CUDA Experimental (cudax)
Describe the bug
due to design flaws in the sender algorithm customization scheme, the transitions between the CPU and GPU are not always orchestrated correctly by the CUDA stream scheduler in cudax, leading to hangs or crashes; hence, its tests have been disabled for some time while i come to grips with the issue. a fix must be made quickly to `std::execution` for C++26.
here are my current design thoughts/directions:
- now that `ensure_started` and `split` have been removed, there isn't much argument anymore for early customization.
- we have `get_scheduler` and `get_completion_scheduler<SetTag>`, and we have `get_domain` but no `get_completion_domain<SetTag>`. i think this is an oversight. `get_completion_[scheduler|domain]<SetTag>` needs the receiver's environment in order to properly answer the query. `just()` can only know where it will complete when it knows where it is started.
- although not strictly necessary, it would be helpful to adopt P3206, "A sender query for completion behaviour". if a sender is known to complete inline, then its completion scheduler/domain is the scheduler/domain on which it is started.
- every sender has two domains: the one it is started on, and the one it completes on, and they could be different. in `connect` and `get_completion_signatures` we can know both. the question is how to use them.
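to make the second and third bullets concrete, here is a minimal toy model of an environment-aware completion-domain query. it does not use the real `std::execution` or cudax APIs: `env`, `completion`, `just_sender`, `transfer_sender` and `completion_domain_t` are all hypothetical names invented for this sketch. the assumption is that the environment carries the domain the operation is started on, and that an inline-completing sender reports that domain as its completion domain.

```cpp
#include <cassert>
#include <type_traits>

namespace toy {
struct cpu_domain {};
struct gpu_domain {};

// hypothetical environment: records the domain the operation is started on
template <class StartDomain>
struct env { using start_domain = StartDomain; };

// hypothetical completion behaviours, loosely in the spirit of P3206
enum class completion { inline_, asynchronous };

// a just()-like sender: completes inline, so it has no fixed completion domain
struct just_sender {
  static constexpr completion behaviour = completion::inline_;
};

// a sender that always hops to Domain before completing
template <class Domain>
struct transfer_sender {
  static constexpr completion behaviour = completion::asynchronous;
  using completes_on = Domain;
};

// env-aware query: by default, ask the sender where it completes ...
template <class Sender, class Env, completion = Sender::behaviour>
struct completion_domain { using type = typename Sender::completes_on; };

// ... but an inline sender completes wherever the environment says it started
template <class Sender, class Env>
struct completion_domain<Sender, Env, completion::inline_> {
  using type = typename Env::start_domain;
};

template <class Sender, class Env>
using completion_domain_t = typename completion_domain<Sender, Env>::type;

// the same just()-like sender yields different answers in different environments
static_assert(std::is_same_v<completion_domain_t<just_sender, env<gpu_domain>>, gpu_domain>);
static_assert(std::is_same_v<completion_domain_t<just_sender, env<cpu_domain>>, cpu_domain>);
// a transferring sender ignores the start domain
static_assert(std::is_same_v<completion_domain_t<transfer_sender<gpu_domain>, env<cpu_domain>>, gpu_domain>);
} // namespace toy
```

in this model the query cannot be answered from the sender alone, which is the point of passing the environment.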
the last bullet is the most interesting. i can imagine a scheduler wanting to do something special, say, whenever a `foo` sender is started, and i can imagine another scheduler wanting to do something special when a `foo` sender completes. this suggests to me that `transform_sender` might need to apply two transforms to a sender in `connect`, one for each domain (if they are different). Q: does it matter which order they are applied?
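a toy illustration of the ordering question, under heavy assumptions: senders are modelled as plain strings, and `transform_start` / `transform_complete` are hypothetical hooks, not part of any real `transform_sender` interface. it only shows that when both transforms wrap the sender, the two orders produce different results, so the order would have to be specified.

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace toy_order {
// stand-in for a foo sender; desc records which transforms wrapped it
struct foo_sender { std::string desc = "foo"; };

// hypothetical domain that reacts when a foo sender is started on it
struct domain_a {
  static foo_sender transform_start(foo_sender s) {
    s.desc = "A.start(" + s.desc + ")";
    return s;
  }
};

// hypothetical domain that reacts when a foo sender completes on it
struct domain_b {
  static foo_sender transform_complete(foo_sender s) {
    s.desc = "B.complete(" + s.desc + ")";
    return s;
  }
};

// option 1: start transform first, then the completion transform
template <class StartDomain, class CompleteDomain>
foo_sender start_then_complete(foo_sender s) {
  s = StartDomain::transform_start(std::move(s));
  return CompleteDomain::transform_complete(std::move(s));
}

// option 2: completion transform first, then the start transform
template <class StartDomain, class CompleteDomain>
foo_sender complete_then_start(foo_sender s) {
  s = CompleteDomain::transform_complete(std::move(s));
  return StartDomain::transform_start(std::move(s));
}
} // namespace toy_order
```

here `start_then_complete<domain_a, domain_b>` nests the start transform inside the completion transform, and `complete_then_start` nests them the other way around. whether real transforms would interact like this depends on what they actually do.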
so a given domain might want to provide two different transforms for each sender: the "start" transform and the "complete" transform. if we had such a thing, we no longer need `schedule_from` to be a different algorithm from `continues_on`. domain `A` can provide a "start" `continues_on` transform for transfers off of context `A`, and domain `B` can provide a "complete" `continues_on` transform for transfers onto context `B`.
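a rough sketch of how a single `continues_on` could be lowered with per-phase transforms. everything here is hypothetical (`start_t`, `complete_t`, `has_transform`, `apply_phase`, `lower_continues_on`) and is not the cudax or `std::execution` interface. the assumption is that a domain opts in to a phase by providing a `transform` overload tagged with that phase, and that a missing overload means "leave the sender alone".

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

namespace toy_phase {
struct start_t {};     // tag: sender is being started on this domain
struct complete_t {};  // tag: sender will complete on this domain

// stand-in for a continues_on sender; steps records the applied transforms
struct continues_on_sender { std::string steps; };

// domain A only customizes transfers *off of* context A
struct domain_a {
  static continues_on_sender transform(start_t, continues_on_sender s) {
    s.steps += "[leave A]";
    return s;
  }
};

// domain B only customizes transfers *onto* context B
struct domain_b {
  static continues_on_sender transform(complete_t, continues_on_sender s) {
    s.steps += "[enter B]";
    return s;
  }
};

// detect whether Domain provides a transform for the given phase tag
template <class Domain, class Phase, class = void>
struct has_transform : std::false_type {};

template <class Domain, class Phase>
struct has_transform<Domain, Phase,
    std::void_t<decltype(Domain::transform(Phase{}, std::declval<continues_on_sender>()))>>
    : std::true_type {};

// apply the domain's transform for this phase, or pass the sender through
template <class Domain, class Phase>
continues_on_sender apply_phase(continues_on_sender s) {
  if constexpr (has_transform<Domain, Phase>::value)
    return Domain::transform(Phase{}, std::move(s));
  else
    return s;
}

// one algorithm, two customization points: no separate schedule_from needed
template <class From, class To>
continues_on_sender lower_continues_on(continues_on_sender s) {
  s = apply_phase<From, start_t>(std::move(s));
  return apply_phase<To, complete_t>(std::move(s));
}
} // namespace toy_phase
```

with these definitions, `lower_continues_on<domain_a, domain_b>` applies both customizations, while `lower_continues_on<domain_b, domain_a>` applies neither, because `B` has no "start" transform and `A` has no "complete" transform.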
my intention is to implement this design in cudax and then update P3718 for (fingers crossed) inclusion in C++26.
How to Reproduce
n/a
Expected behavior
n/a
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response