@ericniebler ericniebler commented Oct 10, 2025

this pr ports cudax's stream scheduler to the new sender algorithm customization framework.

fixes #5564

copy-pr-bot bot commented Oct 10, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Oct 10, 2025
@ericniebler
Contributor Author

/ok to test d4ca1d7

@ericniebler ericniebler force-pushed the fixing-the-stream-scheduler branch 6 times, most recently from 4393729 to 04afd01 Compare October 14, 2025 00:01
@ericniebler
Contributor Author

/ok to test 4393729

@ericniebler ericniebler force-pushed the fixing-the-stream-scheduler branch from 04afd01 to 0b25974 Compare October 14, 2025 04:19
@ericniebler
Contributor Author

/ok to test 0b25974

@ericniebler
Contributor Author

/ok to test 0b25974

@ericniebler
Contributor Author

/ok to test 4ed1940

@ericniebler ericniebler force-pushed the fixing-the-stream-scheduler branch from b21f145 to 95bf8fb Compare October 14, 2025 18:38
@ericniebler
Contributor Author

/ok to test 95bf8fb

@ericniebler
Contributor Author

/ok to test 0879990

@ericniebler ericniebler marked this pull request as ready for review October 15, 2025 01:50
@ericniebler ericniebler requested review from a team as code owners October 15, 2025 01:50
@ericniebler ericniebler requested a review from pciolkosz October 15, 2025 01:50
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Oct 15, 2025
@ericniebler ericniebler linked an issue Oct 15, 2025 that may be closed by this pull request
1 task

@ericniebler ericniebler changed the title [WIP] Fixing cudax::execution CUDA stream scheduler Fixing cudax::execution CUDA stream scheduler Oct 15, 2025
@ericniebler ericniebler requested a review from a team as a code owner October 15, 2025 23:24
@ericniebler ericniebler force-pushed the fixing-the-stream-scheduler branch from 570ab90 to a6c97a8 Compare October 15, 2025 23:30

@mhoemmen mhoemmen left a comment

I have reviewed this and made comments about changes which I don't quite understand.

I am approving this per @ericniebler's request so that it can run through the nightlies.

//
//===----------------------------------------------------------------------===//

// BUGBUG BUGBUG
Contributor

What's the bug?

Contributor Author

the bug is that i forgot to remove this comment. :-P

execution::set_stopped(static_cast<_Rcvr&&>(__state_->__rcvr_));
}

[[nodiscard]] _CCCL_NODEBUG_API constexpr auto get_env() const noexcept -> __fwd_env_t<env_of_t<_Rcvr>>
Contributor

I do not understand the consequences of this macro change.

Contributor Author

It changes the functions' attributes so that debuggers won't step over them. Otherwise it has no effect.

Contributor

@ericniebler You mean, it stops inlining. We do need to restore this at some point :-)

Contributor Author

i would like to keep these changes to the function attributes. i learned many things while stepping through the experimental execution code in cuda-gdb. it turns out i rarely want "nodebug" apis. i very occasionally want "trivial" (force-inline nodebug) apis, but most apis should just be _CCCL_API.

}

- _CCCL_NO_UNIQUE_ADDRESS continues_on_t __tag_;
+ /*_CCCL_NO_UNIQUE_ADDRESS*/ continues_on_t __tag_;
Contributor

Eric explained that tests were failing without removing this macro. I'm treating this removal as a temporary phenomenon. There are other ways to get this effect, e.g., the "compressed pair" pattern that the reference implementation of mdspan classically used.

Contributor Author

i need to track down the compiler bug, file it, and then selectively disable _CCCL_NO_UNIQUE_ADDRESS across all of CCCL wherever the bug can potentially manifest. then i can start using _CCCL_NO_UNIQUE_ADDRESS again.
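
A selective opt-out of this kind could be sketched roughly as follows. This is only an illustration under stated assumptions: DEMO_NO_UNIQUE_ADDRESS, DEMO_DISABLE_NO_UNIQUE_ADDRESS, demo_tag, and the two structs are hypothetical names, not CCCL's actual macros or types, and only the standard attribute spelling is handled.

```cpp
// Hypothetical wrapper macro; NOT the real _CCCL_NO_UNIQUE_ADDRESS definition.
// DEMO_DISABLE_NO_UNIQUE_ADDRESS is an assumed switch that a build could set
// for compilers known to miscompile the attribute.
#if defined(DEMO_DISABLE_NO_UNIQUE_ADDRESS)
#  define DEMO_NO_UNIQUE_ADDRESS
#elif defined(__has_cpp_attribute)
#  if __has_cpp_attribute(no_unique_address)
#    define DEMO_NO_UNIQUE_ADDRESS [[no_unique_address]]
#  endif
#endif
#ifndef DEMO_NO_UNIQUE_ADDRESS
#  define DEMO_NO_UNIQUE_ADDRESS // attribute unavailable: expand to nothing
#endif

struct demo_tag {}; // hypothetical empty tag type

// With the attribute active, tag_ may share storage with value_.
struct demo_state
{
  DEMO_NO_UNIQUE_ADDRESS demo_tag tag_;
  int value_;
};

// Same members without the attribute, for comparison.
struct demo_state_plain
{
  demo_tag tag_;
  int value_;
};

// The attribute can only shrink the layout, never grow it.
static_assert(sizeof(demo_state) <= sizeof(demo_state_plain), "attribute must not grow the type");
```

When the macro expands to nothing, the code stays correct and the type is merely larger, which is why a switch like this can be flipped per compiler without touching call sites.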

Contributor

@ericniebler The other option is to use a different programming technique that guarantees that empty members occupy zero bytes. A classic one is a "compressed pair" or "compressed tuple" for storing the members, that does not store empty members.
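
For readers unfamiliar with the pattern, a minimal "compressed pair" might look like the sketch below. It relies on the empty base optimization instead of an attribute. The names compressed_pair and demo_tag are illustrative assumptions, not the mdspan reference implementation or any CCCL type.

```cpp
#include <type_traits>
#include <utility>

// Primary template: First is non-empty (or final), so store both as members.
template <class First,
          class Second,
          bool = std::is_empty<First>::value && !std::is_final<First>::value>
class compressed_pair
{
  First first_;
  Second second_;

public:
  compressed_pair(First f, Second s)
      : first_(std::move(f))
      , second_(std::move(s))
  {}
  First& first() noexcept { return first_; }
  Second& second() noexcept { return second_; }
};

// Specialization: First is empty and non-final, so inherit from it.
// The empty base optimization lets the base occupy no storage.
template <class First, class Second>
class compressed_pair<First, Second, true> : private First
{
  Second second_;

public:
  compressed_pair(First f, Second s)
      : First(std::move(f))
      , second_(std::move(s))
  {}
  First& first() noexcept { return static_cast<First&>(*this); }
  Second& second() noexcept { return second_; }
};

struct demo_tag {}; // hypothetical empty tag, standing in for a type like continues_on_t

// The empty tag adds nothing to the size of the pair.
static_assert(sizeof(compressed_pair<demo_tag, int>) == sizeof(int), "empty base should take no space");
```

The trade-off is extra template machinery, and the optimization does not apply to final types, which is why the primary template remains as a fallback.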

Contributor Author

that's true. it will be important when this code is no longer experimental to have all the space optimizations before we lock in the ABI.

Contributor

I want to note that I am on the verge of completely dropping any use of _CCCL_NO_UNIQUE_ADDRESS.

It's just a complete trainwreck in the waiting.

Contributor

@miscco wrote:

It's just a complete trainwreck in the waiting.

100% agree; as a language feature it means well, but in practice for us it's nothing but trouble.

{
template <__disposition _OtherDisposition>
- _CCCL_NODEBUG_API constexpr auto operator==(__completion_tag<_OtherDisposition>) const noexcept -> bool
+ _CCCL_TRIVIAL_API constexpr auto operator==(__completion_tag<_OtherDisposition>) const noexcept -> bool
Contributor

I do not understand the consequences of this change.

Contributor Author

a "trivial" api is one that is force-inlined and annotated so that debuggers step over it. execution::set_stopped(move(rcvr)) just calls move(rcvr).set_stopped(), for example. it is never interesting to step into execution::set_stopped; you'd rather step directly into move(rcvr).set_stopped().

these annotations make debugging easier and shorten stack traces to just the good parts.
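
The idea can be illustrated with a small sketch. It assumes "trivial" means force-inline plus a debugger-skipping attribute, as described above. DEMO_TRIVIAL_API, demo_receiver, demo_set_stopped, and demo_run are hypothetical names, and the macro body is not the real definition of _CCCL_TRIVIAL_API. On GCC, `artificial` is used as the closest approximation of Clang's `nodebug`.

```cpp
// Hypothetical stand-in for a "trivial" API macro: force-inline, and ask the
// compiler to hide the function from debuggers where an attribute exists.
#if defined(__clang__)
#  define DEMO_TRIVIAL_API inline __attribute__((always_inline, nodebug))
#elif defined(__GNUC__)
#  define DEMO_TRIVIAL_API inline __attribute__((always_inline, artificial))
#elif defined(_MSC_VER)
#  define DEMO_TRIVIAL_API __forceinline
#else
#  define DEMO_TRIVIAL_API inline
#endif

// A toy receiver (hypothetical); the interesting code lives in set_stopped().
struct demo_receiver
{
  bool* stopped_;
  void set_stopped() && noexcept { *stopped_ = true; }
};

// A thin forwarding wrapper with the same shape as the set_stopped example:
// it only calls the member function, so stepping into it is rarely useful.
template <class Rcvr>
DEMO_TRIVIAL_API void demo_set_stopped(Rcvr&& rcvr) noexcept
{
  static_cast<Rcvr&&>(rcvr).set_stopped();
}

// Usage: returns true if the receiver's set_stopped() ran.
inline bool demo_run()
{
  bool stopped = false;
  demo_set_stopped(demo_receiver{&stopped});
  return stopped;
}
```

Stepping into demo_set_stopped in a debugger should then land in demo_receiver::set_stopped on compilers that honor the attributes. The exact behavior depends on the compiler and debugger.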

Contributor

@ericniebler wrote:

these annotations make debugging easier and shorten stack traces to just the good parts.

We're gonna put these macros back like they were after the nightlies run, right?

Contributor Author

that isn't my intention. these changes are intentional. they improve the debuggability of the code.

Contributor

Ah, got it. Thanks for explaining!

Comment on lines 11 to 12
// BUGBUG BUGBUG

Contributor Author

Suggested change
// BUGBUG BUGBUG

Contributor

🥳 CI Workflow Results

🟩 Finished in 20m 20s: Pass: 100%/42 | Total: 2h 47m | Max: 10m 32s | Hits: 99%/21364

See results here.

@ericniebler ericniebler enabled auto-merge (squash) October 16, 2025 05:06
@ericniebler ericniebler disabled auto-merge October 16, 2025 05:06
@ericniebler ericniebler enabled auto-merge (squash) October 16, 2025 05:07
@ericniebler ericniebler merged commit 907a153 into NVIDIA:main Oct 16, 2025
53 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Oct 16, 2025
@ericniebler ericniebler deleted the fixing-the-stream-scheduler branch October 16, 2025 15:55
Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG]: CUDA stream scheduler in cudax execution library is broken

4 participants