Drop experimental TMA exposure in cuda::barrier #6225
Conversation
This is all exposed by cuda::ptx. It was originally introduced in #379.
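For callers migrating off the removed barrier helper, here is a minimal sketch of the equivalent direct call (assumptions: the cuda::ptx::cp_async_bulk overload and cuda::device::barrier_native_handle used in the diff below; the function and variable names are illustrative, and the instruction requires SM_90):

#include <cuda/barrier>
#include <cuda/ptx>
#include <cuda/std/cstdint>

// Illustrative caller-side replacement: issue the bulk copy through cuda::ptx
// directly and let it complete on a block-scoped cuda::barrier.
__device__ void copy_chunk(void* smem_dest, const void* gmem_src, cuda::std::uint32_t size,
                           cuda::barrier<cuda::thread_scope_block>& bar)
{
  cuda::ptx::cp_async_bulk(
    cuda::ptx::space_cluster, // destination state space: shared::cluster
    cuda::ptx::space_global,  // source state space: global
    smem_dest,
    gmem_src,
    size,
    cuda::device::barrier_native_handle(bar)); // uint64_t* to the mbarrier
}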
// Forward-declare CUtensorMap for use in cp_async_bulk_tensor_* PTX wrapping
// functions. These functions take a pointer to CUtensorMap, so do not need to
// know its size. This type is defined in cuda.h (driver API) as:
//
//     typedef struct CUtensorMap_st { [ .. snip .. ] } CUtensorMap;
//
// We need to forward-declare both CUtensorMap_st (the struct) and CUtensorMap
// (the typedef):
struct CUtensorMap_st;
typedef struct CUtensorMap_st CUtensorMap;
This is a borderline breaking change. In principle, we require users to include the headers for what they use, so I think they need to ensure they include the right header to use CUtensorMap. But I am also fine with leaving the forward declaration in.
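A hypothetical caller-side sketch of that requirement (the names fetch_tile, smem_dest, and coords are made up; only CUtensorMap from cuda.h and the cuda::ptx / cuda::barrier APIs are library names):

#include <cuda.h>           // driver API: defines CUtensorMap_st / CUtensorMap
#include <cuda/barrier>     // cuda::barrier, cuda::device::barrier_native_handle
#include <cuda/ptx>         // cuda::ptx::cp_async_bulk_tensor
#include <cuda/std/cstdint>

// Illustrative 1-D tile fetch (SM_90): with the forward declaration removed,
// naming CUtensorMap here only compiles because cuda.h is included above.
__device__ void fetch_tile(void* smem_dest, const CUtensorMap* tensor_map, int x0,
                           cuda::barrier<cuda::thread_scope_block>& bar)
{
  const cuda::std::int32_t coords[1] = {x0};
  cuda::ptx::cp_async_bulk_tensor(
    cuda::ptx::space_cluster, cuda::ptx::space_global,
    smem_dest, tensor_map, coords,
    cuda::device::barrier_native_handle(bar));
}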
inline _CCCL_DEVICE void cp_async_bulk_global_to_shared(
  void* __dest, const void* __src, ::cuda::std::uint32_t __size, ::cuda::barrier<::cuda::thread_scope_block>& __bar)
{
  _CCCL_ASSERT(__size % 16 == 0, "Size must be multiple of 16.");
  _CCCL_ASSERT(::cuda::device::is_address_from(__dest, ::cuda::device::address_space::shared),
               "Destination must be shared memory address.");
  _CCCL_ASSERT(::cuda::device::is_address_from(__src, ::cuda::device::address_space::global),
               "Source must be global memory address.");

  ::cuda::ptx::cp_async_bulk(
    ::cuda::ptx::space_cluster,
    ::cuda::ptx::space_global,
    __dest,
    __src,
    __size,
    ::cuda::device::barrier_native_handle(__bar));
}
I find it a bit sad to leave some of the assertions behind here. But those functions were not used in our barrier and memcpy_async implementations anyway. I think we should consider adding some assertions to the PTX exposure in cuda::ptx. @ahendriksen, do you think we can add such assertions there?
Adding assertions in the code generator could be very hard because they are specific to each instruction. For example, I wrote the code for warp_shuffle by hand for this reason.
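As a rough sketch of what a hand-written, assert-carrying layer over the generated wrapper could look like (purely illustrative: the name cp_async_bulk_checked is made up, and it assumes _CCCL_ASSERT and cuda::device::is_address_from are reachable from wherever such a wrapper would live, as they are in the code removed above):

// Hypothetical hand-written overlay that restores the checks from the removed
// cuda::barrier helper before forwarding to the generated PTX wrapper.
inline _CCCL_DEVICE void cp_async_bulk_checked(
  void* __dest, const void* __src, ::cuda::std::uint32_t __size, ::cuda::std::uint64_t* __smem_bar)
{
  _CCCL_ASSERT(__size % 16 == 0, "Size must be multiple of 16.");
  _CCCL_ASSERT(::cuda::device::is_address_from(__dest, ::cuda::device::address_space::shared),
               "Destination must be shared memory address.");
  _CCCL_ASSERT(::cuda::device::is_address_from(__src, ::cuda::device::address_space::global),
               "Source must be global memory address.");
  ::cuda::ptx::cp_async_bulk(
    ::cuda::ptx::space_cluster, ::cuda::ptx::space_global, __dest, __src, __size, __smem_bar);
}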
@ahendriksen I would like your review as well, since I think you added this functionality.
🥳 CI Workflow Results 🟩 Finished in 2h 48m: Pass: 100%/84 | Total: 2d 00h | Max: 2h 47m | Hits: 81%/208367. See results here.
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
Switching to draft to prevent accidental merging. Waiting for approval from @ahendriksen.