Why did I need to apply this patch? #24934

rajb245 · 2025-06-03T00:00:51Z


I got some compiler errors about deprecation. The messages said certain typenames were deprecated and gave the updated, non-deprecated typenames right there in the text (a warning message that produced a compiler error?). I blindly did what the messages said (patch below). This is CUDA 12.9 on Ubuntu 24.04, building main as of right now. I'm not a developer here, but this did fix my build, so I wanted to share it anyway and see if there's some next step I can take. Maybe this needs to be done conditionally when building against certain versions of the CUDA toolkit? If someone lets me know what more info I can provide, or what would be acceptable as an MR, I'd be happy to help. Thanks.

diff --git a/onnxruntime/contrib_ops/cuda/bert/embed_layer_norm_impl.cu b/onnxruntime/contrib_ops/cuda/bert/embed_layer_norm_impl.cu
index 8a17e945df..0b49124e05 100644
--- a/onnxruntime/contrib_ops/cuda/bert/embed_layer_norm_impl.cu
+++ b/onnxruntime/contrib_ops/cuda/bert/embed_layer_norm_impl.cu
@@ -39,7 +39,7 @@ __global__ void MaskIndexKernelSmall(int sequence_length, const int* mask, int*
   // blockIdx.x is b
   const int offset = blockIdx.x * sequence_length;  // batch strides of sequence_length

-  cub::Min min;
+  ::cuda::minimum min;
   int thread_data(sequence_length);

   const int idx = offset + threadIdx.x;
@@ -66,7 +66,7 @@ __global__ void MaskIndexKernel(int sequence_length, const int* mask, int* mask_
   // blockIdx.x is b
   const int offset = blockIdx.x * sequence_length;  // batch strides of sequence_length

-  cub::Min min;
+  ::cuda::minimum min;
   int thread_data(sequence_length);

   for (int i = threadIdx.x; i < sequence_length; i += TPB) {
diff --git a/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_kernel.cu b/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_kernel.cu
index 3e0b9d35b1..4236254c87 100644
--- a/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_kernel.cu
+++ b/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_kernel.cu
@@ -65,7 +65,7 @@ __launch_bounds__(TPB) __global__

     const int thread_row_offset = blockIdx.x * num_cols;

-    cub::Sum sum;
+    ::cuda::std::plus sum;
     float threadData(-FLT_MAX);

     // Don't touch finished rows.
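On the question of doing this conditionally per CUDA version: one version-independent option, sketched here under assumptions and not what upstream ONNX Runtime necessarily did, is to avoid both the deprecated `cub::Min`/`cub::Sum` and the newer `::cuda::minimum`/`::cuda::std::plus` by defining small local functors. `cub::BlockReduce::Reduce` accepts any binary reduction functor callable on the device, so a hand-written one should work with any CCCL version. The names `MinOp`, `SumOp`, and `EXAMPLE_HOST_DEVICE` are hypothetical.

```cpp
// Hypothetical version-agnostic reduction functors. They do not depend on
// cub::Min / cub::Sum (deprecated in newer CCCL) or on ::cuda::minimum /
// ::cuda::std::plus (not present in older CCCL).

// Under nvcc, mark the operators callable from both host and device code;
// under a plain C++ compiler the macro expands to nothing, so the same
// code can be checked on the host.
#if defined(__CUDACC__)
#define EXAMPLE_HOST_DEVICE __host__ __device__
#else
#define EXAMPLE_HOST_DEVICE
#endif

// Binary minimum, same semantics as cub::Min: returns the smaller argument.
struct MinOp {
  template <typename T>
  EXAMPLE_HOST_DEVICE constexpr T operator()(const T& a, const T& b) const {
    return (b < a) ? b : a;
  }
};

// Binary sum, same semantics as cub::Sum: returns a + b.
struct SumOp {
  template <typename T>
  EXAMPLE_HOST_DEVICE constexpr T operator()(const T& a, const T& b) const {
    return a + b;
  }
};

// Compile-time sanity checks on the host.
static_assert(MinOp{}(3, 5) == 3, "MinOp should pick the smaller value");
static_assert(SumOp{}(2, 3) == 5, "SumOp should add its arguments");
```

In the patched kernels, `cub::Min min;` would then become `MinOp min;` and `cub::Sum sum;` would become `SumOp sum;`. The calls that consume them (presumably `BlockReduce(temp_storage).Reduce(...)`) stay unchanged, so no `#if` on the CUDA or CCCL version is needed.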
