
CUDA error: invalid configuration argument for MoEs - --ubatch-size 8192 exceeds INT_MAX #13376

Closed
danielhanchen opened this issue May 8, 2025 · 0 comments · Fixed by #13384

Comments

@danielhanchen
Contributor

danielhanchen commented May 8, 2025

Tagging @JohannesGaessler for visibility!

TLDR:

I'm running imatrix.cpp (latest llama.cpp) with --ubatch-size 8192, but I'm getting CUDA errors. My suspicion is that CUDA needs arguments to be < INT_MAX (2^31 - 1), so large physical batch sizes cause CUDA launch errors for MoEs. --ubatch-size 8191 works fine; 8192 does not.

Long form:

I'm running imatrix.cpp with large physical batch sizes (8192), but sadly I get errors with:

CUDA error: invalid configuration argument
  current device: 0, in function ggml_cuda_mul_mat_id at llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:2062
  cudaGetLastError()

i.e. the error is here:

get_rows_cuda(src1->data, src1->type, ids_to_sorted, src1_sorted.ptr, type_src1_sorted,
        ne10, nb11, nb12, nb13,
        ne_get_rows, 1, 1, sizeof(int32_t), ne_get_rows*sizeof(int32_t), ne_get_rows*sizeof(int32_t),
        ne10*ts_src1_sorted, ne_get_rows*ne10*ts_src1_sorted, ne_get_rows*ne10*ts_src1_sorted, stream);
CUDA_CHECK(cudaGetLastError());

Using --ubatch-size 8192 causes the error to occur on Qwen 3 30B MoE.

--ubatch-size 8191 works fine.

My suspicion is that CUDA requires arguments to be < INT_MAX. Qwen 3 30B has 128 experts and an input dim of 2048, so 8192 * 2048 * 128 = 2147483648 > 2147483647 (INT_MAX).

8191 * 2048 * 128 = 2147221504, so less than INT_MAX.

I.e. one of the arguments:

ne10*ts_src1_sorted, ne_get_rows*ne10*ts_src1_sorted, ne_get_rows*ne10*ts_src1_sorted

is exceeding INT_MAX, thus causing CUDA to error out.
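A minimal standalone sketch of the arithmetic (not llama.cpp code; it only assumes the 128-expert and 2048-dim values quoted above end up multiplied into a 32-bit quantity somewhere in the launch arguments), showing that the product crosses INT_MAX exactly between --ubatch-size 8191 and 8192:

#include <climits>
#include <cstdint>
#include <cstdio>
#include <initializer_list>

int main() {
    const int64_t n_experts = 128;   // Qwen 3 30B MoE expert count
    const int64_t n_embd    = 2048;  // input dim
    for (const int64_t ubatch : {8191, 8192}) {
        const int64_t total = ubatch * n_embd * n_experts;
        // 8191 -> 2147221504 (<= INT_MAX), 8192 -> 2147483648 (> INT_MAX)
        printf("ubatch=%lld total=%lld fits in int: %s\n",
               (long long) ubatch, (long long) total,
               total <= INT_MAX ? "yes" : "no");
    }
    return 0;
}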
