I'm running imatrix.cpp (latest llama.cpp) with `--ubatch-size 8192`, but I'm getting CUDA errors. My suspicion is that CUDA requires kernel launch arguments to be < INT_MAX (2^31 - 1), so large physical batch sizes cause CUDA launch errors for MoEs. `--ubatch-size 8191` works fine; 8192 does not.
Long form:
I'm running imatrix.cpp with large physical batch sizes (8192), but I get errors:

```
CUDA error: invalid configuration argument
  current device: 0, in function ggml_cuda_mul_mat_id at llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:2062
  cudaGetLastError()
```
Using `--ubatch-size 8192` causes the error to occur on Qwen 3 30B MoE; `--ubatch-size 8191` works fine.
My suspicion is that CUDA requires kernel launch arguments to be < INT_MAX. Qwen has 128 experts and a hidden dim of 2048, so 8192 * 2048 * 128 = 2147483648 > 2147483647 (INT_MAX), whereas 8191 * 2048 * 128 = 2147221504, which is still below INT_MAX. So one of the launch arguments overflows a 32-bit int, causing CUDA to error out.
Tagging @JohannesGaessler for visibility!