SYCL: Fix test-backend-ops crashes with SYCL-Graph #13357

EwanC · 2025-05-07T14:24:34Z

Currently, when running `GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` locally with SYCL on a CUDA backend, I see crashes from 3 operations:

1) `-o MUL_MAT`: Issue arising from recording of the oneMath `ext_codeplay_enqueue_native_command`.
2) `-o CONCAT`: Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187. Can these wait calls just be removed?
3) `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074. Could the host work be wrapped in a host-task?

For 1) I have come up with a oneMath fix in uxlfoundation/oneMath#669. I've put in a provisional git tag to pull in that PR for testing, which is why this PR is a draft, but I will update to the upstream commit once it is merged.

For 2 & 3) we've noticed that `ggml-cuda.cu` has the [check_node_graph_compatibility_and_refresh_copy_ops](https://github.com/ggml-org/llama.cpp/blob/39e73ae0d69f882d7e29cecc6dd8f5052fca6731/ggml/src/ggml-cuda/ggml-cuda.cu#L2458-L2458) method for checking whether a graph can be used, even if graphs are enabled. I've taken a similar approach in this PR by adding a method to `ggml-sycl.cpp` that checks whether a graph can be used for the operations, even if the user has asked for graphs to be enabled.
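The compatibility check described for 2 & 3 can be pictured roughly as below. This is only a minimal, self-contained sketch of the idea, not the code in this PR: `Op`, `Node`, `op_is_graph_compatible`, `graph_is_compatible`, and `should_use_graph` are hypothetical names standing in for the real ggml types and functions. It assumes `CONCAT` and `MUL_MAT_ID` are the only ops to skip, with `MUL_MAT` handled separately by the oneMath fix.

```cpp
#include <vector>

// Hypothetical stand-ins for ggml's op enum and graph nodes (illustration only).
enum class Op { Add, MulMat, Concat, MulMatId };

struct Node {
    Op op;
};

// Assumption: these two ops perform blocking waits, which are not allowed
// on a queue that is currently recording to a SYCL graph.
bool op_is_graph_compatible(Op op) {
    switch (op) {
        case Op::Concat:    // blocking waits on the recording queue
        case Op::MulMatId:  // blocking wait for a copy to host memory
            return false;
        default:
            return true;
    }
}

// A compute graph is recordable only if every node in it is recordable.
bool graph_is_compatible(const std::vector<Node> & nodes) {
    for (const Node & node : nodes) {
        if (!op_is_graph_compatible(node.op)) {
            return false;
        }
    }
    return true;
}

// Graphs are used only if the user enabled them AND the ops allow it;
// otherwise the caller would fall back to submitting kernels eagerly.
bool should_use_graph(bool user_enabled, const std::vector<Node> & nodes) {
    return user_enabled && graph_is_compatible(nodes);
}
```

With a check like this, a run with `GGML_SYCL_DISABLE_GRAPH=0` would still fall back to eager submission for any compute graph containing one of the skipped ops, rather than crashing during recording.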

Alcpz

LGTM. Ping me again so I can approve once the oneMath fix is merged.

EwanC · 2025-05-16T11:10:07Z

This patch is doing two things: bumping the oneMath commit, which could have other unintended consequences, and adding the skips. I'm closing this PR and have created #13587 just for the skip part of the change, and will create another PR for the oneMath change when it's ready.

github-actions bot added the ggml and SYCL labels on May 7, 2025

Alcpz reviewed May 9, 2025

EwanC closed this May 16, 2025
