Skip to content

SYCL: Fix test-backend-ops crashes with SYCL-Graph #13357

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

EwanC
Copy link
Contributor

@EwanC EwanC commented May 7, 2025

Currently on a CUDA backend to SYCL when running
GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 locally I see crashes from 3 operations:

  1. -o MUL_MAT: Issue arising from recording of oneMath ext_codeplay_enqueue_native_command.
  2. -o CONCAT : Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187, can these wait calls just be removed?
  3. -o MUL_MAT_ID: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074 , host work could be wrapped in a host-task?

For 1) I have come up with a oneMath fix in uxlfoundation/oneMath#669 I've put a provisional git tag to pull in this PR for testing, which is why this PR is draft, but will update to the upstream commit once merged.

For 2 & 3) we've noticed that ggml-cuda.cu has the check_node_graph_compatibility_and_refresh_copy_ops method for checking if a graph can be used, even if enabled. I've taken a similar approach in this PR by adding a method to ggml-sycl.cpp for checking if a graph can be used for the operations even if a user has asked for it to be enabled.

Currently on a CUDA backend to SYCL when running
`GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` I
see crashes from 3 operations:

1) `-o MUL_MAT`: Issue arising from recording of oneMath `ext_codeplay_enqueue_native_command`.
2) `-o CONCAT` : Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187, can these wait calls just be removed?
3) `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074 , host work could be wrapped in a host-task?

For 1) I have come up with a oneMath fix in uxlfoundation/oneMath#669, I've put a provisional git tag to pull in this PR for testing, but will update to the upstream commit once merged.

For 2 & 3) we've noticed that `ggml-cuda.cu` has the
[check_node_graph_compatibility_and_refresh_copy_ops](https://github.com/ggml-org/llama.cpp/blob/39e73ae0d69f882d7e29cecc6dd8f5052fca6731/ggml/src/ggml-cuda/ggml-cuda.cu#L2458-L2458) method for checking if a graph can be used, even if enabled. I've taken a similar approach in this PR by adding a method to `ggml-sycl.cpp` for checking if a graph can be used for the operations even if a user has asked for it to be enabled.
@github-actions github-actions bot added ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language labels May 7, 2025
Copy link
Collaborator

@Alcpz Alcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Ping again to give the approval once the fix for oneMath gets merged in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants