sycl: simplify bin_bcast_kernel #13383

Open: wants to merge 2 commits into base: master

AD2605 (Contributor) commented May 8, 2025

This PR simplifies the bin-bcast kernel and adds a dedicated code path for the case where all inputs are contiguous, avoiding unnecessary index calculations.
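
To make the contiguity condition concrete, here is a minimal host-side C++ sketch, not the code from this PR. `TensorView`, `is_dense`, `can_use_contiguous_path`, and `add_contiguous` are hypothetical names, strides are assumed to be expressed in elements, and the loop stands in for the work-items of a kernel launch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical tensor descriptor (not the ggml struct): extents and strides
// are given in elements, with dimension 0 varying fastest.
struct TensorView {
    std::array<std::int64_t, 4> ne;  // number of elements per dimension
    std::array<std::int64_t, 4> nb;  // stride per dimension, in elements
};

// True when the tensor is densely packed, with no gaps between rows.
inline bool is_dense(const TensorView & t) {
    std::int64_t expected = 1;
    for (std::size_t d = 0; d < 4; ++d) {
        if (t.nb[d] != expected) {
            return false;
        }
        expected *= t.ne[d];
    }
    return true;
}

// The fast path applies only when no broadcasting is needed and every
// operand is dense, so one flat index addresses all three buffers.
inline bool can_use_contiguous_path(const TensorView & src0,
                                    const TensorView & src1,
                                    const TensorView & dst) {
    return src0.ne == dst.ne && src1.ne == dst.ne &&
           is_dense(src0) && is_dense(src1) && is_dense(dst);
}

// Contiguous path: the loop variable stands in for the work-item id,
// and no per-dimension index arithmetic is required.
inline void add_contiguous(const float * src0, const float * src1,
                           float * dst, std::int64_t n) {
    for (std::int64_t i = 0; i < n; ++i) {
        dst[i] = src0[i] + src1[i];
    }
}
```

The check here is deliberately strict: it rejects any operand that needs broadcasting or has padded rows, so those cases would fall back to the general indexed path.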

The current bin-bcast kernel launches either a 3D grid or a 1D grid, and the 3D grid is often limited in the number of work-items it can accommodate.

This PR completely flattens the kernel, which also makes it easier to check for contiguous memory accesses. The separate contiguous path also opens up the possibility of vectorization later on, though in my current testing vectorization did not make a meaningful difference to performance.
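
As an illustration of what flattening means for the general (broadcasting) case, the following host-side C++ sketch recovers 4D coordinates from a single flat index. It is an assumption-laden sketch, not the kernel from this PR: `Shape` and `add_broadcast_flat` are hypothetical names, all buffers are assumed densely packed with dimension 0 fastest, `src0` is assumed to have the same shape as `dst`, and the loop stands in for a 1D launch with one work-item per output element.

```cpp
#include <array>
#include <cstdint>

using Shape = std::array<std::int64_t, 4>;

// Broadcasting add over densely packed buffers. The loop over `i` stands in
// for a flat 1D launch in which each work-item handles one output element.
inline void add_broadcast_flat(const float * src0, const float * src1, float * dst,
                               const Shape & dst_ne, const Shape & src1_ne) {
    const std::int64_t total = dst_ne[0] * dst_ne[1] * dst_ne[2] * dst_ne[3];
    for (std::int64_t i = 0; i < total; ++i) {
        // Recover the 4D coordinates from the flat index (dimension 0 fastest).
        const std::int64_t i0 = i % dst_ne[0];
        const std::int64_t i1 = (i / dst_ne[0]) % dst_ne[1];
        const std::int64_t i2 = (i / (dst_ne[0] * dst_ne[1])) % dst_ne[2];
        const std::int64_t i3 = i / (dst_ne[0] * dst_ne[1] * dst_ne[2]);

        // Broadcast: wrap each coordinate into src1's extent.
        const std::int64_t j0 = i0 % src1_ne[0];
        const std::int64_t j1 = i1 % src1_ne[1];
        const std::int64_t j2 = i2 % src1_ne[2];
        const std::int64_t j3 = i3 % src1_ne[3];

        // Linear offset into the densely packed src1 buffer.
        const std::int64_t j =
            ((j3 * src1_ne[2] + j2) * src1_ne[1] + j1) * src1_ne[0] + j0;

        dst[i] = src0[i] + src1[j];
    }
}
```

For example, adding a `src1` of shape {2, 1, 1, 1} to a `src0` of shape {2, 3, 1, 1} reuses the same two `src1` values for each of the three rows. The divisions and modulo operations in this path are the index calculations that a contiguous path can skip.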

This PR also brings a minor but consistent improvement of around 1 tk/s on some models.

Performance was compared using the following parameters: -mmp 0 -ngl 99 -t 8

Intel Lunar Lake 140V iGPU

| Model | This PR (5a0e7a9) tk/s | Master (814f795) tk/s |
| --- | --- | --- |
| qwen2 1.5B Q4_0 | 34.05 ± 0.53 | 33.39 ± 0.52 |
| gemma2 2B Q4_K | 25.00 ± 0.25 | 24.74 ± 0.21 |
| llama 8B Q4_K - Medium | 11.10 ± 1.27 | 10.2 ± 0.70 |

Intel Data Center Max 1100

| Model | This PR (5a0e7a9) tk/s | Master (814f795) tk/s |
| --- | --- | --- |
| qwen2 1.5B Q4_0 | 95.06 ± 0.76 | 92.51 ± 5.88 |
| gemma2 2B Q4_K | 84.62 ± 0.18 | 82.44 ± 0.22 |
| llama 8B Q4_K - Medium | 36.72 ± 0.05 | 36.35 ± 0.07 |

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels May 8, 2025