This repository was archived by the owner on Jan 28, 2026. It is now read-only.
Multi-GPU inference using the SYCL / oneAPI backend fails with an out-of-device-memory error during a SYCL memcpy().wait() call, even though sufficient VRAM is available on each GPU. The same model and configuration work reliably on a single Intel GPU.
This appears to be a multi-GPU SYCL / Level Zero pipeline or cross-device copy issue, not a real VRAM exhaustion problem.
- The model uses < 3 GiB of weights plus ~512 MiB of KV cache
- The failure happens after a successful model load, during inference
- The error is triggered inside a SYCL memcpy + wait, suggesting:
  - cross-device tensor movement,
  - pipeline parallelism, or
  - Level Zero memory management issues
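The pattern the error points at can be sketched as a minimal SYCL cross-device copy. This is a hypothetical illustration, not the backend's actual code; the device indices, buffer size, and the claim about where the error surfaces are assumptions based on the report (it requires a DPC++/oneAPI toolchain to compile):

```cpp
#include <sycl/sycl.hpp>

int main() {
    // Enumerate GPUs; with two Intel GPUs visible there are two root devices.
    auto gpus = sycl::device::get_devices(sycl::info::device_type::gpu);
    if (gpus.size() < 2) return 1;
    sycl::queue q0(gpus[0]), q1(gpus[1]);

    const size_t bytes = 512ull * 1024 * 1024; // illustrative size, not the real tensor
    float* src = sycl::malloc_device<float>(bytes / sizeof(float), q0);
    float* dst = sycl::malloc_device<float>(bytes / sizeof(float), q1);

    // A cross-device copy like this is where the report says the failure
    // surfaces: the runtime may stage the transfer through an intermediate
    // allocation, so it can fail with an out-of-device-memory error even
    // when both GPUs report sufficient free VRAM.
    q1.memcpy(dst, src, bytes).wait(); // reported failure point: memcpy().wait()

    sycl::free(src, q0);
    sycl::free(dst, q1);
    return 0;
}
```

Note that in SYCL, USM device allocations are tied to a context, so a copy between pointers owned by two different root devices may be routed through host memory by the runtime, which is consistent with this being a cross-device copy issue rather than genuine VRAM exhaustion.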
With two GPUs visible, logs show: