Environment: Intel Arc A770 (latest drivers) vs. NVIDIA RTX 4090
A770 lacks hardware features for MMA that newer hardware has (PVC, LNL, BMG). I suspect the compilation time is coming from inefficient (large) kernels. This is a known issue and, unfortunately, I'm not sure this is something we intend to improve. Do you have access to a BMG or even a PVC machine? Intel developer cloud could be a good place to start.
vlad-penkin changed the title from "Significant Compilation Time Discrepancy: Triton XPU (A770) vs. NVIDIA (RTX 4090)" to "[flash-linear-attention] Significant Compilation Time Discrepancy: Triton XPU (A770) vs. NVIDIA (RTX 4090)" on Mar 26, 2025
Describe the issue
I am currently running continuous integration for fla (flash-linear-attention).
Triton compilation is substantially slower on an Intel Arc A770 than on an NVIDIA RTX 4090: end-to-end compilation of some kernels takes more than 10x longer. This significantly slows down the development workflow and CI testing.
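One way to put a number on the compilation cost is to compare the first launch of a kernel (which includes JIT compilation) with later launches that reuse the compiled binary. The following is a minimal sketch, not the method used in this issue: `first_vs_warm` and `launch` are hypothetical names, and `launch` is assumed to be a zero-argument callable that runs the kernel and waits for the device to finish.

```python
import time
from typing import Callable, Dict


def first_vs_warm(launch: Callable[[], None], warm_runs: int = 5) -> Dict[str, float]:
    """Time the first call (JIT compile + run) against later, cached calls."""
    start = time.perf_counter()
    launch()  # first call: includes any JIT compilation
    first = time.perf_counter() - start

    warm = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        launch()  # later calls: the compiled kernel is assumed to be reused
        warm.append(time.perf_counter() - start)

    warm_avg = sum(warm) / len(warm)
    return {
        "first_s": first,
        "warm_avg_s": warm_avg,
        # Rough estimate of compile overhead; clamped at zero for noisy timings.
        "compile_overhead_s": max(first - warm_avg, 0.0),
    }
```

For the numbers to be meaningful, `launch` would need to synchronize the device after the kernel call, and the on-disk Triton cache would need to be empty before the first call. Running the same callable on both machines would then give a per-kernel ratio to compare against the 10x figure above.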
Reproduction Steps:
git clone https://github.com/fla-org/flash-linear-attention/
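For CI, a cold end-to-end run can be timed by pointing Triton at an empty cache directory so that nothing is reused from an earlier run. The sketch below assumes the installed Triton build honors the `TRITON_CACHE_DIR` environment variable; `timed_cold_run` is a hypothetical helper, and the issue does not say which command was actually timed.

```python
import os
import subprocess
import tempfile
import time
from typing import List


def timed_cold_run(cmd: List[str]) -> float:
    """Run cmd with an empty Triton cache directory and return wall-clock seconds."""
    with tempfile.TemporaryDirectory() as cache_dir:
        # Assumption: Triton reads TRITON_CACHE_DIR to locate its kernel cache.
        env = dict(os.environ, TRITON_CACHE_DIR=cache_dir)
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)  # raises if the command fails
        return time.perf_counter() - start
```

A call such as `timed_cold_run([sys.executable, "-m", "pytest", "<test file>"])` from the cloned repository would give one number per machine; the test file is left as a placeholder because the issue does not name one.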
Observed Behavior:
Expected Behavior:
Impact:
Request:
Could the Triton team:
Environment details
pip list | grep triton
pytorch-triton-xpu 3.3.0
Intel Arc A770
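The same package information can be collected from inside Python, which may be convenient for attaching environment details to CI logs. This is a small sketch using only the standard library; `installed_matching` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Dict


def installed_matching(keyword: str) -> Dict[str, str]:
    """Return {distribution name: version} for installed packages whose name contains keyword."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with missing metadata; match case-insensitively.
        if name and keyword.lower() in name.lower():
            found[name] = dist.version
    return found
```

Calling `installed_matching("triton")` in the environment above would be expected to list `pytorch-triton-xpu` alongside any other Triton-related packages.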