Skip test_mxfp on A770 #3811
It looks like the problem might be elsewhere:
trying again with new approach (not using skiplist): https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/14228029469/job/39872117025 |
@@ -316,6 +316,8 @@ def fp8e8m0_to_float32(scale):
@pytest.mark.parametrize("NUM_STAGES", [1, 3])
@pytest.mark.parametrize("NUM_WARPS", [4, 8])
@pytest.mark.parametrize("nonKDim", ([0, 16, 32] if is_hip_cdna() else [0]))
@pytest.mark.skipif(is_xpu() and not torch.xpu.get_device_capability()['has_subgroup_matrix_multiply_accumulate'],
Just in case has_subgroup_matrix_multiply_accumulate is not in the capabilities dict:
@pytest.mark.skipif(is_xpu() and not torch.xpu.get_device_capability()['has_subgroup_matrix_multiply_accumulate'],
@pytest.mark.skipif(is_xpu() and not torch.xpu.get_device_capability().get('has_subgroup_matrix_multiply_accumulate', False),
In general, this decorator is evaluated at import time, which is inconvenient; Python best practice is to minimize import-time logic. IMHO it is better to move this conditional skip into the test body.
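A minimal sketch of that suggestion, assuming the capability check is factored into a small helper that the test calls at run time. The helper name, its parameters, the skip message, and the test name are hypothetical; in the real test the arguments would come from is_xpu() and torch.xpu.get_device_capability().

```python
import pytest

CAPABILITY_KEY = "has_subgroup_matrix_multiply_accumulate"


def skip_unless_subgroup_mma(on_xpu: bool, capability: dict) -> None:
    """Skip the calling test on XPU devices that lack the capability.

    Hypothetical helper: `.get(..., False)` treats a missing key the same
    as an explicit False instead of raising KeyError.
    """
    if on_xpu and not capability.get(CAPABILITY_KEY, False):
        pytest.skip(f"device does not report {CAPABILITY_KEY}")


def test_mxfp_sketch():
    # Evaluated when the test runs, not when the module is imported.
    # In the real test these values would come from is_xpu() and
    # torch.xpu.get_device_capability(); they are hard-coded here.
    skip_unless_subgroup_mma(on_xpu=True, capability={CAPABILITY_KEY: True})
    # ... the actual test body would follow here ...
```

Because the device is queried only inside the test, collecting the module on a machine without an XPU device does not touch torch.xpu at all.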
Please note this run uses the default skip list: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/14228029469/job/39872117025#step:11:38 We do not (yet) autodetect a skip list based on the selected runner, so I would recommend specifying the skip list explicitly. The workflow "Build and test GPU" requires setting the "Runner label" and "Skip list" inputs, so it is better suited for such runs. |
ok - what is the syntax for the skiplist parameter? is this documented somewhere? |
It is not documented yet. For "Build and test" the default value is "default", for "Build and test GPU" it is empty by default, so we have to specify it. Basically it is the last directory name in the path to a skip list: https://github.com/intel/intel-xpu-backend-for-triton/tree/main/scripts/skiplist. For example: |
Also a specified skip list can be identified in a successful run, in the step "Print inputs". For example: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/14167030129/job/39682308605#step:2:35 |
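As a rough illustration of "the last directory name in the path", the sketch below derives the input value from a skip list directory path. The function name and the directory "example-device" are hypothetical and do not refer to an actual skip list in the repository.

```python
from pathlib import PurePosixPath


def skip_list_name(skip_list_dir: str) -> str:
    """Return the last path component, i.e. the value for the "Skip list" input."""
    return PurePosixPath(skip_list_dir).name


# Hypothetical directory name, used only for illustration.
print(skip_list_name("scripts/skiplist/example-device"))
```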
Specifying a skip list via workflow's input was considered as a temporary measure. The plan was to identify it automatically based on the selected runner. We still want this some day, just not enough capacity at the moment. |
resubmitted: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/14231903364/job/39884095474 |
Yes, looks good: correct runner and skip list. The input "device" is also a temporary input (with max1100 as a default value), will be defined by a runner when we implement everything right. |
We could also use the same approach to skip |
Running tests in main on A770 on a runner with |
@alexbaden you can now just specify |
843992c to 236116c
A770 run with blanket |
thanks! FYI: pre-commit checks failed |
yup - but the test is already running so let's just let it finish. |
With the skip list from 1cc2bd3 the workflow run took ~2.5h: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/14286539636. |
The A770 job recently increased to 5+ hours. See if skipping mxfp brings the job back to taking a reasonable amount of time to complete.