Enable mmf for RDNA3. All mul_mat_f-related test cases should pass; I am still collecting the perf data.
There is also a perf regression in mul_mat_f on my 7900XTX; I assume it is a similar issue to ROCm/ROCm#5727.
If anyone can help collect MUL_MAT perf data on other RDNA3 GPUs, that would be very helpful. If there is a perf improvement, I will still enable mul_mat_f on RDNA3 and ask ROCm to improve the perf; otherwise I will suggest disabling mul_mat_f on RDNA3.
I have added the perf data for the ops on Windows. The Windows data is unstable, but this is the only RDNA3 GPU I have. It would be very helpful if anyone could test other RDNA3 GPUs on Linux, thank you.
I finally got Ubuntu 22.04 working and have added the data from it with ROCm 7.1.0. Unlike my 9070XT, the 7900XTX appears to get a perf improvement in mul_mat_f, which is why I suspect that the ROCm compiler does not optimize for RDNA4.
Labels: ggml (changes relating to the ggml tensor library for machine learning), Nvidia GPU (issues specific to Nvidia GPUs)
MUL_MAT_ID_FUSION_rdna3_test.txt
MUL_MAT_ID_rdna3_test.txt
MUL_MAT_rdna3_test.txt
MUL_MAT_ID_FUSION_rdna4_test.txt
MUL_MAT_ID_rdna4_test.txt
MUL_MAT_rdna4_test.txt