Intermittent segfaults with Julia 1.11.2, possibly architecture-specific (AMD Zen 4), possibly alignment? #135
Comments
I cannot reproduce this on macOS (arm64-apple-darwin24.0.0). The problem might be architecture-specific.
I'm also seeing other segfaults with SIMD on this PC (AMD Ryzen 5 7600) and Julia 1.11.2 as above, so this doesn't look like it has anything to do with `Ref`. A possibly simpler MWE (again, this sometimes segfaults and sometimes runs fine):
And possibly an even simpler MWE. It looks like this might be an alignment problem? Is this just user error? (Should I be using some explicit way to align array allocations?)
Testing the same MWE on the same PC, but with Julia 1.10.7, seems to consistently give a 64-byte-aligned array and no segfault.
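The MWE and its output are not preserved here. Below is a minimal sketch of the kind of check being described, using only Base Julia. The hypothetical helper `alignment_offset` reports how many bytes an array's data pointer sits past the previous 64-byte boundary, so 0 means 64-byte aligned. Which offsets appear depends on the Julia version, the array size, and the allocator, so any pattern you see is an observation, not a guarantee.

```julia
# Hypothetical helper (not from the thread): bytes past the previous `n`-byte
# boundary at which the array's first element starts; 0 means n-byte aligned.
alignment_offset(a::AbstractArray, n::Integer = 64) = Int(UInt(pointer(a)) % UInt(n))

# Allocate several fresh arrays and record each one's offset from a 64-byte boundary.
offsets = [alignment_offset(Vector{Float64}(undef, 8)) for _ in 1:10]
println(offsets)
```

Running this a few times in each Julia version is a cheap way to compare what the allocator actually hands out.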
Perhaps another clue to an alignment issue? On the same PC (AMD Ryzen 5 7600) with Julia 1.11.2 as above, I get an intermittent AssertionError while trying …, where the assertion is from Line 127 in 53c9476.
There were changes to how arrays are implemented in 1.11. If you can reproduce it consistently, perhaps you could bisect to find where it started to fail?
- use netcdf files for output - tidy up yaml files and remove old versions - bugfix for ReactionOceanTransportTMM: workaround SIMD issue, see eschnett/SIMD.jl#135
This looks like JuliaLang/julia#56937 to me? I'm not sure about the proposed fix, JuliaLang/julia#56938, though. If I understand it correctly, Julia will at least be consistent, in the sense that it no longer overpromises the alignment it provides, but it will not guarantee any alignment larger than 16 bytes?
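If only 16-byte alignment is guaranteed, one user-side workaround is to over-allocate and start at an aligned offset. This is a sketch of that general technique, not something proposed in the linked PRs. The hypothetical `aligned_view` does this for a small scalar element type such as `Float64`, so that vector loads over it start at 64-byte-aligned addresses. It cannot fix `Vector{Vec{8,Float64}}` directly: the skip must be a whole number of elements, and a 64-byte element cannot absorb a 16-byte misalignment.

```julia
"""
    aligned_view(T, n; align = 64)

Hypothetical helper (not part of Base or SIMD.jl): over-allocate a
`Vector{T}` and return a contiguous `view` of length `n` whose first element
starts on an `align`-byte boundary. The view keeps the parent buffer alive.
"""
function aligned_view(::Type{T}, n::Integer; align::Integer = 64) where {T}
    isbitstype(T) || throw(ArgumentError("T must be an isbits type"))
    (ispow2(align) && align % sizeof(T) == 0) ||
        throw(ArgumentError("align must be a power of two and a multiple of sizeof(T)"))
    slack = align ÷ sizeof(T)                         # spare elements to reach the next boundary
    buf = Vector{T}(undef, n + slack)
    misalign = Int(UInt(pointer(buf)) % UInt(align))  # bytes past the previous boundary
    misalign % sizeof(T) == 0 ||
        throw(ArgumentError("buffer start is not a whole number of elements from a boundary"))
    skip = misalign == 0 ? 0 : (align - misalign) ÷ sizeof(T)
    return view(buf, skip+1:skip+n)
end

v = aligned_view(Float64, 1000)
fill!(v, 1.0)
println(UInt(pointer(v)) % 64)   # always 0 by construction
```

Returning a `view` rather than an `unsafe_wrap`ped array is a deliberate choice: the view holds a reference to the parent buffer, so the memory cannot be garbage-collected while the aligned view is in use.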
Commenting to say that I'm also seeing frequent segfaults with …; the issue seems to be specific to AMD/Intel x86 CPUs with the AVX-512 instruction set, whose aligned 512-bit loads require 64-byte-aligned addresses (see, e.g., https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=AVX_512&text=mm512_load_pd&ig_expand=4006). Is there any planned strategy for making Julia 1.11 work with AVX-512? This seems quite important for a high-performance scientific language.
Which Julia version are you using, specifically? If the issue occurs on 1.11.4, please file an MWE on the upstream issue tracker.
@vchuravy I'm on 1.11.4 -- I was just testing today after I saw the new version drop. I'll try reducing my code down to an MWE. By "upstream", do you mean filing an issue on https://github.com/JuliaLang/julia directly?
Very simple MWE:
I'm on Linux x86-64 on an AMD Ryzen 9 9950X processor (Zen 5 microarchitecture). I'm using 10 trials here to be conservative -- I always get the segfault within the first 3-4 tries. Also, there is nothing special about the …
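The MWE itself is not preserved here. Because the crash reportedly appears in the first few runs of a fresh session and then often goes away, each trial should run in a new process. Below is a minimal sketch of such a harness using only Base's process API. `count_crashes` is a hypothetical name, and the MWE source is passed in as a string.

```julia
# Hypothetical harness (not from the thread): run `script` in `trials` fresh
# Julia processes and count how many do not exit cleanly (nonzero exit code
# or killed by a signal such as SIGSEGV).
function count_crashes(script::AbstractString; trials::Integer = 10)
    julia = Base.julia_cmd()      # command for the Julia binary running this code
    crashes = 0
    for _ in 1:trials
        cmd = `$julia --startup-file=no -e $script`
        # success() runs the command and returns false instead of throwing on failure.
        success(pipeline(cmd; stdout = devnull, stderr = devnull)) || (crashes += 1)
    end
    return crashes
end
```

For example, `count_crashes(read("mwe.jl", String); trials = 10)`, where `mwe.jl` is a hypothetical file holding the MWE, would report how many of 10 fresh sessions crashed.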
I could not reproduce this locally. I tried several architectures (Apple M3 Pro, AMD EPYC 7532, Xeon(R) Gold 6148), but I don't have access to Zen 4. If someone else can reproduce and debug this, then the following is not needed. Otherwise, the following might be useful:
This won't work if the function calls other functions and the crash happens in there. If so, try to find a different function that crashes. For example:
```julia
julia> using SIMD

julia> function f(v)
           s = 0.0
           for n in 1:8
               s += v[1][n]
           end
           return s
       end

julia> @code_native f(Vector{Vec{8,Float64}}(undef, 1))
```
If you think it's alignment, then you can also evaluate this expression a few times:
If this outputs a nonzero value, then you have a misaligned SIMD vector. This would explain the segfault.
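The expression referred to above is not preserved here, so what follows is a separate, self-contained sketch of a similar check that does not need SIMD.jl. It assumes that `SIMD.Vec{8,Float64}` wraps `NTuple{8, VecElement{Float64}}`, which is how SIMD.jl represents LLVM vectors; treat that as an assumption. It also uses the internal, undocumented `Base.datatype_alignment` to read the alignment that Julia's type layout reports for that type, and compares it with the addresses that freshly allocated arrays actually receive.

```julia
# Assumption: SIMD.Vec{8,Float64} wraps this tuple type; using it directly
# avoids depending on SIMD.jl for the check.
const V8 = NTuple{8, VecElement{Float64}}

# Alignment (in bytes) that Julia's type layout reports for V8.
# `Base.datatype_alignment` is internal and undocumented.
promised = Base.datatype_alignment(V8)

# Byte offset of each fresh array's data pointer from a `promised`-byte
# boundary; any nonzero entry means the array is less aligned than its
# element type claims.
offsets = [Int(UInt(pointer(Vector{V8}(undef, 1))) % UInt(promised)) for _ in 1:20]
println("promised alignment: ", promised, " bytes; offsets: ", offsets)
```

If `promised` is 64 but some offsets are nonzero, then code compiled on the assumption of 64-byte alignment can fault on those arrays.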
This should be fixed by JuliaLang/julia#56938, but backporting it to 1.11 and 1.10 requires some more work; see JuliaLang/julia#57713 (comment).
@eschnett Huh, I'm very surprised this issue doesn't reproduce on the Xeon Gold 6148, which is also an AVX-512-capable processor. But on my AMD 9950X (Zen 5), I can show you the problematic aligned load right inside …
The problematic instruction is the …
The story is even stranger than I thought! It turns out that …
However, Julia v1.10 was smart enough to generate an unaligned load, whereas Julia v1.11 erroneously generates an aligned load:
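The compiled code being compared is not preserved here. As a sketch of how to look for the same thing without SIMD.jl, the hypothetical `lane_sum` below mirrors `f` from the earlier comment. It uses `NTuple{8, VecElement{Float64}}` (assumed to be the type `SIMD.Vec{8,Float64}` wraps) and prints the `load` lines of the function's LLVM IR. The `align N` annotation on a load is the alignment the compiler assumes. When that alignment is at least the vector width, x86 backends typically emit an aligned move such as `vmovapd`, which faults on a misaligned address. The exact IR depends on the Julia version and the CPU.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

const V8 = NTuple{8, VecElement{Float64}}

# Hypothetical function: sum the 8 lanes of the first element, like `f` above.
function lane_sum(v::Vector{V8})
    s = 0.0
    for n in 1:8
        s += v[1][n].value
    end
    return s
end

# Capture the optimized LLVM IR as a string and print only the load instructions,
# so their `align` annotations can be compared across Julia versions.
ir = sprint(io -> code_llvm(io, lane_sum, (Vector{V8},); debuginfo = :none))
load_lines = filter(l -> occursin("load", l), split(ir, '\n'))
foreach(println, load_lines)
```

Running the same script under 1.10 and 1.11 on the affected machine and diffing the printed lines would show whether the assumed alignment changed.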
As discussed in JuliaLang/julia#57713, I don't think there's anything to be done in SIMD.jl to fix or work around this. This issue needs to be fixed upstream in Julia.
I'm seeing intermittent segfaults with Julia 1.11.2 for code that works fine on Julia 1.10.
This is with SIMD v3.7.0 (I have not tested other versions).
An attempt at an MWE (this only produces intermittent segfaults, whereas the full code always fails with Julia 1.11.2):
The behaviour of the MWE above seems to be intermittent: the first few runs in a fresh Julia REPL segfault, and subsequent runs on the same PC then work (whereas the full code always fails with Julia 1.11.2).
(`rbuf` is uninitialized in the MWE above; the full code of course does initialise the equivalent of `rbuf[]` and still fails with a segfault.)