Intermittent segfaults with Julia 1.11.2, possibly architecture-specific (AMD Zen 4), possibly alignment ? #135

Open
sjdaines opened this issue Dec 16, 2024 · 15 comments

@sjdaines

sjdaines commented Dec 16, 2024

I'm seeing intermittent segfaults with Julia 1.11.2, for code that works fine on Julia 1.10.
This is using SIMD v3.7.0 (I have not tested other versions).

Attempt at a MWE (this only generates intermittent segfaults, the full code always fails with Julia 1.11.2):

julia> import SIMD
julia> rbuf = Ref{SIMD.Vec{8, Float32}}()
Base.RefValue{SIMD.Vec{8, Float32}}(<8 x Float32>[2.2f-44, 0.0, -2.4113148f37, 4.4487f-41, 6.78f-41, 0.0, 0.0, 0.0])
julia> rbuf[]  # segfault!

The behaviour of the MWE above seems to be intermittent: the first few runs in a fresh Julia REPL generate a segfault, and subsequent runs on the same PC then work (whereas the full code always fails with Julia 1.11.2).
(rbuf is uninitialized in the MWE above; the full code of course does initialise the equivalent of rbuf[] and still fails with a segfault.)

julia> versioninfo()
Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × AMD Ryzen 5 7600 6-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_CONDAPKG_BACKEND = Null
@eschnett
Owner

I cannot reproduce this on macOS (arm64-apple-darwin24.0.0). The problem might be architecture-specific.

@sjdaines
Author

I'm seeing other segfaults as well with SIMD on this PC (AMD Ryzen 5 7600) and Julia version 1.11.2 as above, so this doesn't look like it has anything to do with Ref.

A possibly simpler MWE; again, this sometimes gives a segfault and sometimes works:

julia> import SIMD

julia> x = zeros(SIMD.Vec{8, Float64}, 10)
10-element Vector{SIMD.Vec{8, Float64}}:

[538808] signal 11 (128): Segmentation fault
in expression starting at none:0
getindex at ./essentials.jl:917 [inlined]
getindex at ./array.jl:930
unknown function (ip: 0x77108445e7b2)
alignment at ./arrayshow.jl:69
_print_matrix at ./arrayshow.jl:207
print_matrix at ./arrayshow.jl:171
print_matrix at ./arrayshow.jl:171 [inlined]
print_array at ./arrayshow.jl:358 [inlined]
show at ./arrayshow.jl:399
unknown function (ip: 0x77108445e476)
#68 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:367
jfptr_YY.68_10048.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
with_repl_linfo at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:661
jfptr_with_repl_linfo_10178.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
display at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:353
display at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:372 [inlined]
display at ./multimedia.jl:340
jfptr_display_13663.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
print_response at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:409
#70 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:378
jfptr_YY.70_10086.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
with_repl_linfo at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:661
jfptr_with_repl_linfo_10178.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
print_response at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:376
do_respond at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:1003
jfptr_do_respond_10241.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_interface at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2755
jfptr_run_interface_8710.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
run_frontend at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:1474
#75 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:480
jfptr_YY.75_10143.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
start_task at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/task.c:1202
Allocations: 2140600 (Pool: 2140459; Big: 141); GC: 3
Segmentation fault (core dumped)

@sjdaines sjdaines changed the title Intermittent segfaults with Ref on Julia 1.11.2 Intermittent segfaults with Julia 1.11.2, possibly architecture-specific (AMD Zen 4) Dec 21, 2024
@sjdaines
Author

sjdaines commented Dec 21, 2024

And possibly an even simpler MWE... it looks like this is perhaps an alignment problem?
Same PC (AMD Ryzen 5 7600) and Julia version 1.11.2 as above.

Is this just user error? (Should I be using some explicit way to align array allocations?)

julia> a = Array{SIMD.Vec{8, Float64}}(undef, 2);   # ; to suppress REPL output

julia> Int(pointer(a)) % 64   # if this gives 0, no segfault
32

julia> a   # segfaults as soon as the REPL tries to display it
2-element Vector{SIMD.Vec{8, Float64}}:

[539567] signal 11 (128): Segmentation fault
in expression starting at none:0
getindex at ./essentials.jl:917 [inlined]
getindex at ./array.jl:930
unknown function (ip: 0x71531a6b7472)
alignment at ./arrayshow.jl:69
_print_matrix at ./arrayshow.jl:207
print_matrix at ./arrayshow.jl:171
print_matrix at ./arrayshow.jl:171 [inlined]
print_array at ./arrayshow.jl:358 [inlined]
show at ./arrayshow.jl:399
unknown function (ip: 0x71531a6b7206)
#68 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:367
jfptr_YY.68_10048.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
with_repl_linfo at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:661
jfptr_with_repl_linfo_10178.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
display at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:353
display at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:372 [inlined]
display at ./multimedia.jl:340
jfptr_display_13663.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
print_response at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:409
#70 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:378
jfptr_YY.70_10086.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
with_repl_linfo at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:661
jfptr_with_repl_linfo_10178.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
print_response at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:376
do_respond at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:1003
jfptr_do_respond_10241.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_interface at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2755
jfptr_run_interface_8710.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
run_frontend at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:1474
#75 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:480
jfptr_YY.75_10143.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
start_task at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/task.c:1202
Allocations: 5568417 (Pool: 5568190; Big: 227); GC: 12
Segmentation fault (core dumped)

@sjdaines
Author

Testing the same MWE on the same PC, but with Julia 1.10.7, seems to consistently give a 64-byte aligned array and no segfault:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.7 (2024-11-26)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(@v1.10) pkg> activate examples
  Activating project at `~/PALEO/PALEOocean.jl/examples`

julia> import SIMD

julia> a = Array{SIMD.Vec{8, Float64}}(undef, 2);

julia> Int(pointer(a)) % 64
0

julia> versioninfo()
Julia Version 1.10.7
Commit 4976d05258e (2024-11-26 15:57 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × AMD Ryzen 5 7600 6-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_CONDAPKG_BACKEND = Null

@sjdaines sjdaines changed the title Intermittent segfaults with Julia 1.11.2, possibly architecture-specific (AMD Zen 4) Intermittent segfaults with Julia 1.11.2, possibly architecture-specific (AMD Zen 4), possibly alignment ? Dec 21, 2024
@sjdaines
Author

Perhaps another clue that this is an alignment issue? On the same PC (AMD Ryzen 5 7600) and Julia version 1.11.2 as above, I get an intermittent AssertionError when using valloc to explicitly align the array:

julia> a = SIMD.valloc(SIMD.Vec{8, Float64}, 1, 2);

julia> Int(pointer(parent(a))) % 64
0

julia> Int(pointer(a)) % 64
0

julia> a = SIMD.valloc(SIMD.Vec{8, Float64}, 1, 2);
ERROR: AssertionError: mod(off, sizeof(T)) == 0
Stacktrace:
 [1] valloc(::Type{SIMD.Vec{8, Float64}}, N::Int64, sz::Int64)
   @ SIMD ~/.julia/packages/SIMD/cST3l/src/arrayops.jl:127
 [2] top-level scope
   @ REPL[10]:1

julia> versioninfo()
Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × AMD Ryzen 5 7600 6-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_CONDAPKG_BACKEND = Null

where the assertion is from

@assert mod(off, sizeof(T)) == 0
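
A minimal sketch of the arithmetic behind this assertion. Only the assertion line itself is quoted above, so this assumes that `off` is the number of bytes from the freshly allocated base address to the next 64-byte boundary; the helper `offset_to_alignment` and the two addresses are hypothetical and chosen purely for illustration.

```julia
# Assumption: `off` is the byte distance from a base address to the next
# multiple of `align`. The addresses below are made up for illustration.
offset_to_alignment(addr::Integer, align::Integer) = mod(-addr, align)

elsize = 64  # sizeof(SIMD.Vec{8, Float64}): 8 lanes of 8 bytes each

for addr in (0x1000, 0x1020)  # addr % 64 is 0 and 32 respectively
    off = offset_to_alignment(Int(addr), elsize)
    println("addr % 64 = ", Int(addr) % 64, "  off = ", off,
            "  assertion holds: ", mod(off, elsize) == 0)
end
```

Under that assumption, a base address that sits 32 bytes past a boundary needs a 32-byte shift, which is less than one 64-byte element, so no whole-element offset can reach alignment. That would be consistent with the assertion failing only on runs where the allocation is not already 64-byte aligned.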

@KristofferC
Collaborator

There were changes to how arrays were implemented in 1.11. If you can reproduce it consistently, perhaps you could do a bisect to see where it started to fail?

sjdaines added a commit to PALEOtoolkit/PALEOocean.jl that referenced this issue Dec 22, 2024
- use netcdf files for output
- tidy up yaml files and remove old versions
- bugfix for ReactionOceanTransportTMM: workaround SIMD issue, see eschnett/SIMD.jl#135
@sjdaines
Author

sjdaines commented Jan 3, 2025

This looks like JuliaLang/julia#56937 to me...?

I'm not sure about the proposed fix JuliaLang/julia#56938, though. If I understand it correctly, Julia will at least be consistent, in the sense that it no longer overpromises about the alignment it provides, but it will still not guarantee more than 16-byte alignment?

@dzhang314

Commenting to say that I'm also seeing frequent segfaults with Vectors of SIMD.jl Vecs on Julia 1.11, and this is preventing me from migrating several of my applications using SIMD.jl + MultiFloats.jl from Julia 1.10 to Julia 1.11. This unfortunately means that Julia 1.11 is completely unusable for me at the moment.

The issue seems to be architecture-specific to AMD/Intel x86 CPUs with the AVX-512 instruction set, which requires 64-byte alignment (see, e.g., https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=AVX_512&text=mm512_load_pd&ig_expand=4006). Is there any planned strategy for making Julia 1.11 work with AVX-512? This seems quite important for a high-performance scientific language.

@vchuravy
Collaborator

Which Julia version are you using specifically? If the issue is occurring on 1.11.4, please file an MWE on the upstream issue tracker.

@dzhang314

@vchuravy I'm on 1.11.4 -- I was just testing today after I saw the new version drop. I'll try reducing my code down to an MWE. By "upstream", do you mean filing an issue on https://github.com/JuliaLang/julia directly?

@dzhang314

Very simple MWE:

$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.4 (2025-03-10)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using SIMD

julia> for _ = 1:10; v = Vector{Vec{8,Float64}}(undef, 1); println(v); end
Vec{8, Float64}[<8 x Float64>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
Vec{8, Float64}[
[21720] signal 11 (128): Segmentation fault
in expression starting at REPL[2]:1
getindex at ./essentials.jl:917 [inlined]
show_delim_array at ./show.jl:1397
show_delim_array at ./show.jl:1387 [inlined]
show_vector at ./arrayshow.jl:530
show_vector at ./arrayshow.jl:515 [inlined]
show at ./arrayshow.jl:486 [inlined]
print at ./strings/io.jl:35
print at ./strings/io.jl:46
println at ./strings/io.jl:75
unknown function (ip: 0x73d405815e36)
println at ./coreio.jl:4
top-level scope at ./REPL[2]:1
jl_toplevel_eval_flex at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
eval_user_input at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:245
repl_backend_loop at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:342
#start_repl_backend#59 at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:327
start_repl_backend at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:324
#run_repl#72 at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:483
run_repl at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:469
jfptr_run_repl_10102 at /home/dkzhang/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_PBQaY.so (unknown line)
#1150 at ./client.jl:446
jfptr_YY.1150_14761 at /home/dkzhang/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_PBQaY.so (unknown line)
jl_apply at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_main_repl at ./client.jl:430
repl_main at ./client.jl:567 [inlined]
_start at ./client.jl:541
jfptr__start_73560 at /home/dkzhang/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
true_main at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x73d406c2a1c9)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 770352 (Pool: 770306; Big: 46); GC: 1
Segmentation fault (core dumped)

I'm on Linux x86-64 on an AMD Ryzen 9 9950X processor (Zen 5 microarchitecture). I'm using 10 trials here to be conservative -- I always get the segfault within the first 3-4 tries. Also, there is nothing special about the undef initializer here. zeros(Vec{8,Float64}, 1) and ones(Vec{8,Float64}, 1) also reliably reproduce the same issue.

@eschnett
Owner

I could not reproduce this locally. I tried several architectures (Apple M3 Pro, AMD EPYC 7532, Xeon(R) Gold 6148), but I don't have access to Zen 4.

If someone else can reproduce and debug this then the following is not needed. Otherwise, the following might be useful:

  • Create a function f() that takes no arguments and that crashes when it is called.
  • Use @code_native f() to output the assembly instructions for this function.

This won't work if the function calls other functions and the crash happens in there. If so, try to find a different function that crashes. For example, output (println) might be too complicated – instead, try returning a value from your function.

julia> using SIMD
julia> function f(v)
           s = 0.0
           for n in 1:8
               s += v[1][n]
           end
           return s
       end
julia> @code_native f(Vector{Vec{8,Float64}}(undef, 1))

If you think it's alignment then you can also evaluate this expression a few times:

UInt(pointer(Vector{Vec{8,Float64}}(undef, 1))) % 64

If this outputs a nonzero value then you have a misaligned SIMD vector. This would explain the segfault.
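
Because the misalignment is intermittent, it may help to repeat that check many times and count the failures. The following is a rough sketch of that idea; the function name `count_misaligned` and the default of 64 bytes are assumptions made for illustration, not part of SIMD.jl.

```julia
using SIMD

# Allocate `trials` one-element vectors of element type T and count how many
# have a data pointer that is not a multiple of `align` bytes.
function count_misaligned(::Type{T}, trials::Integer; align::Integer = 64) where {T}
    misaligned = 0
    for _ in 1:trials
        v = Vector{T}(undef, 1)
        misaligned += (UInt(pointer(v)) % align != 0)
    end
    return misaligned
end

n = count_misaligned(Vec{8,Float64}, 1000)
println(n, " of 1000 allocations were not 64-byte aligned")
```

A nonzero count means that at least some allocations are misaligned for a 64-byte vector. The function only inspects pointers and never reads the elements, so it should not itself trigger the crash.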

@KristofferC
Collaborator

Should be fixed by JuliaLang/julia#56938, but backporting it to 1.11 and 1.10 requires some more work; see JuliaLang/julia#57713 (comment).

@dzhang314

dzhang314 commented Mar 11, 2025

@eschnett Huh, I'm very surprised this issue doesn't reproduce on the Xeon Gold 6148, which is also an AVX-512 capable processor. But on my AMD 9950X (Zen 5), I can show you the problematic aligned load right inside Base.getindex:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.4 (2025-03-10)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using SIMD

julia> code_native(getindex, (Vector{Vec{8,Float64}}, Int))
	.text
	.file	"getindex"
	.globl	julia_getindex_367              # -- Begin function julia_getindex_367
	.p2align	4, 0x90
	.type	julia_getindex_367,@function
julia_getindex_367:                     # @julia_getindex_367
; Function Signature: getindex(Array{SIMD.Vec{8, Float64}, 1}, Int64)
; ┌ @ essentials.jl:914 within `getindex`
# %bb.0:                                # %top
; │ @ essentials.jl within `getindex`
	#DEBUG_VALUE: getindex:A <- [DW_OP_deref] $rsi
	#DEBUG_VALUE: getindex:i <- $rdx
	#DEBUG_VALUE: getindex:A <- [DW_OP_deref] 0
	push	rbp
	mov	rbp, rsp
	sub	rsp, 16
; │ @ essentials.jl:916 within `getindex`
	lea	rax, [rdx - 1]
	cmp	rax, qword ptr [rsi + 16]
	jae	.LBB0_2
# %bb.1:                                # %L15
; │ @ essentials.jl:917 within `getindex`
	mov	rcx, qword ptr [rsi]
	shl	rax, 6
	vmovaps	zmm0, zmmword ptr [rcx + rax]
	vmovaps	zmmword ptr [rdi], zmm0
	mov	rax, rdi
	add	rsp, 16
	pop	rbp
	vzeroupper
	ret
.LBB0_2:                                # %L12
; │ @ essentials.jl:916 within `getindex`
	mov	qword ptr [rbp - 8], rdx
	movabs	rcx, offset j_throw_boundserror_379
	lea	rax, [rbp - 8]
	mov	rdi, rsi
	mov	rsi, rax
	call	rcx
.Lfunc_end0:
	.size	julia_getindex_367, .Lfunc_end0-julia_getindex_367
; └
                                        # -- End function
	.type	".L+SIMD.Vec#369",@object       # @"+SIMD.Vec#369"
	.section	.rodata,"a",@progbits
	.p2align	3, 0x0
".L+SIMD.Vec#369":
	.quad	".L+SIMD.Vec#369.jit"
	.size	".L+SIMD.Vec#369", 8

.set ".L+SIMD.Vec#369.jit", 128935370732880
	.size	".L+SIMD.Vec#369.jit", 8
	.section	".note.GNU-stack","",@progbits

The problematic instruction is the vmovaps zmm0, zmmword ptr [rcx + rax] right in the middle. You can see that 64-byte alignment is violated for Vector{Vec{8,Float64}}:

julia> UInt(pointer(Vector{Vec{8,Float64}}(undef, 5))) % 64
0x0000000000000030

julia> UInt(pointer(Vector{Vec{8,Float64}}(undef, 5))) % 64
0x0000000000000010

julia> UInt(pointer(Vector{Vec{8,Float64}}(undef, 5))) % 64
0x0000000000000020

julia> UInt(pointer(Vector{Vec{8,Float64}}(undef, 5))) % 64
0x0000000000000030

julia> UInt(pointer(Vector{Vec{8,Float64}}(undef, 5))) % 64
0x0000000000000000

@dzhang314

The story is even stranger than I thought! It turns out that Vecs are also unaligned in Julia v1.10:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.9 (2025-03-10)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using SIMD

julia> UInt(pointer(Vector{Vec{8,Float64}}(undef, 5))) % 64
0x0000000000000020

julia> UInt(pointer(Vector{Vec{8,Float64}}(undef, 5))) % 64
0x0000000000000030

julia> UInt(pointer(Vector{Vec{8,Float64}}(undef, 5))) % 64
0x0000000000000000

julia> UInt(pointer(Vector{Vec{8,Float64}}(undef, 5))) % 64
0x0000000000000010

julia> UInt(pointer(Vector{Vec{8,Float64}}(undef, 5))) % 64
0x0000000000000020

However, Julia v1.10 was smart enough to generate an unaligned load, where Julia v1.11 erroneously generates an aligned load:

julia> code_native(getindex, (Vector{Vec{8,Float64}}, Int))
	[ ... snip ... ]
	vmovups	zmm0, zmmword ptr [rcx + rax]
	vmovaps	zmmword ptr [rdi], zmm0
	[ ... snip ... ]

As discussed in JuliaLang/julia#57713, I don't think there's anything to be done in SIMD.jl to fix or work around this. This issue needs to be fixed upstream in Julia.
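
Until an upstream fix is available, user code may be able to sidestep the aligned load by not storing Vec values in an Array at all. The sketch below is one possible approach and is not taken from this thread (in particular, it is not the workaround in the PALEOocean.jl commit referenced above, whose details are not given here). It keeps the data in a plain Vector{Float64} and reads eight lanes at a time with SIMD.jl's vload, which by default does not assume vector alignment. The function name `sum_lanes` is hypothetical.

```julia
using SIMD

# Sum a flat Float64 buffer eight lanes at a time. The buffer is a plain
# Vector{Float64}, so no Array{Vec{8,Float64}} element is ever loaded.
function sum_lanes(xs::Vector{Float64})
    @assert length(xs) % 8 == 0
    acc = Vec{8,Float64}(0.0)
    for i in 1:8:length(xs)
        # vload without the aligned flag requests an unaligned vector load
        acc += vload(Vec{8,Float64}, xs, i)
    end
    return sum(acc)  # horizontal reduction over the 8 lanes
end

xs = collect(1.0:16.0)
total = sum_lanes(xs)
println(total)
```

Whether this actually avoids the crash on an affected machine has not been verified here; inspecting the output of @code_native for the function, as suggested earlier in the thread, would show whether an aligned or unaligned load is emitted.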
