Multiplication of some types with Rational causes compiler error #570

Open
alecloudenback opened this issue Mar 21, 2025 · 3 comments
Labels
kernels Things about kernels and how they are compiled.


@alecloudenback

alecloudenback commented Mar 21, 2025

This seems to be separate from #550, since this is a compiler error rather than a type error.

When multiplying an integer or a Rational by a Rational inside a kernel, the kernel fails to compile, whereas the same expression on Float32 input works:
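A CPU-only sketch (no Metal needed) of why the two calls in the session below behave differently: in Julia `//` binds more tightly than `*`, so `y * 3 // 2` parses as `y * (3 // 2)`. A Float32 input is promoted to a plain Float32 result, while an Int64 input yields a `Rational{Int64}`, so only the second kernel has to carry out `Rational{Int64}` multiplication on the device. This assumes 64-bit Julia, where integer literals are `Int64`.

```julia
# CPU-only: why the Float32 and Int64 inputs take different code paths.

# `//` has higher precedence than `*`, so the closure computes y * (3 // 2).
ex = Meta.parse("y * 3 // 2")
println(ex.args)    # top-level call is `*`; `3 // 2` is its second operand

f = y -> y * 3 // 2

# Float32 * Rational{Int64} promotes to Float32: no Rational in the result.
a = map(f, [1f0])

# Int64 * Rational{Int64} stays Rational{Int64}, the element type of the
# kernel that fails to compile in the session below.
b = map(f, [1])

println(eltype(a), " ", a)
println(eltype(b), " ", b)
```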

julia> using Metal

julia> map(y -> y * 3 // 2, MtlArray([1f0]))
1-element MtlVector{Float32, Metal.PrivateStorage}:
 1.5

julia> map(y -> y * 3 // 2, MtlArray([1]))

ERROR: Compilation to native code failed; see below for details.
If you think this is a bug, please file an issue and attach the following files:
- /var/folders/hw/bycsc7f52zvfzlh61rydj85r0000gn/T/jl_NQBgVejWPY.ll
- /var/folders/hw/bycsc7f52zvfzlh61rydj85r0000gn/T/jl_jIh6FyNozc.air
- /var/folders/hw/bycsc7f52zvfzlh61rydj85r0000gn/T/jl_FgjsvGhG72.metallib
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] macro expansion
    @ ~/.julia/packages/Metal/N2ABH/src/compiler/compilation.jl:205 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/ObjectiveC/TgrW6/src/os.jl:264 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/Metal/N2ABH/src/compiler/compilation.jl:182 [inlined]
  [5] (::Metal.var"#173#174"{GPUCompiler.CompilerJob{…}, @NamedTuple{…}})()
    @ Metal ~/.julia/packages/ObjectiveC/TgrW6/src/foundation.jl:644
  [6] macro expansion
    @ ~/.julia/packages/ObjectiveC/TgrW6/src/foundation.jl:572 [inlined]
  [7] macro expansion
    @ ./lock.jl:273 [inlined]
  [8] ObjectiveC.Foundation.NSAutoreleasePool(f::Metal.var"#173#174"{GPUCompiler.CompilerJob{…}, @NamedTuple{…}})
    @ ObjectiveC.Foundation ~/.julia/packages/ObjectiveC/TgrW6/src/foundation.jl:564
  [9] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{…})
    @ Metal ~/.julia/packages/ObjectiveC/TgrW6/src/foundation.jl:643
 [10] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OGnEB/src/execution.jl:262
 [11] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OGnEB/src/execution.jl:151
 [12] macro expansion
    @ ~/.julia/packages/Metal/N2ABH/src/compiler/execution.jl:189 [inlined]
 [13] macro expansion
    @ ./lock.jl:273 [inlined]
 [14] mtlfunction(f::Metal.var"#broadcast_linear#204", tt::Type{Tuple{…}}; name::Nothing, kwargs::@Kwargs{})
    @ Metal ~/.julia/packages/Metal/N2ABH/src/compiler/execution.jl:184
 [15] mtlfunction
    @ ~/.julia/packages/Metal/N2ABH/src/compiler/execution.jl:182 [inlined]
 [16] macro expansion
    @ ~/.julia/packages/Metal/N2ABH/src/compiler/execution.jl:85 [inlined]
 [17] _copyto!
    @ ~/.julia/packages/Metal/N2ABH/src/broadcast.jl:95 [inlined]
 [18] copyto!
    @ ~/.julia/packages/Metal/N2ABH/src/broadcast.jl:47 [inlined]
 [19] copy(bc::Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{…}, Tuple{…}, var"#9#10", Tuple{…}})
    @ GPUArrays ~/.julia/packages/GPUArrays/uiVyU/src/host/broadcast.jl:29
 [20] materialize
    @ ./broadcast.jl:872 [inlined]
 [21] map(f::Function, xs::MtlVector{Int64, Metal.PrivateStorage})
    @ GPUArrays ~/.julia/packages/GPUArrays/uiVyU/src/host/broadcast.jl:88
 [22] top-level scope
    @ REPL[9]:1
 [23] top-level scope
    @ ~/.julia/packages/Metal/N2ABH/src/initialization.jl:79

caused by: NSError: Compiler encountered an internal error (AGXMetalG16X, code 3)
Stacktrace:
  [1] Metal.MTL.MTLComputePipelineState(dev::Metal.MTL.MTLDeviceInstance, fun::Metal.MTL.MTLFunctionInstance)
    @ Metal.MTL ~/.julia/packages/Metal/N2ABH/lib/mtl/compute_pipeline.jl:60
  [2] macro expansion
    @ ~/.julia/packages/Metal/N2ABH/src/compiler/compilation.jl:187 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/ObjectiveC/TgrW6/src/os.jl:264 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/Metal/N2ABH/src/compiler/compilation.jl:182 [inlined]
  [5] (::Metal.var"#173#174"{GPUCompiler.CompilerJob{…}, @NamedTuple{…}})()
    @ Metal ~/.julia/packages/ObjectiveC/TgrW6/src/foundation.jl:644
  [6] macro expansion
    @ ~/.julia/packages/ObjectiveC/TgrW6/src/foundation.jl:572 [inlined]
  [7] macro expansion
    @ ./lock.jl:273 [inlined]
  [8] ObjectiveC.Foundation.NSAutoreleasePool(f::Metal.var"#173#174"{GPUCompiler.CompilerJob{…}, @NamedTuple{…}})
    @ ObjectiveC.Foundation ~/.julia/packages/ObjectiveC/TgrW6/src/foundation.jl:564
  [9] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{…})
    @ Metal ~/.julia/packages/ObjectiveC/TgrW6/src/foundation.jl:643
 [10] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OGnEB/src/execution.jl:262
 [11] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OGnEB/src/execution.jl:151
 [12] macro expansion
    @ ~/.julia/packages/Metal/N2ABH/src/compiler/execution.jl:189 [inlined]
 [13] macro expansion
    @ ./lock.jl:273 [inlined]
 [14] mtlfunction(f::Metal.var"#broadcast_linear#204", tt::Type{Tuple{…}}; name::Nothing, kwargs::@Kwargs{})
    @ Metal ~/.julia/packages/Metal/N2ABH/src/compiler/execution.jl:184
 [15] mtlfunction
    @ ~/.julia/packages/Metal/N2ABH/src/compiler/execution.jl:182 [inlined]
 [16] macro expansion
    @ ~/.julia/packages/Metal/N2ABH/src/compiler/execution.jl:85 [inlined]
 [17] _copyto!
    @ ~/.julia/packages/Metal/N2ABH/src/broadcast.jl:95 [inlined]
 [18] copyto!
    @ ~/.julia/packages/Metal/N2ABH/src/broadcast.jl:47 [inlined]
 [19] copy(bc::Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{…}, Tuple{…}, var"#9#10", Tuple{…}})
    @ GPUArrays ~/.julia/packages/GPUArrays/uiVyU/src/host/broadcast.jl:29
 [20] materialize
    @ ./broadcast.jl:872 [inlined]
 [21] map(f::Function, xs::MtlVector{Int64, Metal.PrivateStorage})
    @ GPUArrays ~/.julia/packages/GPUArrays/uiVyU/src/host/broadcast.jl:88
 [22] top-level scope
    @ REPL[9]:1
 [23] top-level scope
    @ ~/.julia/packages/Metal/N2ABH/src/initialization.jl:79
Some type information was truncated. Use `show(err)` to see complete types.

Here are the additional files the error message asks for:
https://drive.google.com/drive/folders/1ipHy4p6DEiKgCJWbTHtqcf1vGKNzQxqq?usp=sharing

julia> Metal.versioninfo()
macOS 15.3.1, Darwin 24.3.0

Toolchain:
- Julia: 1.11.4
- LLVM: 16.0.6

Julia packages:
- Metal.jl: 1.5.1
- GPUArrays: 11.2.2
- GPUCompiler: 1.2.0
- KernelAbstractions: 0.9.34
- ObjectiveC: 3.4.1
- LLVM: 9.2.0
- LLVMDowngrader_jll: 0.6.0+0

1 device:
- Apple M4 Max (464.000 KiB allocated)
@christiangnrd
Member

I'll defer to someone more familiar with the compiler stuff on whether a fix is possible, but this seems similar to #287.

Can you use Rational{Int32} instead of Rational{Int64} as a workaround?
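A CPU-only sketch of what that workaround involves, assuming the goal is to keep every integer in the expression 32-bit: on 64-bit Julia the literal `3 // 2` is already a `Rational{Int64}`, so both the input array and the rational constant have to be built from `Int32` values, otherwise promotion brings back `Rational{Int64}`. Whether the resulting 32-bit kernel actually compiles on Metal is not established in this thread.

```julia
# CPU-only type check for the suggested Rational{Int32} workaround.
# Assumption (unverified on Metal): keeping all operands Int32 avoids the
# 64-bit checked multiply that the Int64 kernel needs.

r64 = 3 // 2                   # Int literals are Int64 on 64-bit Julia -> Rational{Int64}
r32 = Int32(3) // Int32(2)     # explicitly Rational{Int32}

x32 = Int32[1, 2, 3]

# Int32 * Rational{Int32} stays Rational{Int32}.
y32 = map(v -> v * r32, x32)

# Mixing widths promotes back to Rational{Int64}.
y_mixed = map(v -> v * r64, x32)

println(typeof(r64), " | ", typeof(r32))
println(eltype(y32), " | ", eltype(y_mixed))
```

On the GPU, the corresponding call would be `map(v -> v * r32, MtlArray(Int32[1, 2, 3]))`; it is untested here.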

@christiangnrd christiangnrd added the kernels Things about kernels and how they are compiled. label Mar 22, 2025
@maleadt
Member

maleadt commented Mar 24, 2025

> I'll defer to someone more familiar with the compiler stuff on whether a fix is possible, but this seems similar to #287.

Unlikely, since we already detect those in GPUCompiler: https://github.com/JuliaGPU/GPUCompiler.jl/blob/1de83c11f8acc572ea2e563fe69a458fb34c70ec/src/metal.jl#L119-L120

@maleadt
Member

maleadt commented Mar 24, 2025

Back-end error:

LLVM ERROR: unable to legalize instruction: %386:_(s64), %387:_(s1) = 146 %385:_, %180:_
Context:
%386:_(s64), %387:_(s1) = 146 %385:_, %180:_
%385:_(s64) = 67 %384:_, %383:_
%180:_(s64) = 120 i64 3
%384:_(s64) = 45 %382:_, %383:_
%383:_(s64) = 124 %560:_(s32)
%382:_(s64) = 69 %381:_(s64), %bb.19, %354:_(s64), %bb.17
%560:_(s32) = 129 %558:_, %559:_(s32)
%381:_(s64) = 66 %378:_, %380:_
%354:_(s64) = 136 %352:_(s1), %47:_, %281:_
%557:_(s32), %558:_(s32) = 74 %255:_(s64)
%559:_(s32) = 120 i32 31
%378:_(s64) = 69 %47:_(s64), %bb.22, %373:_(s64), %bb.18
%380:_(s64) = 127 %379:_, %149:_(s64)
%352:_(s1) = 51 intpred(ugt), %350:_(s64), %44:_, %544:_, %327:_
%47:_(s64) = 120 i64 0
%281:_(s64) = 199 %255:_
%255:_(s64) = 90 %253:_(p1) :: (load (s64) from %ir.12, !tbaa !141, addrspace 1)
%373:_(s64) = 65 %372:_, %149:_
%379:_(s64) = 69 %359:_(s64), %bb.22, %370:_(s64), %bb.18
%149:_(s64) = 120 i64 1
 (in function: agc.main)

Minimal LLVM IR:

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32"
target triple = "air64-apple-macosx15.3.2"

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare { i64, i1 } @llvm.smul.with.overflow.i64(i64, i64) #0

define void @_Z16broadcast_linear14MtlDeviceArrayI8RationalI5Int64ELi1ELi1EE11BroadcastedI13MtlArrayStyleILi1E14PrivateStorageE5TupleI5OneToIS1_EE2_1S8_I8ExtrudedIS_IS1_Li1ELi1EES8_I4BoolES8_IS1_EEEE({ i8 addrspace(1)*, [1 x i64] } addrspace(1)* %0) {
conversion:
  %1 = alloca [0 x [0 x [2 x i64]]], i32 0, align 8
  %2 = getelementptr [0 x [0 x [2 x i64]]], [0 x [0 x [2 x i64]]]* %1, i64 0, i64 0, i64 0, i64 0
  %3 = load i64, i64* %2, align 8
  %4 = call { i64, i1 } @llvm.smul.with.overflow.i64(i64 %3, i64 %3)
  %5 = extractvalue { i64, i1 } %4, 1
  call void @llvm.assume(i1 %5)
  %6 = extractvalue { i64, i1 } %4, 0
  store i64 %6, i64 addrspace(1)* null, align 4294967296
  ret void
}

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: readwrite)
declare void @llvm.assume(i1 noundef) #1

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
attributes #1 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: readwrite) }

!air.kernel = !{!0}
!air.version = !{!7}

!0 = !{void ({ i8 addrspace(1)*, [1 x i64] } addrspace(1)*, { [1 x { { i8 addrspace(1)*, [1 x i64] }, [1 x i8], [1 x i64] }], [1 x [1 x i64]] } addrspace(1)*, i32, i32)* bitcast (void ({ i8 addrspace(1)*, [1 x i64] } addrspace(1)*)* @_Z16broadcast_linear14MtlDeviceArrayI8RationalI5Int64ELi1ELi1EE11BroadcastedI13MtlArrayStyleILi1E14PrivateStorageE5TupleI5OneToIS1_EE2_1S8_I8ExtrudedIS_IS1_Li1ELi1EES8_I4BoolES8_IS1_EEEE to void ({ i8 addrspace(1)*, [1 x i64] } addrspace(1)*, { [1 x { { i8 addrspace(1)*, [1 x i64] }, [1 x i8], [1 x i64] }], [1 x [1 x i64]] } addrspace(1)*, i32, i32)*), !1, !2}
!1 = !{}
!2 = !{!3, !4, !5, !6}
!3 = !{i32 0, !"air.buffer", !"air.location_index", i32 0, i32 1, !"air.read_write", !"air.address_space", i32 1, !"air.arg_type_size", i32 16, !"air.arg_type_align_size", i32 8, !"air.arg_type_name", !"MtlDeviceVector{Rational{Int64}, 1}", !"air.arg_name", !"dest"}
!4 = !{i32 1, !"air.buffer", !"air.location_index", i32 1, i32 1, !"air.read_write", !"air.address_space", i32 1, !"air.arg_type_size", i32 40, !"air.arg_type_align_size", i32 8, !"air.arg_type_name", !"Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{1, Metal.PrivateStorage}, Tuple{Base.OneTo{Int64}}, var\22#1#2\22, Tuple{Base.Broadcast.Extruded{MtlDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}", !"air.arg_name", !"bc"}
!5 = !{i32 2, !"air.threads_per_grid", !"air.arg_type_name", !"uint"}
!6 = !{i32 3, !"air.thread_position_in_grid", !"air.arg_type_name", !"uint"}
!7 = !{i32 2, i32 5, i32 0}

So the problem is probably that the back-end does not support @llvm.smul.with.overflow.i64 (presumably the intrinsic with ID 146 in the dump above).
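For context, a CPU-only sketch of where that intrinsic comes from: `Rational{Int64}` multiplication in Base goes through checked integer multiplication (`Base.Checked.checked_mul`), which is built on `Base.Checked.mul_with_overflow` and, for `Int64` operands, should lower to `llvm.smul.with.overflow.i64`. The snippet shows the (product, overflow flag) pair that the intrinsic computes and the `OverflowError` the checked path raises instead of wrapping.

```julia
# CPU-only illustration of the operation behind @llvm.smul.with.overflow.i64.
using Base.Checked: mul_with_overflow, checked_mul

# Wrapped product plus overflow flag, like the { i64, i1 } pair the intrinsic returns.
p, overflowed = mul_with_overflow(typemax(Int64), 2)

# The checked variant returns the product when it fits...
q = checked_mul(6, 7)

# ...and throws instead of wrapping when it does not.
# Rational{Int64} multiplication goes through this checked path.
err = try
    (typemax(Int64) // 1) * 2
    nothing
catch e
    e
end

println("mul_with_overflow: ", (p, overflowed))
println("checked_mul(6, 7): ", q)
println("Rational overflow raises: ", typeof(err))
```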
