exp on SIMD.Vec triggers LLVM error #679

Open
pxl-th opened this issue Mar 21, 2025 · 1 comment
Labels: bug (Something isn't working), ptx (Stuff about the NVIDIA PTX back-end.)

Comments

@pxl-th
Member

pxl-th commented Mar 21, 2025

To reproduce

The Minimal Working Example (MWE) for this bug:

using CUDA
using Adapt
using SIMD
using KernelAbstractions

@inline function vload(::Type{SIMD.Vec{N, T}}, ptr::Core.LLVMPtr{T, AS}) where {N, T, AS}
    alignment = sizeof(T) * N
    vec_ptr = Base.bitcast(Core.LLVMPtr{SIMD.Vec{N, T}, AS}, ptr)
    return unsafe_load(vec_ptr, 1, Val(alignment))
end

@inline function vstore!(ptr::Core.LLVMPtr{T, AS}, x::SIMD.Vec{N, T}) where {N, T, AS}
    alignment = sizeof(T) * N
    vec_ptr = Base.bitcast(Core.LLVMPtr{SIMD.Vec{N, T}, AS}, ptr)
    unsafe_store!(vec_ptr, x, 1, Val(alignment))
    return
end

@kernel function ker!(y, x)
    i = @index(Global)
    v = vload(SIMD.Vec{4, Float32}, pointer(x))
    v = exp(v)
    vstore!(pointer(y), v)
end

function tt(kab)
    x = Adapt.adapt(kab, ones(Float32, 4))
    y = Adapt.adapt(kab, ones(Float32, 4))
    ker!(kab)(y, x; ndrange=1)
    @show y
    return
end
tt(CUDABackend())

Error:

ERROR: LLVM error: Undefined external symbol "expf"
Stacktrace:
  [1] handle_error(reason::Cstring)
    @ LLVM ~/.julia/packages/LLVM/b3kFs/src/core/context.jl:194
  [2] LLVMTargetMachineEmitToMemoryBuffer(T::LLVM.TargetMachine, M::LLVM.Module, codegen::LLVM.API.LLVMCodeGenFileType, ErrorMessage::Base.RefValue{Cstring}, OutMemBuf::Base.RefValue{Ptr{…}})
    @ LLVM.API ~/.julia/packages/LLVM/b3kFs/lib/16/libLLVM.jl:11138
  [3] emit(tm::LLVM.TargetMachine, mod::LLVM.Module, filetype::LLVM.API.LLVMCodeGenFileType)
    @ LLVM ~/.julia/packages/LLVM/b3kFs/src/targetmachine.jl:118
  [4] mcgen(job::GPUCompiler.CompilerJob, mod::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OGnEB/src/mcgen.jl:75
  [5] mcgen(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, mod::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
    @ CUDA ~/.julia/packages/CUDA/1kIOw/src/compiler/compilation.jl:127
  [6] macro expansion
    @ ~/.julia/packages/GPUCompiler/OGnEB/src/driver.jl:400 [inlined]
  [7] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OGnEB/src/utils.jl:110
  [8] emit_asm
    @ ~/.julia/packages/GPUCompiler/OGnEB/src/utils.jl:108 [inlined]
  [9] codegen(output::Symbol, job::GPUCompiler.CompilerJob; toplevel::Bool, libraries::Bool, optimize::Bool, cleanup::Bool, validate::Bool, strip::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OGnEB/src/driver.jl:120
...
pxl-th added the bug label Mar 21, 2025
maleadt transferred this issue from JuliaGPU/CUDA.jl Mar 24, 2025
@maleadt
Member

maleadt commented Mar 24, 2025

Generated LLVM IR:

; ModuleID = 'start'
source_filename = "start"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <4 x float> @llvm.exp.v4f32(<4 x float>) #0

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() #0

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #0

define ptx_kernel void @_Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S8_S8_EE13CuDeviceArrayI7Float32Li1ELi1EESE_({ i64, i32 } %state, { [1 x [1 x [1 x i64]]], [2 x [1 x [1 x [1 x i64]]]] } %0, { i8 addrspace(1)*, i64, [1 x i64], i64 } %1, { i8 addrspace(1)*, i64, [1 x i64], i64 } %2) local_unnamed_addr !dbg !9 {
conversion:
  %.fca.0.0.0.0.extract = extractvalue { [1 x [1 x [1 x i64]]], [2 x [1 x [1 x [1 x i64]]]] } %0, 0, 0, 0, 0
  %.fca.1.1.0.0.0.extract = extractvalue { [1 x [1 x [1 x i64]]], [2 x [1 x [1 x [1 x i64]]]] } %0, 1, 1, 0, 0, 0
  %3 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x(), !dbg !13, !range !32
  %4 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(), !dbg !33, !range !40
  %5 = add nuw nsw i32 %4, 1, !dbg !41
  %6 = zext i32 %5 to i64, !dbg !44
  %7 = zext i32 %3 to i64, !dbg !65
  %8 = mul i64 %.fca.1.1.0.0.0.extract, %7, !dbg !73
  %9 = add i64 %8, %6, !dbg !75
  %10 = icmp slt i64 %9, 1, !dbg !76
  %11 = icmp sgt i64 %9, %.fca.0.0.0.0.extract, !dbg !76
  %.not16 = or i1 %10, %11, !dbg !28
  br i1 %.not16, label %L246, label %L113, !dbg !28

L113:                                             ; preds = %conversion
  %.fca.0.extract4 = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %1, 0
  %12 = bitcast i8 addrspace(1)* %.fca.0.extract4 to <4 x float> addrspace(1)*, !dbg !88
  %.fca.0.extract2 = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %2, 0
  %13 = bitcast i8 addrspace(1)* %.fca.0.extract2 to <4 x float> addrspace(1)*, !dbg !103
  %.unpack = load <4 x float>, <4 x float> addrspace(1)* %13, align 16, !dbg !103, !tbaa !113
  %14 = call <4 x float> @llvm.exp.v4f32(<4 x float> %.unpack), !dbg !116
  store <4 x float> %14, <4 x float> addrspace(1)* %12, align 16, !dbg !88, !tbaa !113
  br label %L246, !dbg !88

L246:                                             ; preds = %L113, %conversion
  ret void, !dbg !125
}

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }

!llvm.module.flags = !{!0, !1}
!llvm.dbg.cu = !{!2, !4, !5, !6}
!julia.kernel = !{!7}
!nvvm.annotations = !{!8}

!0 = !{i32 2, !"Dwarf Version", i32 2}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, nameTableKind: None)
!3 = !DIFile(filename: "julia", directory: ".")
!4 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, nameTableKind: None)
!5 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, nameTableKind: None)
!6 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, nameTableKind: None)
!7 = !{void ({ i64, i32 }, { [1 x [1 x [1 x i64]]], [2 x [1 x [1 x [1 x i64]]]] }, { i8 addrspace(1)*, i64, [1 x i64], i64 }, { i8 addrspace(1)*, i64, [1 x i64], i64 })* @_Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S8_S8_EE13CuDeviceArrayI7Float32Li1ELi1EESE_}
!8 = !{void ({ i64, i32 }, { [1 x [1 x [1 x i64]]], [2 x [1 x [1 x [1 x i64]]]] }, { i8 addrspace(1)*, i64, [1 x i64], i64 }, { i8 addrspace(1)*, i64, [1 x i64], i64 })* @_Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S8_S8_EE13CuDeviceArrayI7Float32Li1ELi1EESE_, !"kernel", i32 1}
!9 = distinct !DISubprogram(name: "gpu_ker!", linkageName: "julia_gpu_ker!_25692", scope: null, file: !10, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!10 = !DIFile(filename: "none", directory: ".")
!11 = !DISubroutineType(types: !12)
!12 = !{}
!13 = !DILocation(line: 39, scope: !14, inlinedAt: !16)
!14 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !15, file: !15, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!15 = !DIFile(filename: "/home/tim/Julia/pkg/LLVM/src/interop/base.jl", directory: ".")
!16 = !DILocation(line: 7, scope: !17, inlinedAt: !19)
!17 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !18, file: !18, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!18 = !DIFile(filename: "/home/tim/Julia/pkg/CUDA/src/device/intrinsics/indexing.jl", directory: ".")
!19 = !DILocation(line: 7, scope: !20, inlinedAt: !21)
!20 = distinct !DISubprogram(name: "_index;", linkageName: "_index", scope: !18, file: !18, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!21 = !DILocation(line: 56, scope: !22, inlinedAt: !23)
!22 = distinct !DISubprogram(name: "blockIdx_x;", linkageName: "blockIdx_x", scope: !18, file: !18, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!23 = !DILocation(line: 78, scope: !24, inlinedAt: !25)
!24 = distinct !DISubprogram(name: "#blockIdx;", linkageName: "#blockIdx", scope: !18, file: !18, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!25 = !DILocation(line: 168, scope: !26, inlinedAt: !28)
!26 = distinct !DISubprogram(name: "#__validindex;", linkageName: "#__validindex", scope: !27, file: !27, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!27 = !DIFile(filename: "/home/tim/Julia/pkg/CUDA/src/CUDAKernels.jl", directory: ".")
!28 = !DILocation(line: 96, scope: !29, inlinedAt: !31)
!29 = distinct !DISubprogram(name: "gpu_ker!;", linkageName: "gpu_ker!", scope: !30, file: !30, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!30 = !DIFile(filename: "/home/tim/Julia/pkg/KernelAbstractions/src/macros.jl", directory: ".")
!31 = !DILocation(line: 0, scope: !9)
!32 = !{i32 0, i32 2147483646}
!33 = !DILocation(line: 39, scope: !14, inlinedAt: !34)
!34 = !DILocation(line: 7, scope: !17, inlinedAt: !35)
!35 = !DILocation(line: 7, scope: !20, inlinedAt: !36)
!36 = !DILocation(line: 46, scope: !37, inlinedAt: !38)
!37 = distinct !DISubprogram(name: "threadIdx_x;", linkageName: "threadIdx_x", scope: !18, file: !18, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!38 = !DILocation(line: 92, scope: !39, inlinedAt: !25)
!39 = distinct !DISubprogram(name: "#threadIdx;", linkageName: "#threadIdx", scope: !18, file: !18, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!40 = !{i32 0, i32 1023}
!41 = !DILocation(line: 87, scope: !42, inlinedAt: !36)
!42 = distinct !DISubprogram(name: "+;", linkageName: "+", scope: !43, file: !43, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!43 = !DIFile(filename: "int.jl", directory: ".")
!44 = !DILocation(line: 811, scope: !45, inlinedAt: !47)
!45 = distinct !DISubprogram(name: "toInt64;", linkageName: "toInt64", scope: !46, file: !46, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!46 = !DIFile(filename: "boot.jl", directory: ".")
!47 = !DILocation(line: 892, scope: !48, inlinedAt: !49)
!48 = distinct !DISubprogram(name: "Int64;", linkageName: "Int64", scope: !46, file: !46, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!49 = !DILocation(line: 7, scope: !50, inlinedAt: !52)
!50 = distinct !DISubprogram(name: "convert;", linkageName: "convert", scope: !51, file: !51, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!51 = !DIFile(filename: "number.jl", directory: ".")
!52 = !DILocation(line: 307, scope: !53, inlinedAt: !55)
!53 = distinct !DISubprogram(name: "to_index;", linkageName: "to_index", scope: !54, file: !54, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!54 = !DIFile(filename: "indices.jl", directory: ".")
!55 = !DILocation(line: 292, scope: !53, inlinedAt: !56)
!56 = !DILocation(line: 368, scope: !57, inlinedAt: !58)
!57 = distinct !DISubprogram(name: "to_indices;", linkageName: "to_indices", scope: !54, file: !54, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!58 = !DILocation(line: 365, scope: !57, inlinedAt: !59)
!59 = !DILocation(line: 1312, scope: !60, inlinedAt: !62)
!60 = distinct !DISubprogram(name: "getindex;", linkageName: "getindex", scope: !61, file: !61, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!61 = !DIFile(filename: "abstractarray.jl", directory: ".")
!62 = !DILocation(line: 84, scope: !63, inlinedAt: !25)
!63 = distinct !DISubprogram(name: "expand;", linkageName: "expand", scope: !64, file: !64, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!64 = !DIFile(filename: "/home/tim/Julia/pkg/KernelAbstractions/src/nditeration.jl", directory: ".")
!65 = !DILocation(line: 86, scope: !66, inlinedAt: !67)
!66 = distinct !DISubprogram(name: "-;", linkageName: "-", scope: !43, file: !43, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!67 = !DILocation(line: 78, scope: !68, inlinedAt: !69)
!68 = distinct !DISubprogram(name: "#1;", linkageName: "#1", scope: !64, file: !64, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!69 = !DILocation(line: 48, scope: !70, inlinedAt: !72)
!70 = distinct !DISubprogram(name: "ntuple;", linkageName: "ntuple", scope: !71, file: !71, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!71 = !DIFile(filename: "ntuple.jl", directory: ".")
!72 = !DILocation(line: 74, scope: !63, inlinedAt: !62)
!73 = !DILocation(line: 88, scope: !74, inlinedAt: !67)
!74 = distinct !DISubprogram(name: "*;", linkageName: "*", scope: !43, file: !43, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!75 = !DILocation(line: 87, scope: !42, inlinedAt: !67)
!76 = !DILocation(line: 514, scope: !77, inlinedAt: !78)
!77 = distinct !DISubprogram(name: "<=;", linkageName: "<=", scope: !43, file: !43, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!78 = !DILocation(line: 1426, scope: !79, inlinedAt: !81)
!79 = distinct !DISubprogram(name: "in;", linkageName: "in", scope: !80, file: !80, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!80 = !DIFile(filename: "range.jl", directory: ".")
!81 = !DILocation(line: 382, scope: !82, inlinedAt: !84)
!82 = distinct !DISubprogram(name: "map;", linkageName: "map", scope: !83, file: !83, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!83 = !DIFile(filename: "tuple.jl", directory: ".")
!84 = !DILocation(line: 477, scope: !85, inlinedAt: !87)
!85 = distinct !DISubprogram(name: "in;", linkageName: "in", scope: !86, file: !86, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!86 = !DIFile(filename: "multidimensional.jl", directory: ".")
!87 = !DILocation(line: 169, scope: !26, inlinedAt: !28)
!88 = !DILocation(line: 39, scope: !14, inlinedAt: !89)
!89 = !DILocation(line: 0, scope: !90, inlinedAt: !91)
!90 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !10, file: !10, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!91 = !DILocation(line: 0, scope: !92, inlinedAt: !93)
!92 = distinct !DISubprogram(name: "pointerset;", linkageName: "pointerset", scope: !10, file: !10, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!93 = !DILocation(line: 88, scope: !94, inlinedAt: !96)
!94 = distinct !DISubprogram(name: "unsafe_store!;", linkageName: "unsafe_store!", scope: !95, file: !95, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!95 = !DIFile(filename: "/home/tim/Julia/pkg/LLVM/src/interop/pointer.jl", directory: ".")
!96 = !DILocation(line: 4, scope: !97, inlinedAt: !99)
!97 = distinct !DISubprogram(name: "vstore!;", linkageName: "vstore!", scope: !98, file: !98, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!98 = !DIFile(filename: "REPL[15]", directory: ".")
!99 = !DILocation(line: 5, scope: !100, inlinedAt: !102)
!100 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !101, file: !101, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!101 = !DIFile(filename: "REPL[16]", directory: ".")
!102 = !DILocation(line: 97, scope: !29, inlinedAt: !31)
!103 = !DILocation(line: 39, scope: !14, inlinedAt: !104)
!104 = !DILocation(line: 0, scope: !90, inlinedAt: !105)
!105 = !DILocation(line: 0, scope: !106, inlinedAt: !107)
!106 = distinct !DISubprogram(name: "pointerref;", linkageName: "pointerref", scope: !10, file: !10, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!107 = !DILocation(line: 85, scope: !108, inlinedAt: !109)
!108 = distinct !DISubprogram(name: "unsafe_load;", linkageName: "unsafe_load", scope: !95, file: !95, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!109 = !DILocation(line: 4, scope: !110, inlinedAt: !112)
!110 = distinct !DISubprogram(name: "vload;", linkageName: "vload", scope: !111, file: !111, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!111 = !DIFile(filename: "REPL[14]", directory: ".")
!112 = !DILocation(line: 3, scope: !100, inlinedAt: !102)
!113 = !{!114, !114, i64 0, i64 0}
!114 = !{!"custom_tbaa_addrspace(1)", !115, i64 0}
!115 = !{!"custom_tbaa"}
!116 = !DILocation(line: 131, scope: !117, inlinedAt: !119)
!117 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !118, file: !118, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!118 = !DIFile(filename: "/home/tim/.julia/packages/SIMD/hyXY3/src/LLVM_intrinsics.jl", directory: ".")
!119 = !DILocation(line: 127, scope: !120, inlinedAt: !121)
!120 = distinct !DISubprogram(name: "exp;", linkageName: "exp", scope: !118, file: !118, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!121 = !DILocation(line: 156, scope: !122, inlinedAt: !124)
!122 = distinct !DISubprogram(name: "exp;", linkageName: "exp", scope: !123, file: !123, type: !11, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !12)
!123 = !DIFile(filename: "/home/tim/.julia/packages/SIMD/hyXY3/src/simdvec.jl", directory: ".")
!124 = !DILocation(line: 4, scope: !100, inlinedAt: !102)
!125 = !DILocation(line: 99, scope: !29, inlinedAt: !31)
❯ llc /tmp/test.ll
error: no libcall available for fexp

The @llvm.exp.v4f32 call is generated directly (i.e. using llvmcall) by SIMD.jl. Not all back-ends (here, NVPTX) support all combinations of these intrinsics, which is explicitly documented in the LLVM language reference:

You can use llvm.exp on any floating-point or vector of floating-point type. Not all targets support all types however.

... which makes them pretty annoying, as you can't rely on them.
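One possible way to sidestep the unsupported vector intrinsic is to skip SIMD.jl's `exp(::Vec)` and exponentiate each lane as a scalar. This is a minimal sketch, not a fix adopted by SIMD.jl or CUDA.jl. `lanewise_exp` is a hypothetical helper introduced here for illustration. The sketch assumes that, inside a kernel compiled for the CUDA back-end, the scalar `exp(::Float32)` call resolves to CUDA.jl's device override, which calls into libdevice. If that holds, no `expf` libcall is needed.

```julia
using SIMD

# Hypothetical helper (not part of SIMD.jl): apply scalar `exp` to each lane
# instead of SIMD.jl's `exp(::Vec)`, which emits @llvm.exp.v4f32 via llvmcall.
# On the CUDA back-end the scalar call should hit CUDA.jl's device override
# (libdevice), so the NVPTX back-end never has to expand a vector exp.
@inline function lanewise_exp(v::SIMD.Vec{N, T}) where {N, T}
    # Extract each lane, exponentiate it as a scalar, and rebuild the vector.
    return SIMD.Vec(ntuple(i -> exp(v[i]), Val(N)))
end
```

In the MWE this would replace `v = exp(v)` in `ker!` with `v = lanewise_exp(v)`. The vector load and store stay vectorized; only the math is scalarized, which the NVPTX back-end would have had to do anyway.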

maleadt added the ptx label Mar 24, 2025