Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code that uses a libcall marks internal symbol as exported when making object file #132055

Closed
gbaraldi opened this issue Mar 19, 2025 · 15 comments · Fixed by #133291
Closed

Code that uses a libcall marks internal symbol as exported when making object file #132055

gbaraldi opened this issue Mar 19, 2025 · 15 comments · Fixed by #133291

Comments

@gbaraldi
Copy link
Contributor

gbaraldi commented Mar 19, 2025

This is showing up as JuliaLang/julia#57658 (comment) which is holding up julia being able to compile with LLVM20.

The issue is that a defining a libcall in a module and then having it be used is making the libcall symbol become external

source_filename = "text"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.14.0-macho"

declare i16 @julia_float_to_half(float)

define internal i16 @__truncsfhf2(float %0) {
  %2 = call i16 @julia_float_to_half(float %0)
  ret i16 %2
}

define hidden swiftcc half @julia_fp(float %0) {
  %2 = fptrunc float %0 to half
  ret half %2
}

@llvm.compiler.used = appending global [1 x ptr] [ptr @__truncsfhf2], section "llvm.metadata"
❯ /Users/gabrielbaraldi/julia9/usr/tools/llc lala.ll -filetype=obj -o llvm20.o
usr/lib/julia on  debug-llvm-20 [?] 
❯ nm llvm20.o
0000000000000000 T ___truncsfhf2
                 U _julia_float_to_half
0000000000000010 T _julia_fp
usr/lib/julia on  debug-llvm-20 [?] 
❯ llc lala.ll -filetype=obj -o llvm19.o
usr/lib/julia on  debug-llvm-20 [?] 
❯ nm llvm19.o
0000000000000000 t ___truncsfhf2
                 U _julia_float_to_half
0000000000000010 T _julia_fp

Not sure if this is an intentional change or not but it's between LLVM 19 and 20

@Zentrik
Copy link
Contributor

Zentrik commented Mar 19, 2025

bisected to 3d0846b with the help of manyclangs

@llvmbot
Copy link
Member

llvmbot commented Mar 19, 2025

@llvm/issue-subscribers-backend-x86

Author: Gabriel Baraldi (gbaraldi)

This is showing up as https://github.com/JuliaLang/julia/pull/57658#issuecomment-2727385403 which is holding up julia being able to compile with LLVM20.

The issue is that a defining a libcall in a module and then having it be used is making the libcall symbol become external

source_filename = "text"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.14.0-macho"

declare i16 @<!-- -->julia_float_to_half(float)

define internal i16 @<!-- -->__truncsfhf2(float %0) {
  %2 = call i16 @<!-- -->julia_float_to_half(float %0)
  ret i16 %2
}

define hidden swiftcc half @<!-- -->julia_fp(float %0) {
  %2 = fptrunc float %0 to half
  ret half %2
}

@<!-- -->llvm.compiler.used = appending global [1 x ptr] [ptr @<!-- -->__truncsfhf2], section "llvm.metadata"
❯ /Users/gabrielbaraldi/julia9/usr/tools/llc lala.ll -filetype=obj -o llvm20.o
usr/lib/julia on  debug-llvm-20 [?] 
❯ nm llvm20.o
0000000000000000 T ___truncsfhf2
                 U _julia_float_to_half
0000000000000010 T _julia_fp
usr/lib/julia on  debug-llvm-20 [?] 
❯ llc lala.ll -filetype=obj -o llvm19.o
usr/lib/julia on  debug-llvm-20 [?] 
❯ nm llvm19.o
0000000000000000 t ___truncsfhf2
                 U _julia_float_to_half
0000000000000010 T _julia_fp

Not sure if this is an intentional change or not but it's between LLVM 19 and 20

@Zentrik
Copy link
Contributor

Zentrik commented Mar 22, 2025

Based on #3045, it seems at least in 2008 it was intentional that calls to libcalls have symbols of type MO_ExternalSymbol, if I'm understanding things correctly. @weiweichen was removing support for an internal libcall definition intentional?'

@weiweichen
Copy link
Contributor

Based on #3045, it seems at least in 2008 it was intentional that calls to libcalls have symbols of type MO_ExternalSymbol, if I'm understanding things correctly. @weiweichen was removing support for an internal libcall definition intentional?'

Not sure I have the context here (or why am I tagged??)

@Zentrik
Copy link
Contributor

Zentrik commented Mar 22, 2025

Sorry, so if you look at the ir in the first comment, it's no longer producing an internal symbol as you can see in the nm output. I bisected the change to 3d0846b, which is a pr you authored, #108880.

I was trying to understand what the point of that pr was and if it was intentional that it had caused the symbol to no longer be internal.

Thanks.

@weiweichen
Copy link
Contributor

weiweichen commented Mar 23, 2025

Sorry, so if you look at the ir in the first comment, it's no longer producing an internal symbol as you can see in the nm output. I bisected the change to 3d0846b, which is a pr you authored, #108880.

I was trying to understand what the point of that pr was and if it was intentional that it had caused the symbol to no longer be internal.

Thanks.

Shouldn’t MO_ExternalSymbol be external (as the name says)? Why would it be internal?
As for the context, I think we were having codegen linking issues for this type of symbol not being marked as external. What is the fundamental issue here for your case with this symbol type difference (the commit has been in for a while (6months ago?), and am a bit surprised with it causing problem till now 🤔 ).

@Zentrik
Copy link
Contributor

Zentrik commented Mar 23, 2025

The Julia Language currently will define __truncsfhf2 in every module (as in IR below) to call our own function and then try to link all the modules together. This fails due to 3d0846b as the symbol is now marked external.

As far as I can tell, a libcall is always represented with a MO_ExternalSymbol even if it's defined in the same module with internal linkage. The only reason this IR isn't producing an internal __truncsfhf2 is because it's a libcall, if it was renamed it would be internal.

source_filename = "text"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.14.0-macho"

declare i16 @julia_float_to_half(float)

define internal i16 @__truncsfhf2(float %0) {
  %2 = call i16 @julia_float_to_half(float %0)
  ret i16 %2
}

define hidden swiftcc half @julia_fp(float %0) {
  %2 = fptrunc float %0 to half
  ret half %2
}

@llvm.compiler.used = appending global [1 x ptr] [ptr @__truncsfhf2], section "llvm.metadata"

(the commit has been in for a while (6months ago?), and am a bit surprised with it causing problem till now 🤔

The Julia language typically only upgrades llvm when a new version comes out which is why it has gone unnoticed so far.

@weiweichen
Copy link
Contributor

weiweichen commented Mar 23, 2025

The Julia Language currently will define __truncsfhf2 in every module (as in IR below) to call our own function and then try to link all the modules together. This fails due to 3d0846b as the symbol is now marked external.

As far as I can tell, a libcall is always represented with a MO_ExternalSymbol even if it's defined in the same module with internal linkage. The only reason this IR isn't producing an internal __truncsfhf2 is because it's a libcall, if it was renamed it would be internal.

source_filename = "text"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.14.0-macho"

declare i16 @julia_float_to_half(float)

define internal i16 @__truncsfhf2(float %0) {
%2 = call i16 @julia_float_to_half(float %0)
ret i16 %2
}

define hidden swiftcc half @julia_fp(float %0) {
%2 = fptrunc float %0 to half
ret half %2
}

@llvm.compiler.used = appending global [1 x ptr] [ptr @__truncsfhf2], section "llvm.metadata"

(the commit has been in for a while (6months ago?), and am a bit surprised with it causing problem till now 🤔

The Julia language typically only upgrades llvm when a new version comes out which is why it has gone unnoticed so far.

Thank you for the details on the failure symptoms! This describes why Julia is currently failing, though it doesn't seem to explain why a MO_ExternalSymbol should be internal which seems contradictory. Since #3045 was authored by Chris, wondering if @lattner has more context/advice on this issue?

(From another train of thought, perhaps less technical)I understand that Julia only pulls upstream every 6months hence the issue just popped, but llvm backend has many downstream users other than Julia, and there hasn't been much issue with this change for them. Could this be a more Julia specific thing that needs to be adjusted on Julia side instead of x86 backend? (CC @npanchen @dgurchenkov)

@npanchen
Copy link
Contributor

@topperc does MO_ExternalSymbol only refer to MCSymbol with external linkage or to any linkage type ? It seems that even on RISC-V call to GetExternalSymbolSymbol may create (assuming MCSymbol doesn't exists) with a private linkage, right ?

@weiweichen
Copy link
Contributor

@topperc does MO_ExternalSymbol only refer to MCSymbol with external linkage or to any linkage type ? It seems that even on RISC-V call to GetExternalSymbolSymbol may create (assuming MCSymbol doesn't exists) with a private linkage, right ?

I don't have much knowledge about RISC-V, could you provide more context (since I know you are the RISC-V expert 😛 )

@npanchen
Copy link
Contributor

@topperc does MO_ExternalSymbol only refer to MCSymbol with external linkage or to any linkage type ? It seems that even on RISC-V call to GetExternalSymbolSymbol may create (assuming MCSymbol doesn't exists) with a private linkage, right ?

I don't have much knowledge about RISC-V, could you provide more context (since I know you are the RISC-V expert 😛 )

I mainly referred to RISC-V as it was the last target I know that was added, so @topperc implemented that functionality too. As soon as @topperc also maintained x86 for a long period of time, pretty sure he is the right person to answer that question.

Overall, there're several targets that do seem to have copy-pasted same code to handle MO_ExternalSymbol: WAsm, AArch64, RISC-V etc.
It does look like overall expectation is that MO_ExternalSymbol does correspond to external linkage, what does worry me:

  • Naming of the method MachineOperation::isSymbol() const { return OpKind == MO_ExternalSymbol; }. Commit that changed that name just tried to shorten the method name, but the name can imply any MCSymbol.
  • I was unable to find tests for this case for any target in llvm/test or llvm-test-suite.
    Assuming that MO_ExternalSymbol does correspond to external linkage, then the change also is needed for other targets and there should be some verification of this

@tstellar tstellar moved this from Needs Triage to Needs Fix in LLVM Release Status Mar 25, 2025
@topperc
Copy link
Collaborator

topperc commented Mar 26, 2025

CC: @MaskRay @efriedma-quic

@efriedma-quic
Copy link
Collaborator

The assumption, in the backend, is that all the RuntimeLibcalls symbols are, in fact defined in a different module. If you somehow manage to hack things together to make things work anyway, you're going outside the LLVM compilation model; it's not something we intentionally support. There's been a lot of discussion of changing the model in the context of LTO/offloading/etc., but nothing has really happened yet.

That said, #108880 is clearly wrong: the AsmPrinter should not be modifying the visibility of a symbol that way. If some property of a symbol needs to be modified, it should be done using appropriate directives, not by directly messing with MC-internal datastructures.

@efriedma-quic
Copy link
Collaborator

Opened #133291

@weiweichen
Copy link
Contributor

weiweichen commented Mar 27, 2025

The assumption, in the backend, is that all the RuntimeLibcalls symbols are, in fact defined in a different module. If you somehow manage to hack things together to make things work anyway, you're going outside the LLVM compilation model; it's not something we intentionally support. There's been a lot of discussion of changing the model in the context of LTO/offloading/etc., but nothing has really happened yet.

That said, #108880 is clearly wrong: the AsmPrinter should not be modifying the visibility of a symbol that way. If some property of a symbol needs to be modified, it should be done using appropriate directives, not by directly messing with MC-internal datastructures.

Could you please point out what directives to use for "it should be done using appropriate directives"? Why is it wrong to mark 'MO_ExternalSymbol' as extern since the name itself explicitly said External?
Also, could you help point out any guidelines on modifying MC data structures should any change is needed? I can sure add a test for the change, but it feels a bit hard to gauge whether a change is " messing with" the infrastructure that one should avoid doing or it's ok to proceed.

@github-project-automation github-project-automation bot moved this from Needs Fix to Done in LLVM Release Status Mar 28, 2025
swift-ci pushed a commit to swiftlang/llvm-project that referenced this issue Mar 29, 2025
…3291)

Reverts llvm#108880 .

The patch has no regression test, no description of why the fix is
necessary, and the code is modifying MC datastructures in a way that's
forbidden in the AsmPrinter.

Fixes llvm#132055.

(cherry picked from commit cd6e959)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging a pull request may close this issue.

9 participants