-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[X86Backend][M68KBackend] Make Ctx in X86MCInstLower (M68KInstLower) the same as AsmPrinter.OutContext #133352
[X86Backend][M68KBackend] Make Ctx in X86MCInstLower (M68KInstLower) the same as AsmPrinter.OutContext #133352
Conversation
…eic/x86-getexternsymbolsymbol
✅ With the latest revision this PR passed the C/C++ code formatter. |
…eic/x86-getexternsymbolsymbol
Verified that this change won't negatively impact this issue #132055 (comment)
|
This solution appears to potentially mask an underlying issue. Can you state the motivation behind the change? The first few paragraphs only describe "what" the scenario is, but doesn't really state your end goal, and how this helps your MC Linker effort.
It would be beneficial to broaden the review process, especially when the reviewer selected is closely associated and neither of you has prior experience with this specific code area. (#108880 landed quickly within four hours. Considering the reviewer's close association, a wider review might offer additional perspectives.) |
So for MC Linker, the main difference from normal codegen flow is that we are trying to AsmPrint functions that are initially in one llvm module, then being split into submodules for parallel codegen, and now use one AsmPrinter to generate the output. The parallel codegen pipeline generates the codegen data structures in their own I think since normal use case (probably anything other than our un-conventional case) only handles one llvm module all the way through in the whole codegen pipeline, there is no need to deduplicate, and this change is probably just an NFC. Let me add this to the PR description as well!
Totally agree. The PR is "draft" now so that I can put out the code to discuss with @npanchen about the solution, but not incur too much noises to others. Once it is "Ready for review", I'll add folks who have given us feedbacks on the previous PR+issue as reviewers. (sorry for being pedantic here) But to my defense, #108880 took 4 days to land instead of 4 hours (it was 4 hours after approval). The PR notified core maintainers like |
@llvm/pr-subscribers-backend-m68k @llvm/pr-subscribers-backend-x86 Author: weiwei chen (weiweichen) ChangesIn
at
However, this newly created symbol will not be marked properly with its Use llvm-project/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp Lines 366 to 367 in 14c36db
llvm-project/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp Lines 145 to 148 in 14c36db
Note that the critical point here is that This fixes an error while running the generated code in ORC JIT for our use case with MCLinker (see more details below): We (Mojo) are trying to do a MC level linking so that we break llvm module into multiple submodules to compile and codegen in parallel (technically into *.o files with symbol linkage type change), but instead of archive all of them into one I think since normal use case (probably anything other than our un-conventional case) only handles one llvm module all the way through in the whole codegen pipeline, there is no need to deduplicate, and this change is probably just an NFC. Full diff: https://github.com/llvm/llvm-project/pull/133352.diff 1 Files Affected:
diff --git a/llvm/lib/Target/X86/X86MCInstLower.cpp b/llvm/lib/Target/X86/X86MCInstLower.cpp
index 3f6cd55618666..c93a2879a057f 100644
--- a/llvm/lib/Target/X86/X86MCInstLower.cpp
+++ b/llvm/lib/Target/X86/X86MCInstLower.cpp
@@ -192,8 +192,15 @@ MCSymbol *X86MCInstLower::GetSymbolFromOperand(const MachineOperand &MO) const {
}
Name += Suffix;
- if (!Sym)
- Sym = Ctx.getOrCreateSymbol(Name);
+ if (!Sym) {
+ // If new MCSymbol needs to be created for
+ // MachineOperand::MO_ExternalSymbol, create it as a symbol
+ // in AsmPrinter's OutContext.
+ if (MO.isSymbol())
+ Sym = AsmPrinter.OutContext.getOrCreateSymbol(Name);
+ else
+ Sym = Ctx.getOrCreateSymbol(Name);
+ }
// If the target flags on the operand changes the name of the symbol, do that
// before we return the symbol.
|
This is another attempt to address the issue what was poorly addressed in #108880 🤦♀️ . Adding more explanation on the motivation/problem to address and the rationale on the change. Appreciate suggestions on how to add a test in addition to existing tests not being broken this change 🙏 (update) working on adding a c++ unitttest. |
I don't want to blame anyone here. The review process for patches to LLVM relies, to a large extent, on code authors exercising good judgement, and other people catching mistakes. And new contributors often get tripped up here. So, the patch probably shouldn't have been merged, but it's not a big deal. Reverts are easy enough. And we'll make sure everyone is on the same page going forward. If we are going to make changes here, I'd like to try to ensure consistency across targets. And that whatever API usage rule you want to impose for the sake of your out-of-tree code actually makes sense; if there's a general rule we should follow for creating MCSymbols from the AsmPrinter, we should document it and make sure we consistently follow it. |
…eic/x86-getexternsymbolsymbol
This is very good point! I did some "grep" and found out that actually most backends are already using llvm-project/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp Lines 366 to 367 in 3e742b5
llvm-project/llvm/lib/Target/ARC/ARCMCInstLower.cpp Lines 47 to 48 in 3e742b5
llvm-project/llvm/lib/Target/BPF/BPFMCInstLower.cpp Lines 71 to 72 in 3e742b5
llvm-project/llvm/lib/Target/Lanai/LanaiMCInstLower.cpp Lines 118 to 119 in 3e742b5
llvm-project/llvm/lib/Target/Mips/MipsMCInstLower.cpp Lines 146 to 147 in 3e742b5
llvm-project/llvm/lib/Target/PowerPC/PPCMCInstLower.cpp Lines 49 to 50 in 3e742b5
llvm-project/llvm/lib/Target/XCore/XCoreMCInstLower.cpp Lines 49 to 50 in 3e742b5
llvm-project/llvm/lib/Target/Xtensa/XtensaAsmPrinter.cpp Lines 244 to 245 in 3e742b5
With only one except (in addition to X86) for M68k backend. Adding the change for M68k backend as well. Is there another place to document this other than comments in the code? |
Added a C++ unit test as well. |
Ready for review for real now 🙏 ! |
if (MO.isSymbol()) | ||
Sym = AsmPrinter.OutContext.getOrCreateSymbol(Name); | ||
else | ||
Sym = Ctx.getOrCreateSymbol(Name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. This is starting to make more sense, now. Normally, there's only one MCContext, but because you're doing this "parallel codegen" thing, you're passing a different MCContext to construct the MachineFunction... and the coupling between a symbol and its context is loose enough that you can pass a symbol from the wrong MCContext to the AsmPrinter, and sort of get away with it.
If preventing that is the goal, why not change X86MCInstLower::X86MCInstLower instead? Just change Ctx(mf.getContext())
to Ctx(asmprinter.OutContext)
.
That said, more generally, if you want everything to work reliably, you'll probably need to completely eliminate the MCContext reference from MachineFunction. Otherwise, you'll continue to run into issues where the MCSymbol is from the wrong context. Maybe doable, but involves a significant amount of refactoring to avoid constructing MCSymbols early... and I'm not sure if there are any other weird interactions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If preventing that is the goal, why not change X86MCInstLower::X86MCInstLower instead? Just change
Ctx(mf.getContext())
toCtx(asmprinter.OutContext)
.
Hmmm, this probably won't work (I mean it will work for MO_ExternalSymbol to get a unique MCSymbol, but) because we do still need mf.getContext()
to do most of the codegen here for this function pass to run on the corresponding MachineFunction
. Ctx
has to be mf.getContext()
as most of the codegen for this MachineFunction is in this MCContext. AsmPrinter's OutContext is just for the output here.
Yeah, for most cases, AsmPrinter.OutContext
is the same as mf.getContext
because there is just one MCContext
in the whole pipeline. I'm not sure what is the motivation for other backends to also use AsmPrinter.OutContext
for the same case here, but the change does make them look more consistent with each other 😀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
said, more generally, if you want everything to work reliably, you'll probably need to completely eliminate the MCContext reference from MachineFunction. Otherwise, you'll continue to run into issues where the MCSymbol is from the wrong context. Maybe doable, but involves a significant amount of refactoring to avoid constructing MCSymbols early... and I'm not sure if there are any other weird interactions here.
Interesting idea, but I do need to think more about what does this entails, and as you said, it will probably be a significant amount of refactoring, so maybe as follow-ups with more concrete considerations on a larger scale refactoring?
(Also very selfishly speaking, this PR will help significantly with our MCLinker to function correctly, otherwise, half of tests are failing now 😢, so would be great to get something in first so that we can keep upgrading weekly 🙏 . )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ctx has to be mf.getContext() as most of the codegen for this MachineFunction is in this MCContext.
I'm not really understanding what this means... I guess you've sort of informally partitioned things so that "local" symbols are created in the function's MCContext, and "global" symbols are created in the global MCContext? I don't think that's really sustainable... if we're going to consistently partition symbols, the two kinds of symbols can't both just be MCSymbol*
, or you'll inevitably trip over issues in more obscure cases where we use the wrong MCContext.
I don't think the patch in its current form will cause any immediate issues, but I don't want to commit to a design that hasn't really been reviewed by LLVM community, and probably won't be approved in its current form because it's very confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ctx has to be mf.getContext() as most of the codegen for this MachineFunction is in this MCContext.
I'm not really understanding what this means... I guess you've sort of informally partitioned things so that "local" symbols are created in the function's MCContext, and "global" symbols are created in the global MCContext? I don't think that's really sustainable... if we're going to consistently partition symbols, the two kinds of symbols can't both just be
MCSymbol*
, or you'll inevitably trip over issues in more obscure cases where we use the wrong MCContext.
This is valid concern, but we are willing to take the risk for potentially running into issues (since we don't have substantial example right now to show this will be a big problem and worrying about "what-if-sm" is hard 😞) in the future at this point to unblock all of our tests. I'm definitely open to suggestion on how to make the MCLinker more solid (maybe after/during my talk next month?).
I don't think the patch in its current form will cause any immediate issues, but I don't want to commit to a design that hasn't really been reviewed by LLVM community, and probably won't be approved in its current form because it's very confusing.
I understand your reservation and being careful, thank you for being thorough here! Though I want to point out that this change has already been applied to most of other backends (and I imagine those changes were being reviewed and approved by the community?). So I'd push back a bit on the assumption that this is "a design that hasn't really been reviewed by LLVM community"?
and probably won't be approved in its current form because it's very confusing.
Do you have any suggestion on which part is confusing and what can be added to make it less so? 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's really sustainable... if we're going to consistently partition symbols, the two kinds of symbols can't both just be MCSymbol*, or you'll inevitably trip over issues in more obscure cases where we use the wrong MCContext.
Why can't the two kinds be both "MCSymbo*" just because their scopes are different (local vs global)? What's wrong with `MCSymbo' wrt scoping? Also, I don't quite get "inevitably trip over issues in more obscure cases" part, I'm afraid it's a bit vague statement. Could you be more specific on what cases that may be? And since most of other backends are already doing this, do you think they are also doing something unsustainable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efriedma-quic the absence of a fix blocks our pulldown as 50% of tests are failing due to reverted #133291 (comment) change.
Even though I agree that there might be a better "bulletproof" solution, but is it ok to move forward with Weiwei's fix first and then consider much better solution ? My rationale here is that Weiwei's proposed fix is aligned with what most of the targets are doing now and also there was an interest in community to open source parallel MCLinker. So when it comes to open sourcing there will be more use-cases and we will come back to that topic again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change has already been applied to most of other backends
The other backends work in your model by accident. They're not intentionally using one context or the other, they're just not using MachineFunction::getContext() at all in the AsmPrinter.
Could you be more specific on what cases that may be?
Do you have any suggestion on which part is confusing and what can be added to make it less so? 🙏
Basically, if MF.getContext().getOrCreateSymbol() means something different from AsmPrinter.OutContext.getOrCreateSymbol(), it's a lot harder to understand what's going on. You have two APIs that look the same on the surface, and return a value of the same type, but actually mean something subtly different. Anything that creates an MCSymbol would need to be aware that both APIs exist, and pick the correct one. Some existing code probably isn't consistent with your model, and anyone writing new code gets zero guidance from the API names or type system to help pick the correct one.
And the "local" one is never really right: MCSymbols passed to an MCStreamer are supposed to be part of the same MCContext as the MCStreamer.
To make things consistent, there needs to be one correct way to construct an MCSymbol. Either we need a "MachineFunctionSymbol" type to represent symbols that haven't been emitted yet, or you need to stop sharing MCStreamers between different compilation units. This patch isn't making any progress towards either of those models, or anything similar.
I don't think this patch will cause any immediate issues, because it's basically a no-op if there's only one MCContext, but this isn't the right long-term solution. And I don't want to commit to merging an unbounded number of temporary hacks while you work out how to implement the right solution.
@efriedma-quic, would you be open to a short zoom/google meeting to talk about this? (maybe easier if we can talk than typing here?) |
…eic/x86-getexternsymbolsymbol
@@ -0,0 +1,180 @@ | |||
//===- llvm/unittest/CodeGen/AArch64SelectionDAGTest.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File name is incorrect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, oops, fixed!
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#include "../lib/Target/X86/X86ISelLowering.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the header used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, removed
#include "llvm/AsmParser/Parser.h" | ||
#include "llvm/CodeGen/AsmPrinter.h" | ||
#include "llvm/CodeGen/MachineModuleInfo.h" | ||
#include "llvm/CodeGen/SelectionDAG.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt this test uses anything from SelectionDAG.h
#include "llvm/IR/Module.h" | ||
#include "llvm/MC/MCStreamer.h" | ||
#include "llvm/MC/TargetRegistry.h" | ||
#include "llvm/Support/KnownBits.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt this file uses KnownBits.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, removed
Unless I'm misreadying, |
class X86MCInstLowerTest : public testing::Test { | ||
protected: | ||
static void SetUpTestCase() { | ||
LLVMInitializeX86TargetInfo(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the X86 target isn't built, can you still call these funtions or do they not exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like compilation will fail if I'm not building X86 backend. Changing this to
InitializeAllTargetMCs();
InitializeAllTargetInfos();
InitializeAllTargets();
InitializeAllAsmPrinters();
Tested with only building AArch64 backend, this test gets skipped.
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from X86MCInstLowerTest
[ RUN ] X86MCInstLowerTest.moExternalSymbol_MCSYMBOL
/Users/weiwei.chen/research/modularml/modular/third-party/llvm-project/llvm/unittests/CodeGen/X86MCInstLowerTest.cpp:107: Skipped
[ SKIPPED ] X86MCInstLowerTest.moExternalSymbol_MCSYMBOL (1 ms)
[----------] 1 test from X86MCInstLowerTest (1 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (2 ms total)
[ PASSED ] 0 tests.
[ SKIPPED ] 1 test, listed below:
[ SKIPPED ] X86MCInstLowerTest.moExternalSymbol_MCSYMBOL
Oh, good catch!! Looks like all the other backends are doing the same thing as AARch64 here that
@topperc, do you by any chance know if this exception is a must to have or X86 can also switch to use OutContext for
It looks like it's probably fine to switch here to be consistent with all the other backends unless there is something critical I'm missing here? I pushed a new commit to reflect that change, buildkite/github-pull-requests/linux-linux-x64 is passed. I also tested on our side that this fixes our issue as well. I'm wondering if this change is more acceptable now? |
…eic/x86-getexternsymbolsymbol
…eic/x86-getexternsymbolsymbol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This avoids creating a distinction between the 2 different sources of MCContexts in LLVM. As long as this happens to work for your use case that seems fine, but it seems fragile and could be broken in some other way in the future.
Thank you! Will definitely keep an eye on this part and follow up on more solid design if this turns out to be an issue down the road 🙏. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/55/builds/9441 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/63/builds/4983 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/24/builds/7040 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/52/builds/7340 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/94/builds/5918 Here is the relevant piece of the build log for the reference
|
Sending this PR #134481 which should probably help to fix all the sanitizer failures here for |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/2646 Here is the relevant piece of the build log for the reference
|
Reordering `OS` and `PassMgrF` should fix the asan failure that's caused by OS being destroyed before `PassMgrF` deletes the AsmPrinter. As shown in[ this asan run ](https://lab.llvm.org/buildbot/#/builders/52/builds/7340/steps/12/logs/stdio) ``` This frame has 15 object(s): [32, 48) 'PassMgrF' (line 154) [64, 1112) 'Buf' (line 155) [1248, 1304) 'OS' (line 156) <== Memory access at offset 1280 is inside this variable ``` which indicates an ordering problem. This should help to fix all the sanitizer failures caused by the test `X86MCInstLowerTest.cpp` that's introduced by [this PR](#133352 (comment)).
Reordering `OS` and `PassMgrF` should fix the asan failure that's caused by OS being destroyed before `PassMgrF` deletes the AsmPrinter. As shown in[ this asan run ](https://lab.llvm.org/buildbot/#/builders/52/builds/7340/steps/12/logs/stdio) ``` This frame has 15 object(s): [32, 48) 'PassMgrF' (line 154) [64, 1112) 'Buf' (line 155) [1248, 1304) 'OS' (line 156) <== Memory access at offset 1280 is inside this variable ``` which indicates an ordering problem. This should help to fix all the sanitizer failures caused by the test `X86MCInstLowerTest.cpp` that's introduced by [this PR](llvm/llvm-project#133352 (comment)).
In
X86MCInstLower::LowerMachineOperand
, a newMCSymbol
can be created inGetSymbolFromOperand(MO)
whereMO.getType()
isMachineOperand::MO_ExternalSymbol
at
llvm-project/llvm/lib/Target/X86/X86MCInstLower.cpp
Line 196 in 725a7b6
However, this newly created symbol will not be marked properly with its
IsExternal
field sinceCtx.getOrCreateSymbol(Name)
doesn't know if the newly createdMCSymbol
is forMachineOperand::MO_ExternalSymbol
.Looking at other backends, for example
Arch64MCInstLower
is doing for handlingMC_ExternalSymbol
llvm-project/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp
Lines 366 to 367 in 14c36db
llvm-project/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp
Lines 145 to 148 in 14c36db
It creates/gets the MCSymbol from
AsmPrinter.OutContext
instead of fromCtx
. Moreover,Ctx
forAArch64MCLower
is the same asAsmPrinter.OutContext
.llvm-project/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
Line 100 in 8e7d6ba
This patch makes
X86MCInstLower
andM68KInstLower
to have theirCtx
fromAsmPrinter.OutContext
instead of getting it fromMF.getContext()
to be consistent with all the other backends.I think since normal use case (probably anything other than our un-conventional case) only handles one llvm module all the way through in the codegen pipeline till the end of code emission (AsmPrint),
AsmPrinter.OutContext
is the same as MachineFunction's MCContext, so this change is an NFC.This fixes an error while running the generated code in ORC JIT for our use case with MCLinker (see more details below):
#133291 (comment)
We (Mojo) are trying to do a MC level linking so that we break llvm module into multiple submodules to compile and codegen in parallel (technically into *.o files with symbol linkage type change), but instead of archive all of them into one
.a
file, we want to fix the symbol linkage type and still produce one *.o file. The parallel codegen pipeline generates the codegen data structures in their ownMCContext
(which isCtx
here). So if functionf
andg
got split into different submodules, they will have differentCtx
. And when we try to create an external symbol with the same name for each of them withCtx.getOrCreate(SymName)
, we will get two differentMCSymbol*
becausef
andg
'sMCContext
are different and they can't see each other. This is unfortunately not what we want for external symbols. UsingAsmPrinter.OutContext
helps, since it is shared, if we try to get or create theMCSymbol
there, we'll be able to deduplicate.