Skip to content

[BOLT] Gadget scanner: reformulate the state for data-flow analysis #131898

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 25, 2025

Conversation

atrosinenko
Copy link
Contributor

In preparation for implementing support for detection of non-protected
call instructions, refine the definition of state which is computed for
each register by data-flow analysis.

Explicitly marking the registers which are known to be trusted at
function entry is crucial for finding non-protected calls. In addition,
it fixes less-common false negatives for pac-ret, such as ret x1 in
f_nonx30_ret_non_auted test case.

@llvmbot
Copy link
Member

llvmbot commented Mar 18, 2025

@llvm/pr-subscribers-bolt

Author: Anatoly Trosinenko (atrosinenko)

Changes

In preparation for implementing support for detection of non-protected
call instructions, refine the definition of state which is computed for
each register by data-flow analysis.

Explicitly marking the registers which are known to be trusted at
function entry is crucial for finding non-protected calls. In addition,
it fixes less-common false negatives for pac-ret, such as ret x1 in
f_nonx30_ret_non_auted test case.


Full diff: https://github.com/llvm/llvm-project/pull/131898.diff

6 Files Affected:

  • (modified) bolt/include/bolt/Core/MCPlusBuilder.h (+10)
  • (modified) bolt/include/bolt/Passes/PAuthGadgetScanner.h (+3-4)
  • (modified) bolt/lib/Passes/PAuthGadgetScanner.cpp (+78-51)
  • (modified) bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp (+4)
  • (modified) bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s (+8-11)
  • (modified) bolt/test/binary-analysis/AArch64/gs-pacret-multi-bb.s (+1-2)
diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h
index b285138b77fe7..76ea2489e7038 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -551,6 +551,16 @@ class MCPlusBuilder {
     return Analysis->isReturn(Inst);
   }
 
+  /// Returns the registers that are trusted at function entry.
+  ///
+  /// Each register should be treated as if a successfully authenticated
+  /// pointer was written to it before entering the function (i.e. the
+  /// pointer is safe to jump to as well as to be signed).
+  virtual SmallVector<MCPhysReg> getTrustedLiveInRegs() const {
+    llvm_unreachable("not implemented");
+    return {};
+  }
+
   virtual ErrorOr<MCPhysReg> getAuthenticatedReg(const MCInst &Inst) const {
     llvm_unreachable("not implemented");
     return getNoRegister();
diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index f102f1080e2e8..404dde2901767 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -209,13 +209,12 @@ struct Report {
 
 struct GadgetReport : public Report {
   const GadgetKind &Kind;
-  SmallVector<MCPhysReg> AffectedRegisters;
+  SmallVector<MCPhysReg, 1> AffectedRegisters;
   std::vector<MCInstReference> OverwritingInstrs;
 
   GadgetReport(const GadgetKind &Kind, MCInstReference Location,
-               const BitVector &AffectedRegisters)
-      : Report(Location), Kind(Kind),
-        AffectedRegisters(AffectedRegisters.set_bits()) {}
+               MCPhysReg AffectedRegister)
+      : Report(Location), Kind(Kind), AffectedRegisters({AffectedRegister}) {}
 
   void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 04b0923d34b0c..ebfa606ceb7c3 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -126,18 +126,16 @@ class TrackedRegisters {
 
 // The security property that is checked is:
 // When a register is used as the address to jump to in a return instruction,
-// that register must either:
-// (a) never be changed within this function, i.e. have the same value as when
-//     the function started, or
+// that register must be safe-to-dereference. It must either
+// (a) be safe-to-dereference at function entry and never be changed within this
+//     function, i.e. have the same value as when the function started, or
 // (b) the last write to the register must be by an authentication instruction.
 
 // This property is checked by using dataflow analysis to keep track of which
-// registers have been written (def-ed), since last authenticated. Those are
-// exactly the registers containing values that should not be trusted (as they
-// could have changed since the last time they were authenticated). For pac-ret,
-// any return instruction using such a register is a gadget to be reported. For
-// PAuthABI, probably at least any indirect control flow using such a register
-// should be reported.
+// registers have been written (def-ed), since last authenticated. For pac-ret,
+// any return instruction using a register which is not safe-to-dereference is
+// a gadget to be reported. For PAuthABI, probably at least any indirect control
+// flow using such a register should be reported.
 
 // Furthermore, when producing a diagnostic for a found non-pac-ret protected
 // return, the analysis also lists the last instructions that wrote to the
@@ -156,10 +154,29 @@ class TrackedRegisters {
 //    in the gadgets to be reported. This information is used in the second run
 //    to also track which instructions last wrote to those registers.
 
+/// A state representing which registers are safe to use by an instruction
+/// at a given program point.
+///
+/// To simplify reasoning, let's stick with the following approach:
+/// * when state is updated by the data-flow analysis, the sub-, super- and
+///   overlapping registers are marked as needed
+/// * when the particular instruction is checked if it represents a gadget,
+///   the specific bit of BitVector should be usable to answer this.
+///
+/// For example, on AArch64:
+/// * An AUTIZA X0 instruction marks both X0 and W0 (as well as W0_HI) as
+///   safe-to-dereference. It does not change the state of X0_X1, for example,
+///   as super-registers partially retain their old, unsafe values.
+/// * LDR X1, [X0] marks as unsafe both X1 itself and anything it overlaps
+///   with: W1, W1_HI, X0_X1 and so on.
+/// * RET (which is implicitly RET X30) is a protected return if and only if
+///   X30 is safe-to-dereference - the state computed for sub- and
+///   super-registers is not inspected.
 struct State {
-  /// A BitVector containing the registers that have been clobbered, and
-  /// not authenticated.
-  BitVector NonAutClobRegs;
+  /// A BitVector containing the registers that are either safe at function
+  /// entry and were not clobbered yet, or those not clobbered since being
+  /// authenticated.
+  BitVector SafeToDerefRegs;
   /// A vector of sets, only used in the second data flow run.
   /// Each element in the vector represents one of the registers for which we
   /// track the set of last instructions that wrote to this register. For
@@ -169,16 +186,26 @@ struct State {
   std::vector<SmallPtrSet<const MCInst *, 4>> LastInstWritingReg;
   State() {}
   State(unsigned NumRegs, unsigned NumRegsToTrack)
-      : NonAutClobRegs(NumRegs), LastInstWritingReg(NumRegsToTrack) {}
-  State &operator|=(const State &StateIn) {
-    NonAutClobRegs |= StateIn.NonAutClobRegs;
+      : SafeToDerefRegs(NumRegs), LastInstWritingReg(NumRegsToTrack) {}
+
+  /// Returns S, so that S.merge(S1) == S1.merge(S) == S1.
+  static State getMergeNeutralElement(unsigned NumRegs,
+                                      unsigned NumRegsToTrack) {
+    State S(NumRegs, NumRegsToTrack);
+    S.SafeToDerefRegs.set();
+    return S;
+  }
+
+  State &merge(const State &StateIn) {
+    SafeToDerefRegs &= StateIn.SafeToDerefRegs;
     for (unsigned I = 0; I < LastInstWritingReg.size(); ++I)
       for (const MCInst *J : StateIn.LastInstWritingReg[I])
         LastInstWritingReg[I].insert(J);
     return *this;
   }
+
   bool operator==(const State &RHS) const {
-    return NonAutClobRegs == RHS.NonAutClobRegs &&
+    return SafeToDerefRegs == RHS.SafeToDerefRegs &&
            LastInstWritingReg == RHS.LastInstWritingReg;
   }
   bool operator!=(const State &RHS) const { return !((*this) == RHS); }
@@ -199,7 +226,7 @@ static void printLastInsts(
 
 raw_ostream &operator<<(raw_ostream &OS, const State &S) {
   OS << "pacret-state<";
-  OS << "NonAutClobRegs: " << S.NonAutClobRegs << ", ";
+  OS << "SafeToDerefRegs: " << S.SafeToDerefRegs << ", ";
   printLastInsts(OS, S.LastInstWritingReg);
   OS << ">";
   return OS;
@@ -217,8 +244,8 @@ class PacStatePrinter {
 void PacStatePrinter::print(raw_ostream &OS, const State &S) const {
   RegStatePrinter RegStatePrinter(BC);
   OS << "pacret-state<";
-  OS << "NonAutClobRegs: ";
-  RegStatePrinter.print(OS, S.NonAutClobRegs);
+  OS << "SafeToDerefRegs: ";
+  RegStatePrinter.print(OS, S.SafeToDerefRegs);
   OS << ", ";
   printLastInsts(OS, S.LastInstWritingReg);
   OS << ">";
@@ -257,12 +284,24 @@ class PacRetAnalysis
 
   void preflight() {}
 
+  State createEntryState() {
+    State S(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+    for (MCPhysReg Reg : BC.MIB->getTrustedLiveInRegs())
+      S.SafeToDerefRegs |= BC.MIB->getAliases(Reg, /*OnlySmaller=*/true);
+    return S;
+  }
+
   State getStartingStateAtBB(const BinaryBasicBlock &BB) {
-    return State(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+    if (BB.isEntryPoint())
+      return createEntryState();
+
+    return State::getMergeNeutralElement(
+        NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
   }
 
   State getStartingStateAtPoint(const MCInst &Point) {
-    return State(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+    return State::getMergeNeutralElement(
+        NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
   }
 
   void doConfluence(State &StateOut, const State &StateIn) {
@@ -277,7 +316,7 @@ class PacRetAnalysis
       dbgs() << ")\n";
     });
 
-    StateOut |= StateIn;
+    StateOut.merge(StateIn);
 
     LLVM_DEBUG({
       dbgs() << "   merged state: ";
@@ -298,7 +337,7 @@ class PacRetAnalysis
     });
 
     State Next = Cur;
-    BitVector Written = BitVector(NumRegs, false);
+    BitVector Clobbered(NumRegs, false);
     // Assume a call can clobber all registers, including callee-saved
     // registers. There's a good chance that callee-saved registers will be
     // saved on the stack at some point during execution of the callee.
@@ -307,36 +346,27 @@ class PacRetAnalysis
     // Also, not all functions may respect the AAPCS ABI rules about
     // caller/callee-saved registers.
     if (BC.MIB->isCall(Point))
-      Written.set();
+      Clobbered.set();
     else
-      // FIXME: `getWrittenRegs` only sets the register directly written in the
-      // instruction, and the smaller aliasing registers. It does not set the
-      // larger aliasing registers. To also set the larger aliasing registers,
-      // we'd have to call `getClobberedRegs`.
-      // It is unclear if there is any test case which shows a different
-      // behaviour between using `getWrittenRegs` vs `getClobberedRegs`. We'd
-      // first would like to see such a test case before making a decision
-      // on whether using `getClobberedRegs` below would be better.
-      // Also see the discussion on this at
-      // https://github.com/llvm/llvm-project/pull/122304#discussion_r1939511909
-      BC.MIB->getWrittenRegs(Point, Written);
-    Next.NonAutClobRegs |= Written;
+      BC.MIB->getClobberedRegs(Point, Clobbered);
+    Next.SafeToDerefRegs.reset(Clobbered);
     // Keep track of this instruction if it writes to any of the registers we
     // need to track that for:
     for (MCPhysReg Reg : RegsToTrackInstsFor.getRegisters())
-      if (Written[Reg])
+      if (Clobbered[Reg])
         lastWritingInsts(Next, Reg) = {&Point};
 
     ErrorOr<MCPhysReg> AutReg = BC.MIB->getAuthenticatedReg(Point);
     if (AutReg && *AutReg != BC.MIB->getNoRegister()) {
-      // FIXME: should we use `OnlySmaller=false` below? See similar
-      // FIXME about `getWrittenRegs` above and further discussion about this
-      // at
-      // https://github.com/llvm/llvm-project/pull/122304#discussion_r1939515516
-      Next.NonAutClobRegs.reset(
-          BC.MIB->getAliases(*AutReg, /*OnlySmaller=*/true));
-      if (RegsToTrackInstsFor.isTracked(*AutReg))
-        lastWritingInsts(Next, *AutReg).clear();
+      // The sub-registers of *AutReg are also trusted now, but not its
+      // super-registers (as they retain untrusted register units).
+      BitVector AuthenticatedSubregs =
+          BC.MIB->getAliases(*AutReg, /*OnlySmaller=*/true);
+      for (MCPhysReg Reg : AuthenticatedSubregs.set_bits()) {
+        Next.SafeToDerefRegs.set(Reg);
+        if (RegsToTrackInstsFor.isTracked(Reg))
+          lastWritingInsts(Next, Reg).clear();
+      }
     }
 
     LLVM_DEBUG({
@@ -397,14 +427,11 @@ static std::shared_ptr<Report> tryCheckReturn(const BinaryContext &BC,
   });
   if (BC.MIB->isAuthenticationOfReg(Inst, RetReg))
     return nullptr;
-  BitVector UsedDirtyRegs = S.NonAutClobRegs;
-  LLVM_DEBUG({ traceRegMask(BC, "NonAutClobRegs at Ret", UsedDirtyRegs); });
-  UsedDirtyRegs &= BC.MIB->getAliases(RetReg, /*OnlySmaller=*/true);
-  LLVM_DEBUG({ traceRegMask(BC, "Intersection with RetReg", UsedDirtyRegs); });
-  if (!UsedDirtyRegs.any())
+  LLVM_DEBUG({ traceRegMask(BC, "SafeToDerefRegs", S.SafeToDerefRegs); });
+  if (S.SafeToDerefRegs[RetReg])
     return nullptr;
 
-  return std::make_shared<GadgetReport>(RetKind, Inst, UsedDirtyRegs);
+  return std::make_shared<GadgetReport>(RetKind, Inst, RetReg);
 }
 
 FunctionAnalysisResult
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index 613b24c4553e2..d238a1df5c7d7 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -191,6 +191,10 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
     return false;
   }
 
+  SmallVector<MCPhysReg> getTrustedLiveInRegs() const override {
+    return {AArch64::LR};
+  }
+
   ErrorOr<MCPhysReg> getAuthenticatedReg(const MCInst &Inst) const override {
     switch (Inst.getOpcode()) {
     case AArch64::AUTIAZ:
diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
index 0d263199b376f..586da6d2a92e4 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
@@ -183,17 +183,14 @@ f_tail_called:
         .globl  f_nonx30_ret_non_auted
         .type   f_nonx30_ret_non_auted,@function
 f_nonx30_ret_non_auted:
-// FIXME: x1 is not authenticated, so should this be reported?
-//        Note that we assume it's fine for x30 to not be authenticated before
-//        returning to, as assuming that x30 is not attacker controlled at function
-//        entry is part (implicitly) of the pac-ret hardening scheme.
-//        It's probably an open question whether for other hardening schemes, such as
-//        PAuthABI, which registers should be considered "clean" or not at function entry.
-//        In other words, which registers have to be authenticated before being used as
-//        a pointer and which ones not?
-//        For a more detailed discussion, see
-//        https://github.com/llvm/llvm-project/pull/122304#discussion_r1923662744
-// CHECK-NOT: f_nonx30_ret_non_auted
+// x1 is neither authenticated nor implicitly considered safe at function entry.
+// Note that we assume it's fine for x30 to not be authenticated before
+// returning to, as assuming that x30 is not attacker controlled at function
+// entry is part (implicitly) of the pac-ret hardening scheme.
+//
+// CHECK-LABEL: GS-PAUTH: non-protected ret found in function f_nonx30_ret_non_auted, basic block {{[0-9a-zA-Z.]+}}, at address
+// CHECK-NEXT:    The instruction is     {{[0-9a-f]+}}:       ret
+// CHECK-NEXT:    The 0 instructions that write to the affected registers after any authentication are:
         ret     x1
         .size f_nonx30_ret_non_auted, .-f_nonx30_ret_non_auted
 
diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-multi-bb.s b/bolt/test/binary-analysis/AArch64/gs-pacret-multi-bb.s
index f74e825ed8fc1..bd8edbc676c34 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pacret-multi-bb.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pacret-multi-bb.s
@@ -17,9 +17,8 @@ f_crossbb1:
         .size f_crossbb1, .-f_crossbb1
 // CHECK-LABEL: GS-PAUTH: non-protected ret found in function f_crossbb1, basic block {{[^,]+}}, at address
 // CHECK-NEXT:  The instruction is     {{[0-9a-f]+}}:       ret
-// CHECK-NEXT:  The 2 instructions that write to the affected registers after any authentication are:
+// CHECK-NEXT:  The 1 instructions that write to the affected registers after any authentication are:
 // CHECK-NEXT:  1.     {{[0-9a-f]+}}:      ldp     x29, x30, [sp], #0x10
-// CHECK-NEXT:  2.     {{[0-9a-f]+}}:      autiasp
 
 // A test that checks that the dataflow state tracking across when merging BBs
 // seems to work:

@atrosinenko atrosinenko force-pushed the users/atrosinenko/bolt-gs-reformulate-dfa-state branch from aa9215f to 7fecfc4 Compare March 20, 2025 13:34
@atrosinenko atrosinenko force-pushed the users/atrosinenko/bolt-gs-refactor-pacret-analysis branch 2 times, most recently from 023876d to a5d954c Compare March 20, 2025 16:39
@atrosinenko atrosinenko force-pushed the users/atrosinenko/bolt-gs-reformulate-dfa-state branch from 7fecfc4 to 9de64af Compare March 20, 2025 16:39
@atrosinenko atrosinenko force-pushed the users/atrosinenko/bolt-gs-refactor-pacret-analysis branch from a5d954c to 856d59c Compare March 20, 2025 18:15
@atrosinenko atrosinenko force-pushed the users/atrosinenko/bolt-gs-reformulate-dfa-state branch from 9de64af to 3f11341 Compare March 20, 2025 18:15
@atrosinenko atrosinenko force-pushed the users/atrosinenko/bolt-gs-refactor-pacret-analysis branch from 856d59c to d9dbf0a Compare March 21, 2025 08:21
@atrosinenko atrosinenko force-pushed the users/atrosinenko/bolt-gs-reformulate-dfa-state branch 2 times, most recently from 709008c to 479bf68 Compare March 21, 2025 15:52
Base automatically changed from users/atrosinenko/bolt-gs-refactor-pacret-analysis to main March 21, 2025 16:54
In preparation for implementing support for detection of non-protected
call instructions, refine the definition of state which is computed for
each register by data-flow analysis.

Explicitly marking the registers which are known to be trusted at
function entry is crucial for finding non-protected calls. In addition,
it fixes less-common false negatives for pac-ret, such as `ret x1` in
`f_nonx30_ret_non_auted` test case.
@atrosinenko atrosinenko force-pushed the users/atrosinenko/bolt-gs-reformulate-dfa-state branch from 479bf68 to 7e77ccb Compare March 21, 2025 16:57
Copy link
Collaborator

@kbeyls kbeyls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good to me, I just have a few minor comments.

Comment on lines +554 to +562
/// Returns the registers that are trusted at function entry.
///
/// Each register should be treated as if a successfully authenticated
/// pointer was written to it before entering the function (i.e. the
/// pointer is safe to jump to as well as to be signed).
virtual SmallVector<MCPhysReg> getTrustedLiveInRegs() const {
llvm_unreachable("not implemented");
return {};
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if MCPlusBuilder is the right place for this to live...
The reason why I'm not sure is that to me it seems that MCPlusBuilder is mostly about querying the property of instructions, maybe at most relative to an assumed ABI.
It seems to me that getTrustedLiveInRegs might be encoding an assumed, implicit, threat model too?

Apologies for not explaining this very well. I'm just trying to make sure this function goes into the most appropriate place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the set of registers returned by getTrustedLiveInRegs on AArch64 can be derived from the fact that LR is set by branch-with-link instructions. To some extent, this does look more like a property of the ABI, but as far as I can see target-specific hooks are placed either to lib/Target/XYZ/XYZMCPlusBuilder.cpp or to lib/Target/XYZ/XYZMCSymbolizer.(h|cpp), so there doesn't seem to be many places where such target-specific hook can be defined.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, you're right, there aren't many places in Bolt that look target-specific... OK, let's just keep it in MCPlusBuilder.

: SafeToDerefRegs(NumRegs), LastInstWritingReg(NumRegsToTrack) {}

/// Returns S, so that S.merge(S1) == S1.merge(S) == S1.
static State getMergeNeutralElement(unsigned NumRegs,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't fully sure what getMergeNeutralElement meant when I first read this code linearly.
Reading further, I see this method is used to initialize the state object in the dataflow analysis.
I'm wondering if it would be easier to understand for most readers of this code if the name instead were something like createInitialState or something similar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was that this static function returns a neutral element w.r.t. merge operation (in an algebraic sense), this is the reason for keeping it right in the State class, as opposed to getStartingStateAt* methods of PacRetAnalysis, as this function should be aligned with merge. Maybe this is a bit "overengineered" name for a function returning an initial state, but after all, "starting state" returned by PacRetAnalysis::getStartingStateAt* is "initial" as well, but for a different purpose.

On the other hand, this function definitely deserves an explanation that this "initial" state is initial in the same sense as 0 being an initial value for accumulator when computing a sum of a set of integers. I will add a comment explaining what is the expected usage of this function, thank you for pointing this out!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking a bit more, finally dropped this function and switched to handling "empty" state explicitly. It is indeed rather hard to reason about when/whether the state of entry block (which is the only state known at the beginning on the analysis) is finally propagated into every other program point's state.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, so there now is an additional "empty" or "uninitialized" state. Or sometimes, in dataflow "lattice" theory terms: the bottom (⟂) state. I'm not sure if everywhere in the world, lattice theory is taught using the same symbols and terms. Apologies if the terms I use here don't make sense.

FWIW, in my prototype scanner for stack clash, I did end up writing some supporting classes to make handling "uninitialized" state more generic. See (commits such as 95119f4, 5c7dca7, and 728abe1).

That being said, I don't think it makes sense to use that here, it's simpler the way you've written it here.
If we'd end up adding lots of data flow analyses (maybe also for other binary analyses), we could introduce that generic LatticeT class I'm pointing to above.

Comment on lines +554 to +562
/// Returns the registers that are trusted at function entry.
///
/// Each register should be treated as if a successfully authenticated
/// pointer was written to it before entering the function (i.e. the
/// pointer is safe to jump to as well as to be signed).
virtual SmallVector<MCPhysReg> getTrustedLiveInRegs() const {
llvm_unreachable("not implemented");
return {};
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, you're right, there aren't many places in Bolt that look target-specific... OK, let's just keep it in MCPlusBuilder.

: SafeToDerefRegs(NumRegs), LastInstWritingReg(NumRegsToTrack) {}

/// Returns S, so that S.merge(S1) == S1.merge(S) == S1.
static State getMergeNeutralElement(unsigned NumRegs,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, so there now is an additional "empty" or "uninitialized" state. Or sometimes, in dataflow "lattice" theory terms: the bottom (⟂) state. I'm not sure if everywhere in the world, lattice theory is taught using the same symbols and terms. Apologies if the terms I use here don't make sense.

FWIW, in my prototype scanner for stack clash, I did end up writing some supporting classes to make handling "uninitialized" state more generic. See (commits such as 95119f4, 5c7dca7, and 728abe1).

That being said, I don't think it makes sense to use that here, it's simpler the way you've written it here.
If we'd end up adding lots of data flow analyses (maybe also for other binary analyses), we could introduce that generic LatticeT class I'm pointing to above.

// to Inst, assume it to be unreachable and report a warning.
if (S.empty()) {
Result.Diagnostics.push_back(std::make_shared<GenericReport>(
Inst, "Warning: unreachable instruction found"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to remember that if you do this analysis on a large amount of code (for example all libraries in a linux distro), it will find unreachable code.
I'm happy to keep this warning, but it might be that this diagnostic turns out to be too noisy when scanning large amounts of code....
We can fix/tune that in another PR though...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial intention was not to skip instructions silently (in case it is skipped due to some error in PacRetAnalysis). On one hand, this should be caught by the tests, on the other hand, this warning helps implementing the tests. Anyway, this should be trivial to remove this warning later.

@atrosinenko atrosinenko merged commit b6b40e9 into main Mar 25, 2025
10 checks passed
@atrosinenko atrosinenko deleted the users/atrosinenko/bolt-gs-reformulate-dfa-state branch March 25, 2025 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants