Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BOLT][AArch64] Support for pointer authentication (v2) #120064

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

bgergely0
Copy link

This PR contains a new approach to replace #117578.

Problem

Currently BOLT does not support Aarch64 binaries with pointer authentication, because it cannot correctly handle .cfi_negate_ra_state.

This has been raised in several issues:

Where are OpNegateRAState CFIs needed?

These CFI instructions need to be placed in locations, where one instruction has a different return address state (signed or not signed) than the previous one.

As instructions are moving around during optimization, we cannot know where these should be placed, until all optimizations are done.

For this reason, my implementation does not store these CFIs on the function during binary reading, instead it stores information about them on instructions as MCAnnotation, to be able to work out the states of instructions later.

MarkRAStates Pass

  • this pass processes the extra MCAnnotations to work out the state of each instruction before the optimizations happen
  • because the OpNegateRAState only flips a bit, we don't know the RA state before and after it, we only know that it changes
  • RememberState and RestoreState also modify the RA state: at restore we have to know the remembered state, so these are also saved as MCAnnotations, and used during the MarkRAStates Pass.

InsertNegateRAStatePass

  • this pass is similar to what I had in the previus version: it looks for changes in the RA state, and adds the CFI to these locations
  • the pass is added after other optimizations are done
  • the pass has to work out the state of instructions added during optimizations: these instructions inherit the state of the last instruction with known state

Difference from #117578

WIP

Not all pointer signing and authenticating instructions are covered yet, only paciasp and autiasp. This is enough to test on binaries built with -mbranch-protection=pac-ret.

Testing

The change has been tested on unittests binaries from the Chromium project, namely zlib_unittests and base_unittests.
It only makes sense to tests this on binaries which rely on the unwinder (e.g. maintaining stack traces internally), otherwise the incorrect placement of .cfi_negate_ra_state-s in the output binary will not be discovered.
The change has been tested with optimizations enabled, function reordering, function splitting and basicblock reordering are all covered by the implementation.

- save OpNegateRAState, RememberState and RestoreState locations
  when parsing input
- determine the RA states from these before other optimizations
  in MarkRAStates Pass
- after optimizations, we can insert OpNegateRAState at state
  boundaries and other needed locations (e.g. split functions)
  in InsertNegateRAStatePass
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the BOLT label Dec 16, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 16, 2024

@llvm/pr-subscribers-bolt

Author: Gergely Bálint (bgergely0)

Changes

This PR contains a new approach to replace #117578.

Problem

Currently BOLT does not support Aarch64 binaries with pointer authentication, because it cannot correctly handle .cfi_negate_ra_state.

This has been raised in several issues:

Where are OpNegateRAState CFIs needed?

These CFI instructions need to be placed in locations, where one instruction has a different return address state (signed or not signed) than the previous one.

As instructions are moving around during optimization, we cannot know where these should be placed, until all optimizations are done.

For this reason, my implementation does not store these CFIs on the function during binary reading, instead it stores information about them on instructions as MCAnnotation, to be able to work out the states of instructions later.

MarkRAStates Pass

  • this pass processes the extra MCAnnotations to work out the state of each instruction before the optimizations happen
  • because the OpNegateRAState only flips a bit, we don't know the RA state before and after it, we only know that it changes
  • RememberState and RestoreState also modify the RA state: at restore we have to know the remembered state, so these are also saved as MCAnnotations, and used during the MarkRAStates Pass.

InsertNegateRAStatePass

  • this pass is similar to what I had in the previus version: it looks for changes in the RA state, and adds the CFI to these locations
  • the pass is added after other optimizations are done
  • the pass has to work out the state of instructions added during optimizations: these instructions inherit the state of the last instruction with known state

Difference from #117578

WIP

Not all pointer signing and authenticating instructions are covered yet, only paciasp and autiasp. This is enough to test on binaries built with -mbranch-protection=pac-ret.

Testing

The change has been tested on unittests binaries from the Chromium project, namely zlib_unittests and base_unittests.
It only makes sense to tests this on binaries which rely on the unwinder (e.g. maintaining stack traces internally), otherwise the incorrect placement of .cfi_negate_ra_state-s in the output binary will not be discovered.
The change has been tested with optimizations enabled, function reordering, function splitting and basicblock reordering are all covered by the implementation.


Patch is 26.43 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/120064.diff

14 Files Affected:

  • (modified) bolt/include/bolt/Core/BinaryFunction.h (+46)
  • (modified) bolt/include/bolt/Core/MCPlus.h (+9-1)
  • (modified) bolt/include/bolt/Core/MCPlusBuilder.h (+66)
  • (added) bolt/include/bolt/Passes/InsertNegateRAStatePass.h (+25)
  • (added) bolt/include/bolt/Passes/MarkRAStates.h (+22)
  • (modified) bolt/lib/Core/BinaryBasicBlock.cpp (+5-1)
  • (modified) bolt/lib/Core/BinaryFunction.cpp (+1)
  • (modified) bolt/lib/Core/Exceptions.cpp (+18-2)
  • (modified) bolt/lib/Core/MCPlusBuilder.cpp (+86)
  • (modified) bolt/lib/Passes/CMakeLists.txt (+2)
  • (added) bolt/lib/Passes/InsertNegateRAStatePass.cpp (+153)
  • (added) bolt/lib/Passes/MarkRAStates.cpp (+122)
  • (modified) bolt/lib/Rewrite/BinaryPassManager.cpp (+6)
  • (modified) bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp (+6)
diff --git a/bolt/include/bolt/Core/BinaryFunction.h b/bolt/include/bolt/Core/BinaryFunction.h
index 7560908c250c3a..a889457256f90f 100644
--- a/bolt/include/bolt/Core/BinaryFunction.h
+++ b/bolt/include/bolt/Core/BinaryFunction.h
@@ -1599,6 +1599,52 @@ class BinaryFunction {
 
   void setHasInferredProfile(bool Inferred) { HasInferredProfile = Inferred; }
 
+  /// Find corrected offset the same way addCFIInstruction does it to skip NOPs.
+  std::optional<uint64_t> getCorrectedCFIOffset(uint64_t Offset) {
+    assert(!Instructions.empty());
+    auto I = Instructions.lower_bound(Offset);
+    if (Offset == getSize()) {
+      assert(I == Instructions.end() && "unexpected iterator value");
+      // Sometimes compiler issues restore_state after all instructions
+      // in the function (even after nop).
+      --I;
+      Offset = I->first;
+    }
+    assert(I->first == Offset && "CFI pointing to unknown instruction");
+    if (I == Instructions.begin()) {
+      return {};
+    }
+
+    --I;
+    while (I != Instructions.begin() && BC.MIB->isNoop(I->second)) {
+      Offset = I->first;
+      --I;
+    }
+    return Offset;
+  }
+
+  void setInstModifiesRAState(uint8_t CFIOpcode, uint64_t Offset) {
+    std::optional<uint64_t> CorrectedOffset = getCorrectedCFIOffset(Offset);
+    if (CorrectedOffset) {
+      auto I = Instructions.lower_bound(*CorrectedOffset);
+      I--;
+
+      switch (CFIOpcode) {
+      case dwarf::DW_CFA_GNU_window_save:
+        BC.MIB->setNegateRAState(I->second);
+        break;
+      case dwarf::DW_CFA_remember_state:
+        BC.MIB->setRememberState(I->second);
+        break;
+      case dwarf::DW_CFA_restore_state:
+        BC.MIB->setRestoreState(I->second);
+        break;
+      default:
+        assert(0 && "CFI Opcode not covered by function");
+      }
+    }
+  }
+
   void addCFIInstruction(uint64_t Offset, MCCFIInstruction &&Inst) {
     assert(!Instructions.empty());
 
diff --git a/bolt/include/bolt/Core/MCPlus.h b/bolt/include/bolt/Core/MCPlus.h
index 601d709712864e..78387ad8f5b984 100644
--- a/bolt/include/bolt/Core/MCPlus.h
+++ b/bolt/include/bolt/Core/MCPlus.h
@@ -72,7 +72,15 @@ class MCAnnotation {
     kLabel,               /// MCSymbol pointing to this instruction.
     kSize,                /// Size of the instruction.
     kDynamicBranch,       /// Jit instruction patched at runtime.
-    kGeneric              /// First generic annotation.
+    kUnkownSign,          /// Signed state not determined yet
+    kSigning,             /// Inst is a signing instruction (paciasp, etc.)
+    kSigned,              /// Inst is in a range where RA is signed
+    kAuthenticating,      /// Authenticating inst (e.g. autiasp)
+    kUnsigned,            /// Inst is in a range where RA is unsigned
+    kRememberState,       /// Inst has rememberState CFI
+    kRestoreState,        /// Inst has restoreState CFI
+    kNegateState,         /// Inst has OpNegateRAState CFI
+    kGeneric,             /// First generic annotation.
   };
 
   virtual void print(raw_ostream &OS) const = 0;
diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h
index 3634fed9757ceb..f9fdfd3f63d3fb 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -66,6 +66,20 @@ class MCPlusBuilder {
 public:
   using AllocatorIdTy = uint16_t;
 
+  std::optional<int64_t> getAnnotationAtOpIndex(const MCInst &Inst,
+                                                unsigned OpIndex) const {
+    std::optional<unsigned> FirstAnnotationOp = getFirstAnnotationOpIndex(Inst);
+    if (!FirstAnnotationOp)
+      return std::nullopt;
+
+    if (*FirstAnnotationOp > OpIndex || Inst.getNumOperands() < OpIndex)
+      return std::nullopt;
+
+    auto Op = Inst.begin() + OpIndex;
+    const int64_t ImmValue = Op->getImm();
+    return extractAnnotationIndex(ImmValue);
+  }
+
 private:
   /// A struct that represents a single annotation allocator
   struct AnnotationAllocator {
@@ -648,6 +662,13 @@ class MCPlusBuilder {
     llvm_unreachable("not implemented");
     return false;
   }
+  virtual bool isPAuth(MCInst &Inst) const {
+    llvm_unreachable("not implemented");
+  }
+
+  virtual bool isPSign(MCInst &Inst) const {
+    llvm_unreachable("not implemented");
+  }
 
   virtual bool isCleanRegXOR(const MCInst &Inst) const {
     llvm_unreachable("not implemented");
@@ -1122,6 +1143,51 @@ class MCPlusBuilder {
   /// Return true if the instruction is a tail call.
   bool isTailCall(const MCInst &Inst) const;
 
+  /// Stores NegateRAState annotation on Inst.
+  void setNegateRAState(MCInst &Inst) const;
+
+  /// Return true if Inst has NegateRAState annotation.
+  bool hasNegateRAState(const MCInst &Inst) const;
+
+  /// Sets RememberState annotation on Inst.
+  void setRememberState(MCInst &Inst) const;
+
+  /// Return true if Inst has RememberState annotation.
+  bool hasRememberState(const MCInst &Inst) const;
+
+  /// Stores RestoreState annotation on Inst.
+  void setRestoreState(MCInst &Inst) const;
+
+  /// Return true if Inst has RestoreState annotation.
+  bool hasRestoreState(const MCInst &Inst) const;
+
+  /// Stores RA Signed annotation on Inst.
+  void setRASigned(MCInst &Inst) const;
+
+  /// Return true if Inst has Signed RA annotation.
+  bool isRASigned(const MCInst &Inst) const;
+
+  /// Stores RA Signing annotation on Inst.
+  void setRASigning(MCInst &Inst) const;
+
+  /// Return true if Inst has Signing RA annotation.
+  bool isRASigning(const MCInst &Inst) const;
+
+  /// Stores Authenticating annotation on Inst.
+  void setAuthenticating(MCInst &Inst) const;
+
+  /// Return true if Inst has Authenticating annotation.
+  bool isAuthenticating(const MCInst &Inst) const;
+
+  /// Stores RA Unsigned annotation on Inst.
+  void setRAUnsigned(MCInst &Inst) const;
+
+  /// Return true if Inst has Unsigned RA annotation.
+  bool isRAUnsigned(const MCInst &Inst) const;
+
+  /// Return true if Inst doesn't have any annotation related to RA state.
+  bool isRAStateUnknown(const MCInst &Inst) const;
+
   /// Return true if the instruction is a call with an exception handling info.
   virtual bool isInvoke(const MCInst &Inst) const {
     return isCall(Inst) && getEHInfo(Inst);
diff --git a/bolt/include/bolt/Passes/InsertNegateRAStatePass.h b/bolt/include/bolt/Passes/InsertNegateRAStatePass.h
new file mode 100644
index 00000000000000..8cf08add9402a3
--- /dev/null
+++ b/bolt/include/bolt/Passes/InsertNegateRAStatePass.h
@@ -0,0 +1,25 @@
+#ifndef BOLT_PASSES_INSERT_NEGATE_RA_STATE_PASS
+#define BOLT_PASSES_INSERT_NEGATE_RA_STATE_PASS
+
+#include "bolt/Passes/BinaryPasses.h"
+#include <stack>
+
+namespace llvm {
+namespace bolt {
+
+class InsertNegateRAState : public BinaryFunctionPass {
+public:
+  explicit InsertNegateRAState() : BinaryFunctionPass(false) {}
+
+  const char *getName() const override { return "insert-negate-ra-state-pass"; }
+
+  /// Pass entry point
+  Error runOnFunctions(BinaryContext &BC) override;
+  void runOnFunction(BinaryFunction &BF);
+  bool addNegateRAStateAfterPacOrAuth(BinaryFunction &BF);
+  void fixUnknownStates(BinaryFunction &BF);
+};
+
+} // namespace bolt
+} // namespace llvm
+#endif
diff --git a/bolt/include/bolt/Passes/MarkRAStates.h b/bolt/include/bolt/Passes/MarkRAStates.h
new file mode 100644
index 00000000000000..3cbd6044683da4
--- /dev/null
+++ b/bolt/include/bolt/Passes/MarkRAStates.h
@@ -0,0 +1,22 @@
+#ifndef BOLT_PASSES_MARK_RA_STATES
+#define BOLT_PASSES_MARK_RA_STATES
+
+#include "bolt/Passes/BinaryPasses.h"
+
+namespace llvm {
+namespace bolt {
+
+class MarkRAStates : public BinaryFunctionPass {
+public:
+  explicit MarkRAStates() : BinaryFunctionPass(false) {}
+
+  const char *getName() const override { return "mark-ra-states"; }
+
+  /// Pass entry point
+  Error runOnFunctions(BinaryContext &BC) override;
+  void runOnFunction(BinaryFunction &BF);
+};
+
+} // namespace bolt
+} // namespace llvm
+#endif
diff --git a/bolt/lib/Core/BinaryBasicBlock.cpp b/bolt/lib/Core/BinaryBasicBlock.cpp
index 2a2192b79bb4bf..58a5f8e3ae4c08 100644
--- a/bolt/lib/Core/BinaryBasicBlock.cpp
+++ b/bolt/lib/Core/BinaryBasicBlock.cpp
@@ -201,7 +201,11 @@ int32_t BinaryBasicBlock::getCFIStateAtInstr(const MCInst *Instr) const {
       InstrSeen = (&Inst == Instr);
       continue;
     }
-    if (Function->getBinaryContext().MIB->isCFI(Inst)) {
+    // Ignoring OpNegateRAState CFIs here, as they dont have a "State"
+    // number associated with them.
+    if (Function->getBinaryContext().MIB->isCFI(Inst) &&
+        (Function->getCFIFor(Inst)->getOperation() !=
+         MCCFIInstruction::OpNegateRAState)) {
       LastCFI = &Inst;
       break;
     }
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index a9ccaea3c43850..9c4e161f66ad45 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -2587,6 +2587,7 @@ struct CFISnapshot {
   void advanceTo(int32_t State) {
     for (int32_t I = CurState, E = State; I != E; ++I) {
       const MCCFIInstruction &Instr = FDE[I];
+      assert(Instr.getOperation() != MCCFIInstruction::OpNegateRAState);
       if (Instr.getOperation() != MCCFIInstruction::OpRestoreState) {
         update(Instr, I);
         continue;
diff --git a/bolt/lib/Core/Exceptions.cpp b/bolt/lib/Core/Exceptions.cpp
index 0b2e63b8ca6a79..d38b37bcb3c612 100644
--- a/bolt/lib/Core/Exceptions.cpp
+++ b/bolt/lib/Core/Exceptions.cpp
@@ -568,10 +568,21 @@ bool CFIReaderWriter::fillCFIInfoFor(BinaryFunction &Function) const {
     case DW_CFA_remember_state:
       Function.addCFIInstruction(
           Offset, MCCFIInstruction::createRememberState(nullptr));
+
+      if (Function.getBinaryContext().isAArch64())
+        // Support for pointer authentication:
+        // We need to annotate instructions that modify the RA State, to work
+        // out the state of each instruction in MarkRAStates Pass.
+        Function.setInstModifiesRAState(DW_CFA_remember_state, Offset);
       break;
     case DW_CFA_restore_state:
       Function.addCFIInstruction(Offset,
                                  MCCFIInstruction::createRestoreState(nullptr));
+      if (Function.getBinaryContext().isAArch64())
+        // Support for pointer authentication:
+        // We need to annotate instructions that modify the RA State, to work
+        // out the state of each instruction in MarkRAStates Pass.
+        Function.setInstModifiesRAState(DW_CFA_restore_state, Offset);
       break;
     case DW_CFA_def_cfa:
       Function.addCFIInstruction(
@@ -632,8 +643,13 @@ bool CFIReaderWriter::fillCFIInfoFor(BinaryFunction &Function) const {
       // DW_CFA_GNU_window_save and DW_CFA_GNU_NegateRAState just use the same
       // id but mean different things. The latter is used in AArch64.
       if (Function.getBinaryContext().isAArch64()) {
-        Function.addCFIInstruction(
-            Offset, MCCFIInstruction::createNegateRAState(nullptr));
+        // Not adding OpNegateRAState since the location they are needed
+        // depends on the order of BasicBlocks, which changes during
+        // optimizations. Instead, an annotation is added to the instruction, to
+        // mark that the instruction modifies the RA State. The actual state for
+        // instructions are worked out in MarkRAStates based on these
+        // annotations.
+        Function.setInstModifiesRAState(DW_CFA_GNU_window_save, Offset);
         break;
       }
       if (opts::Verbosity >= 1)
diff --git a/bolt/lib/Core/MCPlusBuilder.cpp b/bolt/lib/Core/MCPlusBuilder.cpp
index 7ff7a2288451c8..7d172b1ade5e8c 100644
--- a/bolt/lib/Core/MCPlusBuilder.cpp
+++ b/bolt/lib/Core/MCPlusBuilder.cpp
@@ -147,6 +147,92 @@ bool MCPlusBuilder::isTailCall(const MCInst &Inst) const {
   return false;
 }
 
+void MCPlusBuilder::setNegateRAState(MCInst &Inst) const {
+  assert(!hasAnnotation(Inst, MCAnnotation::kNegateState));
+  setAnnotationOpValue(Inst, MCAnnotation::kNegateState, true);
+}
+
+bool MCPlusBuilder::hasNegateRAState(const MCInst &Inst) const {
+  if (hasAnnotation(Inst, MCAnnotation::kNegateState))
+    return true;
+  return false;
+}
+
+void MCPlusBuilder::setRememberState(MCInst &Inst) const {
+  assert(!hasAnnotation(Inst, MCAnnotation::kRememberState));
+  setAnnotationOpValue(Inst, MCAnnotation::kRememberState, true);
+}
+
+bool MCPlusBuilder::hasRememberState(const MCInst &Inst) const {
+  if (hasAnnotation(Inst, MCAnnotation::kRememberState))
+    return true;
+  return false;
+}
+
+void MCPlusBuilder::setRestoreState(MCInst &Inst) const {
+  assert(!hasAnnotation(Inst, MCAnnotation::kRestoreState));
+  setAnnotationOpValue(Inst, MCAnnotation::kRestoreState, true);
+}
+
+bool MCPlusBuilder::hasRestoreState(const MCInst &Inst) const {
+  if (hasAnnotation(Inst, MCAnnotation::kRestoreState))
+    return true;
+  return false;
+}
+
+void MCPlusBuilder::setRASigned(MCInst &Inst) const {
+  assert(!hasAnnotation(Inst, MCAnnotation::kSigned));
+  setAnnotationOpValue(Inst, MCAnnotation::kSigned, true);
+}
+
+bool MCPlusBuilder::isRASigned(const MCInst &Inst) const {
+  if (hasAnnotation(Inst, MCAnnotation::kSigned))
+    return true;
+  return false;
+}
+
+void MCPlusBuilder::setRASigning(MCInst &Inst) const {
+  assert(!hasAnnotation(Inst, MCAnnotation::kSigning));
+  setAnnotationOpValue(Inst, MCAnnotation::kSigning, true);
+}
+
+bool MCPlusBuilder::isRASigning(const MCInst &Inst) const {
+  if (hasAnnotation(Inst, MCAnnotation::kSigning))
+    return true;
+  return false;
+}
+
+void MCPlusBuilder::setAuthenticating(MCInst &Inst) const {
+  assert(!hasAnnotation(Inst, MCAnnotation::kAuthenticating));
+  setAnnotationOpValue(Inst, MCAnnotation::kAuthenticating, true);
+}
+
+bool MCPlusBuilder::isAuthenticating(const MCInst &Inst) const {
+  if (hasAnnotation(Inst, MCAnnotation::kAuthenticating))
+    return true;
+  return false;
+}
+
+void MCPlusBuilder::setRAUnsigned(MCInst &Inst) const {
+  assert(!hasAnnotation(Inst, MCAnnotation::kUnsigned));
+  setAnnotationOpValue(Inst, MCAnnotation::kUnsigned, true);
+}
+
+bool MCPlusBuilder::isRAUnsigned(const MCInst &Inst) const {
+  if (hasAnnotation(Inst, MCAnnotation::kUnsigned))
+    return true;
+  return false;
+}
+
+bool MCPlusBuilder::isRAStateUnknown(const MCInst &Inst) const {
+  if (hasAnnotation(Inst, MCAnnotation::kUnsigned) ||
+      hasAnnotation(Inst, MCAnnotation::kSigned) ||
+      hasAnnotation(Inst, MCAnnotation::kSigning) ||
+      hasAnnotation(Inst, MCAnnotation::kAuthenticating))
+    return false;
+  return true;
+}
+
 std::optional<MCLandingPad> MCPlusBuilder::getEHInfo(const MCInst &Inst) const {
   if (!isCall(Inst))
     return std::nullopt;
diff --git a/bolt/lib/Passes/CMakeLists.txt b/bolt/lib/Passes/CMakeLists.txt
index 1c1273b3d2420d..330c9136515bad 100644
--- a/bolt/lib/Passes/CMakeLists.txt
+++ b/bolt/lib/Passes/CMakeLists.txt
@@ -17,12 +17,14 @@ add_llvm_library(LLVMBOLTPasses
   IdenticalCodeFolding.cpp
   IndirectCallPromotion.cpp
   Inliner.cpp
+  InsertNegateRAStatePass.cpp
   Instrumentation.cpp
   JTFootprintReduction.cpp
   LongJmp.cpp
   LoopInversionPass.cpp
   LivenessAnalysis.cpp
   MCF.cpp
+  MarkRAStates.cpp
   PatchEntries.cpp
   PettisAndHansen.cpp
   PLTCall.cpp
diff --git a/bolt/lib/Passes/InsertNegateRAStatePass.cpp b/bolt/lib/Passes/InsertNegateRAStatePass.cpp
new file mode 100644
index 00000000000000..b0945e4065f039
--- /dev/null
+++ b/bolt/lib/Passes/InsertNegateRAStatePass.cpp
@@ -0,0 +1,153 @@
+//===- bolt/Passes/InsertNegateRAStatePass.cpp ----------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the InsertNegateRAStatePass class. It inserts
+// OpNegateRAState CFIs to places where the state of two consecutive
+// instructions are different.
+//
+//===----------------------------------------------------------------------===//
+#include "bolt/Passes/InsertNegateRAStatePass.h"
+#include "bolt/Core/BinaryFunction.h"
+#include "bolt/Core/ParallelUtilities.h"
+#include "bolt/Utils/CommandLineOpts.h"
+#include <cstdlib>
+#include <fstream>
+#include <iterator>
+
+using namespace llvm;
+
+namespace llvm {
+namespace bolt {
+
+void InsertNegateRAState::runOnFunction(BinaryFunction &BF) {
+  BinaryContext &BC = BF.getBinaryContext();
+
+  if (BF.getState() == BinaryFunction::State::Empty) {
+    return;
+  }
+
+  if (BF.getState() != BinaryFunction::State::CFG &&
+      BF.getState() != BinaryFunction::State::CFG_Finalized) {
+    BC.outs() << "BOLT-INFO: No CFG for " << BF.getPrintName()
+              << " in InsertNegateRAStatePass\n";
+    return;
+  }
+
+  // If none is inserted, the function doesn't need more work.
+  if (!addNegateRAStateAfterPacOrAuth(BF)) {
+    BC.outs() << "BOLT-INFO: no pacret found.\n";
+    return;
+  }
+
+  fixUnknownStates(BF);
+
+  bool FirstIter = true;
+  MCInst PrevInst;
+  BinaryBasicBlock *PrevBB = nullptr;
+  auto *Begin = BF.getLayout().block_begin();
+  auto *End = BF.getLayout().block_end();
+  for (auto *BB = Begin; BB != End; BB++) {
+
+    // Support for function splitting:
+    // if two consecutive BBs are going to end up in different functions,
+    // we have to negate the RA State, so the new function starts with a Signed
+    // state.
+    if (PrevBB != nullptr &&
+        PrevBB->getFragmentNum() != (*BB)->getFragmentNum() &&
+        BC.MIB->isRASigned(*((*BB)->begin()))) {
+      BF.addCFIInstruction(*BB, (*BB)->begin(),
+                           MCCFIInstruction::createNegateRAState(nullptr));
+    }
+
+    for (auto It = (*BB)->begin(); It != (*BB)->end(); ++It) {
+
+      MCInst &Inst = *It;
+      if (BC.MIB->isCFI(Inst))
+        continue;
+
+      if (!FirstIter) {
+        if ((BC.MIB->isRASigned(PrevInst) && BC.MIB->isRAUnsigned(Inst)) ||
+            (BC.MIB->isRAUnsigned(PrevInst) && BC.MIB->isRASigned(Inst))) {
+
+          It = BF.addCFIInstruction(
+              *BB, It, MCCFIInstruction::createNegateRAState(nullptr));
+        }
+
+      } else {
+        FirstIter = false;
+      }
+      PrevInst = Inst;
+    }
+    PrevBB = *BB;
+  }
+}
+
+bool InsertNegateRAState::addNegateRAStateAfterPacOrAuth(BinaryFunction &BF) {
+  BinaryContext &BC = BF.getBinaryContext();
+  bool FoundAny = false;
+  for (BinaryBasicBlock &BB : BF) {
+    for (auto Iter = BB.begin(); Iter != BB.end(); ++Iter) {
+      MCInst &Inst = *Iter;
+      if (BC.MIB->isPSign(Inst)) {
+        Iter = BF.addCFIInstruction(
+            &BB, Iter + 1, MCCFIInstruction::createNegateRAState(nullptr));
+        FoundAny = true;
+      }
+
+      if (BC.MIB->isPAuth(Inst)) {
+        Iter = BF.addCFIInstruction(
+            &BB, Iter + 1, MCCFIInstruction::createNegateRAState(nullptr));
+        FoundAny = true;
+      }
+    }
+  }
+  return FoundAny;
+}
+
+void InsertNegateRAState::fixUnknownStates(BinaryFunction &BF) {
+  BinaryContext &BC = BF.getBinaryContext();
+  bool FirstIter = true;
+  MCInst PrevInst;
+  for (auto BBIt = BF.begin(); BBIt != BF.end(); ++BBIt) {
+    BinaryBasicBlock &BB = *BBIt;
+
+    for (auto It = BB.begin(); It != BB.end(); ++It) {
+
+      MCInst &Inst = *It;
+      if (BC.MIB->isCFI(Inst))
+        continue;
+
+      if (!FirstIter && BC.MIB->isRAStateUnknown(Inst)) {
+        if (BC.MIB->isRASigned(PrevInst) || BC.MIB->isRASigning(PrevInst)) {
+          BC.MIB->setRASigned(Inst);
+        } else if (BC.MIB->isRAUnsigned(PrevInst) ||
+                   BC.MIB->isAuthenticating(PrevInst)) {
+          BC.MIB->setRAUnsigned(Inst);
+        }
+      } else {
+        FirstIter = false;
+      }
+      PrevInst = Inst;
+    }
+  }
+}
+
+Error InsertNegateRAState::runOnFunctions(BinaryContext &BC) {
+  ParallelUtilities::WorkFuncTy WorkFun = [&](BinaryFunction &BF) {
+    runOnFunction(BF);
+  };
+
+  ParallelUtilities::runOnEachFunction(
+      BC, ParallelUtilities::SchedulingPolicy::SP_TRIVIAL, WorkFun, nullptr,
+      "InsertNegateRAStatePass");
+
+  return Error::success();
+}
+
+} // end namespace bolt
+} // end namespace llvm
diff --git a/bolt/lib/Passes/MarkRAStates.cpp b/bolt/lib/Passes/MarkRAStates.cpp
new file mode 100644
index 00000000000000..03affab2f21546
--- /dev/null
+++ b/bolt/lib/Passes/MarkRAS...
[truncated]

Copy link
Collaborator

@kbeyls kbeyls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this patch @bgergely0 !

I haven't reviewed in depth (and might not be the best person to do an in-depth review), but I thought I'd share my initial thoughts after seeing this patch.

I think the PR description is excellent, it explains the overall design of how to correctly create/transform .cfi_negate_ra_state well.
I think it would be best that that description would live not only in the commit message, but ideally in some "design doc" area for BOLT. I don't think there's a "design doc" area for BOLT at the moment (@maksfb , is that right?), so probably best to try and find an appropriate place in the source code to store the info as a comment?

My one other high-level observation is that I'm wondering if it isn't too costly somehow to have many different kinds of MCAnnotation for this one use case. I wonder if it would be better/cheaper to have only a single annotation that would represent "OpNegateRAStateInfo", rather than having separate annotations. (Do I guess correctly that with this patch, typically, there will be a few annotations added to each MCInst?). Maybe some of the long-term BOLT developer might have an informed opinion on that?

@bgergely0
Copy link
Author

bgergely0 commented Jan 10, 2025

Thanks for the feedback @kbeyls!

Doc

I agree that storing an in-depth description would be nice. I added comments to key parts, so the sources contain a more concise description of the problem/solution, but these are scattered in different files.

MCAnnotations

I'm not sure how a single annotation would work. As the MCAnnotation is essentially an enum, I cannot represent the different states by one enum element.

As far as I know (and backed up by https://godbolt.org/z/fv87j3vod ) even though the 4 element enum could be described in 2 bits (or in a single byte), a color option takes up 4 bytes. So I think we have plenty of space for adding new MCAnnotation options.

The runtime memory footprint would increase a bit, because in MarkRAState Pass, the state of each instruction is annotated, so an extra annotation is stored on each MCInst, but according to my bytehound measurements, it increases the memory by less than 3%. And this only applies in the case of pac-ret binaries, which were not supported before, so the comparison is not very meaningful. And because the size of an MCAnnotation does not increase, users running BOLT on non pac-ret binaries would not see a difference in runtime memory consumption.

Thanks for pointing this out though, I just realized I left the kUnknownSign in, which is not used at all.

@bgergely0 bgergely0 requested a review from yota9 as a code owner January 10, 2025 13:52
Copy link
Member

@paschalis-mpeis paschalis-mpeis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Gergely, thanks a lot for your patch. It is very well written!

The change has been tested on unittests binaries from the Chromium project, namely zlib_unittests and base_unittests.

It's great that you tested this externally with Chromium.
My high-level comment is about bolt testing. Any plans or thoughts about it?

You may also find some more comments below, mostly 'nits'. Thanks again!

};

ParallelUtilities::runOnEachFunction(
BC, ParallelUtilities::SchedulingPolicy::SP_TRIVIAL, WorkFun, nullptr,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the policy chosen based on your discussion with @kbeyls about costs?
Unsure what is more appropriate here. At least on theoretical time complexity you do a worst-case 2x linear pass over instructions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I just copied this from some other Pass I believe, and forgot to make a proper decision. Do you have a suggestion to which one should I use?

Copy link
Member

@paschalis-mpeis paschalis-mpeis Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it should be SP_INST_LINEAR instead (same for InsertNegateRAStatePass.cpp) since you iterate on instructions. @maksfb , @yota9 any comments on this?

@whousemyname
Copy link

whousemyname commented Jan 15, 2025

Hello bgergely0, This PR is really great!!
But when I used this PR, I instrumented my lib.so (aarch64) file, but the bolt crashed. It was related to the hash of the relocation information. I don’t know if it was because of the duplication of my lib.so file. Is it related to positioning information or PR?
The command executed is llvm-bolt.exe libGG.debug.so -instrument --instrumentation-file=GG.fdata -o .\libGG-debug.so.instrumented

BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: aarch64
BOLT-INFO: BOLT version: dac8f161e868f57e34c5a12bc81c939a64322d15
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x4000000, offset 0x4000000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llvm-bolt.exe libGG.debug.so -instrument --instrumentation-file=GG.fdata -o .\libGG-debug.so.instrumented
Exception Code: 0x80000003
 #0 0x00007ff7dee0b653 std::_Tree_const_iterator<class std::_Tree_val<struct std::_Tree_simple_types<struct llvm::bolt::Relocation>>>::operator*(void) const C:\install_vs_buildtools\VC\Tools\MSVC\14.42.34433\include\xtree:182:0
 #1 0x00007ff7dff570f4 std::_Tree_const_iterator<class std::_Tree_val<struct std::_Tree_simple_types<struct llvm::bolt::Relocation>>>::operator->(void) const C:\install_vs_buildtools\VC\Tools\MSVC\14.42.34433\include\xtree:189:0
 #2 0x00007ff7dff5134c llvm::bolt::BinarySection::hash(class llvm::bolt::BinaryData const &, class std::map<class llvm::bolt::BinaryData const *, unsigned __int64, struct std::less<class llvm::bolt::BinaryData const *>, class std::allocator<struct std::pair<class llvm::bolt::BinaryData const *const, unsigned __int64>>> &) const D:\github-projects\llvm-project\bolt\lib\Core\BinarySection.cpp:58:0
 #3 0x00007ff7dffc4bf4 llvm::bolt::BinarySection::hash(class llvm::bolt::BinaryData const &) const D:\github-projects\llvm-project\bolt\include\bolt\Core\BinarySection.h:429:0
 #4 0x00007ff7dff64c47 llvm::bolt::BinaryContext::generateSymbolHashes(void) D:\github-projects\llvm-project\bolt\lib\Core\BinaryContext.cpp:1186:0
 #5 0x00007ff7dff6d11a llvm::bolt::BinaryContext::postProcessSymbolTable(void) D:\github-projects\llvm-project\bolt\lib\Core\BinaryContext.cpp:1414:0
 #6 0x00007ff7ded4a796 llvm::bolt::RewriteInstance::buildFunctionsCFG(void) D:\github-projects\llvm-project\bolt\lib\Rewrite\RewriteInstance.cpp:3431:0
 #7 0x00007ff7ded3936e llvm::bolt::RewriteInstance::run(void) D:\github-projects\llvm-project\bolt\lib\Rewrite\RewriteInstance.cpp:688:0
 #8 0x00007ff7dd07eb8b main D:\github-projects\llvm-project\bolt\tools\driver\llvm-bolt.cpp:268:0
 #9 0x00007ff7e27221c9 invoke_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:79:0
#10 0x00007ff7e27220b2 __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288:0
#11 0x00007ff7e2721f6e __scrt_common_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:331:0
#12 0x00007ff7e272225e mainCRTStartup D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:17:0
#13 0x00007ffac8cd259d (C:\WINDOWS\System32\KERNEL32.DLL+0x1259d)
#14 0x00007ffaca30af38 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x5af38)

@bgergely0
Copy link
Author

@whousemyname Hi! Pretty sure this code has nothing to do with the BinarySection::hash function. Until this PR is merged, and considered "stable", I suggest to only use executables compiled with -mbranch-protection=none with BOLT, and use the code from the main branch.

@bgergely0
Copy link
Author

Thanks for the detailed look at it @paschalis-mpeis !
I addressed most of your comments. The scheduling policy is a missing bit.

Testing:
We definately should add some tests next to the code. Problem is that the code needs a faily complex test (e.g. it needs to rely on an unwinder). So the smallest possible target we could (re)use is tests from the libunwind subproject. This file for example.

@bgergely0
Copy link
Author

I will have a small improvement up in a few days to add some error recovery capability to the asserts here.
In MarkRAStates, I assume that functions start in an unsigned RA state, and that's why these asserts are in place. I have encountered one situation with handwritten assembly, where this is not be the case: a pair of paciasp and autiasp insts were split into two functions. To not fail on these asserts, I wrote this hacky workaround in MarkRAStates.cpp.
But after submitting, I now realize that this situation is analogous to the normal function-splitting done during optimizations (so a function is split, and the paciasp and autiasp end up in different functions). So some level or error-recovery from the asserts is possible, which would make the code cleaner, and less likely to fail on the asserts.

Copy link
Member

@paschalis-mpeis paschalis-mpeis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments @bgergely0!

I've added a few more nits. Let's wait on the scheduling policy and feel free to reach out regarding tests.

@paschalis-mpeis
Copy link
Member

@whousemyname you could double-check by trying the first commit before this PR (3063eaf85~) that it still has the crash you mention.

@whousemyname
Copy link

@whousemyname you could double-check by trying the first commit before this PR (3063eaf85~) that it still has the crash you mention.

Thank you for your suggestion. The cause of this crash should have existed before. The llvm-bolt I built on mac did not encounter this problem, but the llvm-bolt built using msvc on Windows encountered this hash crash problem.

@paschalis-mpeis
Copy link
Member

The change has been tested on unittests binaries from the Chromium project, namely zlib_unittests and base_unittests.

Regarding some end-to-end testing for PAuth, like with zlib in Chromium, we could maybe add some small artifact to rafaelauler/bolt-tests
(cc: @ilinpv)

    instead of failing on asserts, setIgnored() to functions
    with inconsistent input RA States
@ostannard
Copy link
Collaborator

I'm not familiar with the internals of BOLT, but this looks like a reasonable strategy for handling return address signing. The only comments I have should be taken as suggestions for future patches, and probably don't need to be handled before this can be committed.

When compiling with -march=armv8.3-a (or later, or a later -mcpu option), there are new instructions which get used, instead of the NOP instructions this checks for. I think we always use the NOP instructions (PACIxSP) in the prologue, but we emit the new v8.3-A RETAx instructions for returns.

The compiler can also emit the 'B' character into the CIE augmentation string to signal that the B key (instead of the default A) was used for return address signing. This can be enabled in the compiler with -mbranch-protection=pac-ret+b-key or -fptrauth-returns. I don't think any platforms mix the two keys for return address signing in one process, but the unwinder does use the augmentation string to decide which authentication instruction to use, so if we create new CFIs then we'll need to make sure that gets set correctly.

Armv9.5-A added a new extension to pointer authentication, FEAT_PAUTH_LR, which adds the PC address of the signing instruction to the signature. There is a new unwind opcode for this, DW_CFA_AARCH64_negate_ra_state_with_pc, which toggles the signing state and captures the code address to be used for authentication during unwinding. There are a few different forms of the epilogue, depending on both the architecture version and the length of the function, and BOLT will need to be able to adjust these to get the correct address. This also adds the extra constraint that there can only be one signing instruction which dominates each authenticating instruction, I don't know if any BOLT optimisations are likely to break that or not.

@Heart-tide
Copy link

Hi @bgergely0 . Thank you for your PR! It works well in most scenarios. However, while testing on vmlinux, I occasionally encountered a segmentation fault at InsertNegateRAStatePass.cpp:81.

Observations:

  • The crash occurs at PrevInst = Inst after BF.addCFIInstruction() is called.
  • When I modified the code to use a local copy of *It (instead of a reference MCInst& Inst), the crash no longer reproduced.

Hypothesis:
The vector::insert operation within addCFIInstruction() may invalidate existing iterators/references (per C++ iterator invalidation rules). This could turn Inst into a dangling reference if the vector reallocates, leading to UB when accessed later.

Since the issue is non-deterministic (depends on reallocation), it might explain the intermittent crashes. Would you please review the iterator/reference handling around the CFI insertion point? Thank you for your time!

@bgergely0
Copy link
Author

Hi @Heart-tide, thanks for reporting this! Your observations seems correct: the It = BF.addCFIInstruction() updates the iterator, but the Inst variable is still a reference to the original one. I will fix this soon. Let me know if you find any other issues with the patch!

@bgergely0 bgergely0 marked this pull request as draft March 8, 2025 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants