Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Account for existing SDWA selections #123221

Merged
merged 48 commits into from
Mar 3, 2025

Conversation

frederik-h
Copy link
Contributor

@frederik-h frederik-h commented Jan 16, 2025

The si-peephole-sdwa pass adjusts the selections on sdwa instructions to the selections on their operands during its conversions. For instance, if an instruction selects BYTE_0 and its operand selects WORD_1, the combined selection should be BYTE_2, i.e. "BYTE_0 of WORD_1". The existing implementation does not always handle this correctly in some complex situations with instructions across different basic blocks as demonstrated by the test cases included in this PR.

This PR adds an additional selection combination step to the conversion to fix this issue. It reverts the changes made by PR #123942 which had disabled the conversion of preexisting SDWA instructions completely as a quick fix.

jrbyrnes and others added 10 commits January 16, 2025 08:26
    Change-Id: I3e1cf6042f069e8dffe9dd5b4654288111f7b1bf
- Remove redundant "if".
- Replace arithmetic on SdwaSel type

The case distinction seems clearer and removes a mishandled case:
Since (SdwaSel)((unsigned)WORD_0 + 2) == DWORD,
the existing code led to the transformation:
     WORD_0 Sel (WORD_1 Sel (%X)) -> DWORD Sel (%X)
The correct transformation should be:
    WORD_0 Sel (WORD_1 Sel (%X)) -> WORD_1 Sel (%X)
There are  two loops that invoke the conversion on the operands
of the input instruction, one for the case where the instruction
is already an SDWA instruction and one for the case where it isn't.
The loops are almost the same.

Fuse those loops into a single loop.
@frederik-h frederik-h requested a review from jrbyrnes January 16, 2025 17:04
Copy link

github-actions bot commented Jan 16, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Frederik Harwath (frederik-h)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/123221.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp (+107-36)
  • (added) llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr-combine-sel.mir (+124)
diff --git a/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp b/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
index 467f042892cebe..713ef162f8dee5 100644
--- a/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
@@ -62,6 +62,7 @@ class SIPeepholeSDWA {
   std::unique_ptr<SDWAOperand> matchSDWAOperand(MachineInstr &MI);
   void pseudoOpConvertToVOP2(MachineInstr &MI,
                              const GCNSubtarget &ST) const;
+  MachineInstr *createSDWAVersion(MachineInstr &MI);
   bool convertToSDWA(MachineInstr &MI, const SDWAOperandsVector &SDWAOperands);
   void legalizeScalarOperands(MachineInstr &MI, const GCNSubtarget &ST) const;
 
@@ -85,6 +86,8 @@ class SIPeepholeSDWALegacy : public MachineFunctionPass {
   }
 };
 
+using namespace AMDGPU::SDWA;
+
 class SDWAOperand {
 private:
   MachineOperand *Target; // Operand that would be used in converted instruction
@@ -102,12 +105,47 @@ class SDWAOperand {
   virtual MachineInstr *potentialToConvert(const SIInstrInfo *TII,
                                            const GCNSubtarget &ST,
                                            SDWAOperandsMap *PotentialMatches = nullptr) = 0;
-  virtual bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) = 0;
+  virtual bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+                             bool CombineSelections = false) = 0;
 
   MachineOperand *getTargetOperand() const { return Target; }
   MachineOperand *getReplacedOperand() const { return Replaced; }
   MachineInstr *getParentInst() const { return Target->getParent(); }
 
+  /// Fold a \p FoldedOp SDWA selection into an \p ExistingOp existing SDWA
+  /// selection. If the selections are compatible, return the combined
+  /// selection, otherwise return a nullopt. For example, if we have existing
+  /// BYTE_0 Sel and are attempting to fold WORD_1 Sel:
+  ///     BYTE_0 Sel (WORD_1 Sel (%X)) -> BYTE_2 Sel (%X)
+  std::optional<SdwaSel> combineSdwaSel(SdwaSel ExistingOp, SdwaSel FoldedOp) {
+    if (ExistingOp == SdwaSel::DWORD)
+      return FoldedOp;
+
+    if (FoldedOp == SdwaSel::DWORD)
+      return ExistingOp;
+
+    if (ExistingOp == SdwaSel::WORD_1 || ExistingOp == SdwaSel::BYTE_2 ||
+        ExistingOp == SdwaSel::BYTE_3)
+      return {};
+
+    if (ExistingOp == FoldedOp)
+      return ExistingOp;
+
+    if (FoldedOp == SdwaSel::WORD_0)
+      return ExistingOp;
+
+    if (FoldedOp == SdwaSel::WORD_1) {
+      if (ExistingOp == SdwaSel::BYTE_0)
+        return SdwaSel::BYTE_2;
+      if (ExistingOp == SdwaSel::BYTE_1)
+        return SdwaSel::BYTE_3;
+      if (ExistingOp == SdwaSel::WORD_0)
+        return SdwaSel::WORD_1;
+    }
+
+    return {};
+  }
+
   MachineRegisterInfo *getMRI() const {
     return &getParentInst()->getParent()->getParent()->getRegInfo();
   }
@@ -118,8 +156,6 @@ class SDWAOperand {
 #endif
 };
 
-using namespace AMDGPU::SDWA;
-
 class SDWASrcOperand : public SDWAOperand {
 private:
   SdwaSel SrcSel;
@@ -137,7 +173,8 @@ class SDWASrcOperand : public SDWAOperand {
   MachineInstr *potentialToConvert(const SIInstrInfo *TII,
                                    const GCNSubtarget &ST,
                                    SDWAOperandsMap *PotentialMatches = nullptr) override;
-  bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) override;
+  bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+                     bool CombineSelections = false) override;
 
   SdwaSel getSrcSel() const { return SrcSel; }
   bool getAbs() const { return Abs; }
@@ -166,7 +203,8 @@ class SDWADstOperand : public SDWAOperand {
   MachineInstr *potentialToConvert(const SIInstrInfo *TII,
                                    const GCNSubtarget &ST,
                                    SDWAOperandsMap *PotentialMatches = nullptr) override;
-  bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) override;
+  bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+                     bool CombineSelections = false) override;
 
   SdwaSel getDstSel() const { return DstSel; }
   DstUnused getDstUnused() const { return DstUn; }
@@ -186,7 +224,8 @@ class SDWADstPreserveOperand : public SDWADstOperand {
       : SDWADstOperand(TargetOp, ReplacedOp, DstSel_, UNUSED_PRESERVE),
         Preserve(PreserveOp) {}
 
-  bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) override;
+  bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+                     bool CombineSelections = false) override;
 
   MachineOperand *getPreservedOperand() const { return Preserve; }
 
@@ -375,7 +414,8 @@ MachineInstr *SDWASrcOperand::potentialToConvert(const SIInstrInfo *TII,
   return PotentialMO->getParent();
 }
 
-bool SDWASrcOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) {
+bool SDWASrcOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+                                   bool CombineSelections) {
   switch (MI.getOpcode()) {
   case AMDGPU::V_CVT_F32_FP8_sdwa:
   case AMDGPU::V_CVT_F32_BF8_sdwa:
@@ -451,7 +491,15 @@ bool SDWASrcOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) {
   }
   copyRegOperand(*Src, *getTargetOperand());
   if (!IsPreserveSrc) {
-    SrcSel->setImm(getSrcSel());
+    if (CombineSelections) {
+      std::optional<SdwaSel> NewOp =
+          combineSdwaSel((SdwaSel)SrcSel->getImm(), getSrcSel());
+      if (!NewOp.has_value())
+        return false;
+      SrcSel->setImm(NewOp.value());
+    } else {
+      SrcSel->setImm(getSrcSel());
+    }
     SrcMods->setImm(getSrcMods(TII, Src));
   }
   getTargetOperand()->setIsKill(false);
@@ -479,7 +527,8 @@ MachineInstr *SDWADstOperand::potentialToConvert(const SIInstrInfo *TII,
   return PotentialMO->getParent();
 }
 
-bool SDWADstOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) {
+bool SDWADstOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+                                   bool CombineSelections) {
   // Replace vdst operand in MI with target operand. Set dst_sel and dst_unused
 
   if ((MI.getOpcode() == AMDGPU::V_FMAC_F16_sdwa ||
@@ -498,7 +547,15 @@ bool SDWADstOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) {
   copyRegOperand(*Operand, *getTargetOperand());
   MachineOperand *DstSel= TII->getNamedOperand(MI, AMDGPU::OpName::dst_sel);
   assert(DstSel);
-  DstSel->setImm(getDstSel());
+  if (CombineSelections) {
+    std::optional<SdwaSel> NewOp =
+        combineSdwaSel((SdwaSel)DstSel->getImm(), getDstSel());
+    if (!NewOp.has_value())
+      return false;
+    DstSel->setImm(NewOp.value());
+  } else {
+    DstSel->setImm(getDstSel());
+  }
   MachineOperand *DstUnused= TII->getNamedOperand(MI, AMDGPU::OpName::dst_unused);
   assert(DstUnused);
   DstUnused->setImm(getDstUnused());
@@ -510,7 +567,8 @@ bool SDWADstOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) {
 }
 
 bool SDWADstPreserveOperand::convertToSDWA(MachineInstr &MI,
-                                           const SIInstrInfo *TII) {
+                                           const SIInstrInfo *TII,
+                                           bool CombineSelections) {
   // MI should be moved right before v_or_b32.
   // For this we should clear all kill flags on uses of MI src-operands or else
   // we can encounter problem with use of killed operand.
@@ -535,7 +593,7 @@ bool SDWADstPreserveOperand::convertToSDWA(MachineInstr &MI,
                  MI.getNumOperands() - 1);
 
   // Convert MI as any other SDWADstOperand and remove v_or_b32
-  return SDWADstOperand::convertToSDWA(MI, TII);
+  return SDWADstOperand::convertToSDWA(MI, TII, CombineSelections);
 }
 
 std::optional<int64_t>
@@ -1021,21 +1079,13 @@ bool isConvertibleToSDWA(MachineInstr &MI,
 }
 } // namespace
 
-bool SIPeepholeSDWA::convertToSDWA(MachineInstr &MI,
-                                   const SDWAOperandsVector &SDWAOperands) {
-
-  LLVM_DEBUG(dbgs() << "Convert instruction:" << MI);
-
-  // Convert to sdwa
-  int SDWAOpcode;
+MachineInstr *SIPeepholeSDWA::createSDWAVersion(MachineInstr &MI) {
   unsigned Opcode = MI.getOpcode();
-  if (TII->isSDWA(Opcode)) {
-    SDWAOpcode = Opcode;
-  } else {
-    SDWAOpcode = AMDGPU::getSDWAOp(Opcode);
-    if (SDWAOpcode == -1)
-      SDWAOpcode = AMDGPU::getSDWAOp(AMDGPU::getVOPe32(Opcode));
-  }
+  assert(!TII->isSDWA(Opcode));
+
+  int SDWAOpcode = AMDGPU::getSDWAOp(Opcode);
+  if (SDWAOpcode == -1)
+    SDWAOpcode = AMDGPU::getSDWAOp(AMDGPU::getVOPe32(Opcode));
   assert(SDWAOpcode != -1);
 
   const MCInstrDesc &SDWADesc = TII->get(SDWAOpcode);
@@ -1169,6 +1219,28 @@ bool SIPeepholeSDWA::convertToSDWA(MachineInstr &MI,
     SDWAInst->tieOperands(PreserveDstIdx, SDWAInst->getNumOperands() - 1);
   }
 
+  return SDWAInst.getInstr();
+}
+
+bool SIPeepholeSDWA::convertToSDWA(MachineInstr &MI,
+                                   const SDWAOperandsVector &SDWAOperands) {
+  LLVM_DEBUG(dbgs() << "Convert instruction:" << MI);
+
+  MachineInstr *SDWAInst;
+  bool CombineSelections;
+  if (TII->isSDWA(MI.getOpcode())) {
+    // No conversion necessary, since MI is an SDWA instruction.  But
+    // tell convertToSDWA below to combine selections of this instruction
+    // and its SDWA operands.
+    SDWAInst = MI.getParent()->getParent()->CloneMachineInstr(&MI);
+    MI.getParent()->insert(MI.getIterator(), SDWAInst);
+    CombineSelections = true;
+  } else {
+    // Convert to sdwa
+    SDWAInst = createSDWAVersion(MI);
+    CombineSelections = false;
+  }
+
   // Apply all sdwa operand patterns.
   bool Converted = false;
   for (auto &Operand : SDWAOperands) {
@@ -1184,22 +1256,21 @@ bool SIPeepholeSDWA::convertToSDWA(MachineInstr &MI,
     // was already destroyed). So if SDWAOperand is also a potential MI then do
     // not apply it.
     if (PotentialMatches.count(Operand->getParentInst()) == 0)
-      Converted |= Operand->convertToSDWA(*SDWAInst, TII);
+      Converted |= Operand->convertToSDWA(*SDWAInst, TII, CombineSelections);
   }
 
-  if (Converted) {
-    ConvertedInstructions.push_back(SDWAInst);
-    for (MachineOperand &MO : SDWAInst->uses()) {
-      if (!MO.isReg())
-        continue;
-
-      MRI->clearKillFlags(MO.getReg());
-    }
-  } else {
+  if (!Converted) {
     SDWAInst->eraseFromParent();
     return false;
   }
 
+  ConvertedInstructions.push_back(SDWAInst);
+  for (MachineOperand &MO : SDWAInst->uses()) {
+    if (!MO.isReg())
+      continue;
+
+    MRI->clearKillFlags(MO.getReg());
+  }
   LLVM_DEBUG(dbgs() << "\nInto:" << *SDWAInst << '\n');
   ++NumSDWAInstructionsPeepholed;
 
diff --git a/llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr-combine-sel.mir b/llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr-combine-sel.mir
new file mode 100644
index 00000000000000..43708e9513c68b
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr-combine-sel.mir
@@ -0,0 +1,124 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1030 -run-pass=si-peephole-sdwa -o - %s | FileCheck -check-prefix=NOHAZARD %s
+
+---
+name:            sdwa_opsel_hazard
+body:             |
+  ; NOHAZARD-LABEL: name: sdwa_opsel_hazard
+  ; NOHAZARD: bb.0:
+  ; NOHAZARD-NEXT:   successors: %bb.7(0x40000000), %bb.8(0x40000000)
+  ; NOHAZARD-NEXT:   liveins: $vgpr0, $sgpr4_sgpr5, $sgpr6
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT:   [[DEF:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+  ; NOHAZARD-NEXT:   [[DEF1:%[0-9]+]]:sreg_64_xexec_xnull = IMPLICIT_DEF
+  ; NOHAZARD-NEXT:   [[DEF2:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+  ; NOHAZARD-NEXT:   [[GLOBAL_LOAD_DWORD_SADDR:%[0-9]+]]:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR killed [[DEF1]], [[DEF2]], 0, 0, implicit $exec
+  ; NOHAZARD-NEXT:   [[SI_IF:%[0-9]+]]:sreg_32 = SI_IF undef [[DEF]], %bb.8, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+  ; NOHAZARD-NEXT:   S_BRANCH %bb.7
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT: bb.1:
+  ; NOHAZARD-NEXT:   successors: %bb.2(0x80000000)
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT:   [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 255, implicit $exec
+  ; NOHAZARD-NEXT:   [[V_AND_B32_sdwa:%[0-9]+]]:vgpr_32 = V_AND_B32_sdwa 0, undef [[GLOBAL_LOAD_DWORD_SADDR]], 0, [[V_MOV_B32_e32_]], 0, 6, 0, 5, 6, implicit $exec
+  ; NOHAZARD-NEXT:   [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2, implicit $exec
+  ; NOHAZARD-NEXT:   [[V_LSHLREV_B32_sdwa:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_sdwa 0, [[V_MOV_B32_e32_1]], 0, undef [[GLOBAL_LOAD_DWORD_SADDR]], 0, 6, 0, 6, 2, implicit $exec
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT: bb.2:
+  ; NOHAZARD-NEXT:   successors: %bb.3(0x40000000), %bb.4(0x40000000)
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT:   [[SI_IF1:%[0-9]+]]:sreg_32 = SI_IF killed undef %9, %bb.4, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+  ; NOHAZARD-NEXT:   S_BRANCH %bb.3
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT: bb.3:
+  ; NOHAZARD-NEXT:   successors: %bb.4(0x80000000)
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT: bb.4:
+  ; NOHAZARD-NEXT:   successors: %bb.5(0x40000000), %bb.6(0x40000000)
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT:   [[SI_IF2:%[0-9]+]]:sreg_32 = SI_IF killed undef [[SI_IF1]], %bb.6, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+  ; NOHAZARD-NEXT:   S_BRANCH %bb.5
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT: bb.5:
+  ; NOHAZARD-NEXT:   successors: %bb.6(0x80000000)
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT: bb.6:
+  ; NOHAZARD-NEXT:   successors: %bb.9(0x40000000), %bb.10(0x40000000)
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT:   [[SI_IF3:%[0-9]+]]:sreg_32 = SI_IF undef [[DEF]], %bb.10, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+  ; NOHAZARD-NEXT:   S_BRANCH %bb.9
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT: bb.7:
+  ; NOHAZARD-NEXT:   successors: %bb.8(0x80000000)
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT: bb.8:
+  ; NOHAZARD-NEXT:   successors: %bb.1(0x40000000), %bb.2(0x40000000)
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT:   [[V_LSHRREV_B32_e64_:%[0-9]+]]:vgpr_32 = V_LSHRREV_B32_e64 16, undef [[GLOBAL_LOAD_DWORD_SADDR]], implicit $exec
+  ; NOHAZARD-NEXT:   [[SI_IF4:%[0-9]+]]:sreg_32 = SI_IF killed undef [[SI_IF]], %bb.2, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+  ; NOHAZARD-NEXT:   S_BRANCH %bb.1
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT: bb.9:
+  ; NOHAZARD-NEXT:   successors: %bb.10(0x80000000)
+  ; NOHAZARD-NEXT: {{  $}}
+  ; NOHAZARD-NEXT: bb.10:
+  ; NOHAZARD-NEXT:   S_ENDPGM 0
+  bb.0:
+    successors: %bb.7(0x40000000), %bb.8(0x40000000)
+    liveins: $vgpr0, $sgpr4_sgpr5, $sgpr6
+
+    %0:sreg_32 = IMPLICIT_DEF
+    %1:sreg_64_xexec_xnull = IMPLICIT_DEF
+    %2:vgpr_32 = IMPLICIT_DEF
+    %3:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR killed %1, %2, 0, 0, implicit $exec
+    %4:sreg_32 = SI_IF undef %0, %bb.8, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.7
+
+  bb.1:
+    successors: %bb.2(0x80000000)
+
+    %5:vgpr_32 = V_AND_B32_e64 undef %6, 255, implicit $exec
+    %7:vgpr_32 = V_LSHLREV_B32_e64 2, killed undef %5, implicit $exec
+
+  bb.2:
+    successors: %bb.3(0x40000000), %bb.4(0x40000000)
+
+    %8:sreg_32 = SI_IF killed undef %9, %bb.4, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.3
+
+  bb.3:
+    successors: %bb.4(0x80000000)
+
+  bb.4:
+    successors: %bb.5(0x40000000), %bb.6(0x40000000)
+
+    %10:sreg_32 = SI_IF killed undef %8, %bb.6, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.5
+
+  bb.5:
+    successors: %bb.6(0x80000000)
+
+  bb.6:
+    successors: %bb.9(0x40000000), %bb.10(0x40000000)
+
+    %11:sreg_32 = SI_IF undef %0, %bb.10, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.9
+
+  bb.7:
+    successors: %bb.8(0x80000000)
+
+  bb.8:
+    successors: %bb.1(0x40000000), %bb.2(0x40000000)
+
+    %6:vgpr_32 = V_LSHRREV_B32_e64 16, undef %3, implicit $exec
+    %9:sreg_32 = SI_IF killed undef %4, %bb.2, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.1
+
+  bb.9:
+    successors: %bb.10(0x80000000)
+
+  bb.10:
+    S_ENDPGM 0
+
+...
+

frederik-h added a commit to frederik-h/llvm-project that referenced this pull request Jan 22, 2025
This is meant as a short-term workaround for an invalid conversion
in this pass that occurs because existing SDWA selections are not
correctly taken into account during the conversion.

See the draft PR llvm#123221 for an attempt to fix the actual issue.
frederik-h added a commit that referenced this pull request Jan 23, 2025
This is meant as a short-term workaround for an invalid conversion in
this pass that occurs because existing SDWA selections are not correctly
taken into account during the conversion.

See the draft PR #123221 for an attempt to fix the actual issue.

---------

Co-authored-by: Frederik Harwath <[email protected]>
frederik-h added a commit to frederik-h/llvm-project that referenced this pull request Jan 23, 2025
…123942)

This is meant as a short-term workaround for an invalid conversion in
this pass that occurs because existing SDWA selections are not correctly
taken into account during the conversion.

See the draft PR llvm#123221 for an attempt to fix the actual issue.
@frederik-h frederik-h force-pushed the SIPeepholeSDWA-CombineSelections branch from 2c85e88 to b5aa73d Compare January 29, 2025 13:18
frederik-h and others added 14 commits February 26, 2025 12:16
In contrast to SDWASrcOperand::canCombineSelections, in
SDWADstOperand::canCombineSelections we don't need to figure out
which operand we have. The same function could be used for both
but this would introduce a redundant check if the operand
is the vdst operand.
... and adjust test name of existing source selection test.
@frederik-h frederik-h requested a review from arsenm February 28, 2025 16:15
@frederik-h frederik-h merged commit ba9bd22 into llvm:main Mar 3, 2025
11 checks passed
@frederik-h frederik-h deleted the SIPeepholeSDWA-CombineSelections branch March 3, 2025 16:07
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
The si-peephole-sdwa pass adjusts the selections on sdwa instructions to
the selections on their operands during its conversions. For instance,
if an instruction selects `BYTE_0` and its operand selects `WORD_1`, the
combined selection should be `BYTE_2`, i.e. "`BYTE_0` of `WORD_1`". The
existing implementation does not always handle this correctly in some
complex situations with instructions across different basic blocks as
demonstrated by the test cases included in this PR.

This PR adds an additional selection combination step to the conversion
to fix this issue. It reverts the changes made by PR llvm#123942 which had
disabled the conversion of preexisting SDWA instructions completely as a
quick fix.

---------

Co-authored-by: Jeffrey Byrnes <[email protected]>
Co-authored-by: Matt Arsenault <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants