[AMDGPU] Account for existing SDWA selections #123221
Merged: frederik-h merged 48 commits into llvm:main from frederik-h:SIPeepholeSDWA-CombineSelections on Mar 3, 2025
Conversation
Change-Id: I3e1cf6042f069e8dffe9dd5b4654288111f7b1bf
- Remove redundant "if".
- Replace arithmetic on SdwaSel type.

The case distinction seems clearer and removes a mishandled case. Since (SdwaSel)((unsigned)WORD_0 + 2) == DWORD, the existing code led to the transformation:

  WORD_0 Sel (WORD_1 Sel (%X)) -> DWORD Sel (%X)

The correct transformation should be:

  WORD_0 Sel (WORD_1 Sel (%X)) -> WORD_1 Sel (%X)
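A minimal standalone sketch of the mishandled case, assuming a local copy of the selection enum numbered BYTE_0 = 0 through DWORD = 6 (consistent with the commit message's WORD_0 + 2 == DWORD, but treat the exact values as an assumption). `inHighWordNaive` is a hypothetical reconstruction of the enum arithmetic, not the removed code itself; `inHighWord` is the explicit case distinction.

```cpp
#include <optional>

// Local stand-in for AMDGPU::SDWA::SdwaSel (numbering assumed).
enum SdwaSel : unsigned {
  BYTE_0 = 0,
  BYTE_1 = 1,
  BYTE_2 = 2,
  BYTE_3 = 3,
  WORD_0 = 4,
  WORD_1 = 5,
  DWORD = 6
};

// Hypothetical arithmetic approach: "same bytes, but in the high word"
// computed by adding 2 to the enum value.
constexpr SdwaSel inHighWordNaive(SdwaSel Existing) {
  return static_cast<SdwaSel>(static_cast<unsigned>(Existing) + 2);
}

// Explicit case distinction: only low-word selections have a high-word
// counterpart; every other selection yields std::nullopt.
constexpr std::optional<SdwaSel> inHighWord(SdwaSel Existing) {
  switch (Existing) {
  case BYTE_0:
    return BYTE_2;
  case BYTE_1:
    return BYTE_3;
  case WORD_0:
    return WORD_1; // WORD_0 + 2 would be DWORD
  default:
    return std::nullopt;
  }
}

// Both agree on byte selections but diverge on WORD_0.
static_assert(inHighWordNaive(BYTE_0) == BYTE_2, "byte case agrees");
static_assert(inHighWordNaive(WORD_0) == DWORD, "arithmetic overshoots");
static_assert(inHighWord(WORD_0) == WORD_1, "case distinction is correct");
```

The arithmetic only works while byte and word selections sit at matching distances in the enum, which is not true for the word entries; the switch makes each mapping explicit and gives unsupported inputs an explicit "no combination" result.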
There are two loops that invoke the conversion on the operands of the input instruction, one for the case where the instruction is already an SDWA instruction and one for the case where it isn't. The loops are almost the same. Fuse those loops into a single loop.
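The fusion can be illustrated with a small toy model. `ToyInst`, `ToyOperand`, and `applyOperands` are hypothetical names that only mimic the control flow: the one difference between the two former loops (whether the instruction was already SDWA) becomes a flag that is computed once and passed to each operand conversion.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy types; they model only the loop's control flow, not
// the real MachineInstr / SDWAOperand classes.
struct ToyInst {
  bool IsSDWA = false;
  int Rewrites = 0;         // operands rewritten in this instruction
  int CombinedRewrites = 0; // rewrites performed in "combine" mode
};

struct ToyOperand {
  bool Applicable = true;

  // Same shape as convertToSDWA(MI, TII, CombineSelections): returns
  // whether the instruction was changed.
  bool convert(ToyInst &MI, bool CombineSelections) const {
    if (!Applicable)
      return false;
    ++MI.Rewrites;
    if (CombineSelections)
      ++MI.CombinedRewrites;
    return true;
  }
};

// Fused loop: decide the mode once, then apply every operand in one pass.
bool applyOperands(ToyInst &MI, const std::vector<ToyOperand> &Ops) {
  const bool CombineSelections = MI.IsSDWA;
  bool Converted = false;
  for (const ToyOperand &Op : Ops)
    Converted |= Op.convert(MI, CombineSelections);
  return Converted;
}
```

Hoisting the mode into a flag keeps a single copy of the loop body, so later changes to how operands are applied cannot drift between the SDWA and non-SDWA paths.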
✅ With the latest revision this PR passed the C/C++ code formatter.
@llvm/pr-subscribers-backend-amdgpu

Author: Frederik Harwath (frederik-h)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/123221.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp b/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
index 467f042892cebe..713ef162f8dee5 100644
--- a/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
@@ -62,6 +62,7 @@ class SIPeepholeSDWA {
std::unique_ptr<SDWAOperand> matchSDWAOperand(MachineInstr &MI);
void pseudoOpConvertToVOP2(MachineInstr &MI,
const GCNSubtarget &ST) const;
+ MachineInstr *createSDWAVersion(MachineInstr &MI);
bool convertToSDWA(MachineInstr &MI, const SDWAOperandsVector &SDWAOperands);
void legalizeScalarOperands(MachineInstr &MI, const GCNSubtarget &ST) const;
@@ -85,6 +86,8 @@ class SIPeepholeSDWALegacy : public MachineFunctionPass {
}
};
+using namespace AMDGPU::SDWA;
+
class SDWAOperand {
private:
MachineOperand *Target; // Operand that would be used in converted instruction
@@ -102,12 +105,47 @@ class SDWAOperand {
virtual MachineInstr *potentialToConvert(const SIInstrInfo *TII,
const GCNSubtarget &ST,
SDWAOperandsMap *PotentialMatches = nullptr) = 0;
- virtual bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) = 0;
+ virtual bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+ bool CombineSelections = false) = 0;
MachineOperand *getTargetOperand() const { return Target; }
MachineOperand *getReplacedOperand() const { return Replaced; }
MachineInstr *getParentInst() const { return Target->getParent(); }
+ /// Fold a \p FoldedOp SDWA selection into an \p ExistingOp existing SDWA
+ /// selection. If the selections are compatible, return the combined
+ /// selection, otherwise return a nullopt. For example, if we have existing
+ /// BYTE_0 Sel and are attempting to fold WORD_1 Sel:
+ /// BYTE_0 Sel (WORD_1 Sel (%X)) -> BYTE_2 Sel (%X)
+ std::optional<SdwaSel> combineSdwaSel(SdwaSel ExistingOp, SdwaSel FoldedOp) {
+ if (ExistingOp == SdwaSel::DWORD)
+ return FoldedOp;
+
+ if (FoldedOp == SdwaSel::DWORD)
+ return ExistingOp;
+
+ if (ExistingOp == SdwaSel::WORD_1 || ExistingOp == SdwaSel::BYTE_2 ||
+ ExistingOp == SdwaSel::BYTE_3)
+ return {};
+
+ if (ExistingOp == FoldedOp)
+ return ExistingOp;
+
+ if (FoldedOp == SdwaSel::WORD_0)
+ return ExistingOp;
+
+ if (FoldedOp == SdwaSel::WORD_1) {
+ if (ExistingOp == SdwaSel::BYTE_0)
+ return SdwaSel::BYTE_2;
+ if (ExistingOp == SdwaSel::BYTE_1)
+ return SdwaSel::BYTE_3;
+ if (ExistingOp == SdwaSel::WORD_0)
+ return SdwaSel::WORD_1;
+ }
+
+ return {};
+ }
+
MachineRegisterInfo *getMRI() const {
return &getParentInst()->getParent()->getParent()->getRegInfo();
}
@@ -118,8 +156,6 @@ class SDWAOperand {
#endif
};
-using namespace AMDGPU::SDWA;
-
class SDWASrcOperand : public SDWAOperand {
private:
SdwaSel SrcSel;
@@ -137,7 +173,8 @@ class SDWASrcOperand : public SDWAOperand {
MachineInstr *potentialToConvert(const SIInstrInfo *TII,
const GCNSubtarget &ST,
SDWAOperandsMap *PotentialMatches = nullptr) override;
- bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) override;
+ bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+ bool CombineSelections = false) override;
SdwaSel getSrcSel() const { return SrcSel; }
bool getAbs() const { return Abs; }
@@ -166,7 +203,8 @@ class SDWADstOperand : public SDWAOperand {
MachineInstr *potentialToConvert(const SIInstrInfo *TII,
const GCNSubtarget &ST,
SDWAOperandsMap *PotentialMatches = nullptr) override;
- bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) override;
+ bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+ bool CombineSelections = false) override;
SdwaSel getDstSel() const { return DstSel; }
DstUnused getDstUnused() const { return DstUn; }
@@ -186,7 +224,8 @@ class SDWADstPreserveOperand : public SDWADstOperand {
: SDWADstOperand(TargetOp, ReplacedOp, DstSel_, UNUSED_PRESERVE),
Preserve(PreserveOp) {}
- bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) override;
+ bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+ bool CombineSelections = false) override;
MachineOperand *getPreservedOperand() const { return Preserve; }
@@ -375,7 +414,8 @@ MachineInstr *SDWASrcOperand::potentialToConvert(const SIInstrInfo *TII,
return PotentialMO->getParent();
}
-bool SDWASrcOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) {
+bool SDWASrcOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+ bool CombineSelections) {
switch (MI.getOpcode()) {
case AMDGPU::V_CVT_F32_FP8_sdwa:
case AMDGPU::V_CVT_F32_BF8_sdwa:
@@ -451,7 +491,15 @@ bool SDWASrcOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) {
}
copyRegOperand(*Src, *getTargetOperand());
if (!IsPreserveSrc) {
- SrcSel->setImm(getSrcSel());
+ if (CombineSelections) {
+ std::optional<SdwaSel> NewOp =
+ combineSdwaSel((SdwaSel)SrcSel->getImm(), getSrcSel());
+ if (!NewOp.has_value())
+ return false;
+ SrcSel->setImm(NewOp.value());
+ } else {
+ SrcSel->setImm(getSrcSel());
+ }
SrcMods->setImm(getSrcMods(TII, Src));
}
getTargetOperand()->setIsKill(false);
@@ -479,7 +527,8 @@ MachineInstr *SDWADstOperand::potentialToConvert(const SIInstrInfo *TII,
return PotentialMO->getParent();
}
-bool SDWADstOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) {
+bool SDWADstOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+ bool CombineSelections) {
// Replace vdst operand in MI with target operand. Set dst_sel and dst_unused
if ((MI.getOpcode() == AMDGPU::V_FMAC_F16_sdwa ||
@@ -498,7 +547,15 @@ bool SDWADstOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) {
copyRegOperand(*Operand, *getTargetOperand());
MachineOperand *DstSel= TII->getNamedOperand(MI, AMDGPU::OpName::dst_sel);
assert(DstSel);
- DstSel->setImm(getDstSel());
+ if (CombineSelections) {
+ std::optional<SdwaSel> NewOp =
+ combineSdwaSel((SdwaSel)DstSel->getImm(), getDstSel());
+ if (!NewOp.has_value())
+ return false;
+ DstSel->setImm(NewOp.value());
+ } else {
+ DstSel->setImm(getDstSel());
+ }
MachineOperand *DstUnused= TII->getNamedOperand(MI, AMDGPU::OpName::dst_unused);
assert(DstUnused);
DstUnused->setImm(getDstUnused());
@@ -510,7 +567,8 @@ bool SDWADstOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) {
}
bool SDWADstPreserveOperand::convertToSDWA(MachineInstr &MI,
- const SIInstrInfo *TII) {
+ const SIInstrInfo *TII,
+ bool CombineSelections) {
// MI should be moved right before v_or_b32.
// For this we should clear all kill flags on uses of MI src-operands or else
// we can encounter problem with use of killed operand.
@@ -535,7 +593,7 @@ bool SDWADstPreserveOperand::convertToSDWA(MachineInstr &MI,
MI.getNumOperands() - 1);
// Convert MI as any other SDWADstOperand and remove v_or_b32
- return SDWADstOperand::convertToSDWA(MI, TII);
+ return SDWADstOperand::convertToSDWA(MI, TII, CombineSelections);
}
std::optional<int64_t>
@@ -1021,21 +1079,13 @@ bool isConvertibleToSDWA(MachineInstr &MI,
}
} // namespace
-bool SIPeepholeSDWA::convertToSDWA(MachineInstr &MI,
- const SDWAOperandsVector &SDWAOperands) {
-
- LLVM_DEBUG(dbgs() << "Convert instruction:" << MI);
-
- // Convert to sdwa
- int SDWAOpcode;
+MachineInstr *SIPeepholeSDWA::createSDWAVersion(MachineInstr &MI) {
unsigned Opcode = MI.getOpcode();
- if (TII->isSDWA(Opcode)) {
- SDWAOpcode = Opcode;
- } else {
- SDWAOpcode = AMDGPU::getSDWAOp(Opcode);
- if (SDWAOpcode == -1)
- SDWAOpcode = AMDGPU::getSDWAOp(AMDGPU::getVOPe32(Opcode));
- }
+ assert(!TII->isSDWA(Opcode));
+
+ int SDWAOpcode = AMDGPU::getSDWAOp(Opcode);
+ if (SDWAOpcode == -1)
+ SDWAOpcode = AMDGPU::getSDWAOp(AMDGPU::getVOPe32(Opcode));
assert(SDWAOpcode != -1);
const MCInstrDesc &SDWADesc = TII->get(SDWAOpcode);
@@ -1169,6 +1219,28 @@ bool SIPeepholeSDWA::convertToSDWA(MachineInstr &MI,
SDWAInst->tieOperands(PreserveDstIdx, SDWAInst->getNumOperands() - 1);
}
+ return SDWAInst.getInstr();
+}
+
+bool SIPeepholeSDWA::convertToSDWA(MachineInstr &MI,
+ const SDWAOperandsVector &SDWAOperands) {
+ LLVM_DEBUG(dbgs() << "Convert instruction:" << MI);
+
+ MachineInstr *SDWAInst;
+ bool CombineSelections;
+ if (TII->isSDWA(MI.getOpcode())) {
+ // No conversion necessary, since MI is an SDWA instruction. But
+ // tell convertToSDWA below to combine selections of this instruction
+ // and its SDWA operands.
+ SDWAInst = MI.getParent()->getParent()->CloneMachineInstr(&MI);
+ MI.getParent()->insert(MI.getIterator(), SDWAInst);
+ CombineSelections = true;
+ } else {
+ // Convert to sdwa
+ SDWAInst = createSDWAVersion(MI);
+ CombineSelections = false;
+ }
+
// Apply all sdwa operand patterns.
bool Converted = false;
for (auto &Operand : SDWAOperands) {
@@ -1184,22 +1256,21 @@ bool SIPeepholeSDWA::convertToSDWA(MachineInstr &MI,
// was already destroyed). So if SDWAOperand is also a potential MI then do
// not apply it.
if (PotentialMatches.count(Operand->getParentInst()) == 0)
- Converted |= Operand->convertToSDWA(*SDWAInst, TII);
+ Converted |= Operand->convertToSDWA(*SDWAInst, TII, CombineSelections);
}
- if (Converted) {
- ConvertedInstructions.push_back(SDWAInst);
- for (MachineOperand &MO : SDWAInst->uses()) {
- if (!MO.isReg())
- continue;
-
- MRI->clearKillFlags(MO.getReg());
- }
- } else {
+ if (!Converted) {
SDWAInst->eraseFromParent();
return false;
}
+ ConvertedInstructions.push_back(SDWAInst);
+ for (MachineOperand &MO : SDWAInst->uses()) {
+ if (!MO.isReg())
+ continue;
+
+ MRI->clearKillFlags(MO.getReg());
+ }
LLVM_DEBUG(dbgs() << "\nInto:" << *SDWAInst << '\n');
++NumSDWAInstructionsPeepholed;
diff --git a/llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr-combine-sel.mir b/llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr-combine-sel.mir
new file mode 100644
index 00000000000000..43708e9513c68b
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr-combine-sel.mir
@@ -0,0 +1,124 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1030 -run-pass=si-peephole-sdwa -o - %s | FileCheck -check-prefix=NOHAZARD %s
+
+---
+name: sdwa_opsel_hazard
+body: |
+ ; NOHAZARD-LABEL: name: sdwa_opsel_hazard
+ ; NOHAZARD: bb.0:
+ ; NOHAZARD-NEXT: successors: %bb.7(0x40000000), %bb.8(0x40000000)
+ ; NOHAZARD-NEXT: liveins: $vgpr0, $sgpr4_sgpr5, $sgpr6
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: [[DEF:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+ ; NOHAZARD-NEXT: [[DEF1:%[0-9]+]]:sreg_64_xexec_xnull = IMPLICIT_DEF
+ ; NOHAZARD-NEXT: [[DEF2:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; NOHAZARD-NEXT: [[GLOBAL_LOAD_DWORD_SADDR:%[0-9]+]]:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR killed [[DEF1]], [[DEF2]], 0, 0, implicit $exec
+ ; NOHAZARD-NEXT: [[SI_IF:%[0-9]+]]:sreg_32 = SI_IF undef [[DEF]], %bb.8, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ ; NOHAZARD-NEXT: S_BRANCH %bb.7
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: bb.1:
+ ; NOHAZARD-NEXT: successors: %bb.2(0x80000000)
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 255, implicit $exec
+ ; NOHAZARD-NEXT: [[V_AND_B32_sdwa:%[0-9]+]]:vgpr_32 = V_AND_B32_sdwa 0, undef [[GLOBAL_LOAD_DWORD_SADDR]], 0, [[V_MOV_B32_e32_]], 0, 6, 0, 5, 6, implicit $exec
+ ; NOHAZARD-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2, implicit $exec
+ ; NOHAZARD-NEXT: [[V_LSHLREV_B32_sdwa:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_sdwa 0, [[V_MOV_B32_e32_1]], 0, undef [[GLOBAL_LOAD_DWORD_SADDR]], 0, 6, 0, 6, 2, implicit $exec
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: bb.2:
+ ; NOHAZARD-NEXT: successors: %bb.3(0x40000000), %bb.4(0x40000000)
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: [[SI_IF1:%[0-9]+]]:sreg_32 = SI_IF killed undef %9, %bb.4, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ ; NOHAZARD-NEXT: S_BRANCH %bb.3
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: bb.3:
+ ; NOHAZARD-NEXT: successors: %bb.4(0x80000000)
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: bb.4:
+ ; NOHAZARD-NEXT: successors: %bb.5(0x40000000), %bb.6(0x40000000)
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: [[SI_IF2:%[0-9]+]]:sreg_32 = SI_IF killed undef [[SI_IF1]], %bb.6, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ ; NOHAZARD-NEXT: S_BRANCH %bb.5
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: bb.5:
+ ; NOHAZARD-NEXT: successors: %bb.6(0x80000000)
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: bb.6:
+ ; NOHAZARD-NEXT: successors: %bb.9(0x40000000), %bb.10(0x40000000)
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: [[SI_IF3:%[0-9]+]]:sreg_32 = SI_IF undef [[DEF]], %bb.10, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ ; NOHAZARD-NEXT: S_BRANCH %bb.9
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: bb.7:
+ ; NOHAZARD-NEXT: successors: %bb.8(0x80000000)
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: bb.8:
+ ; NOHAZARD-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: [[V_LSHRREV_B32_e64_:%[0-9]+]]:vgpr_32 = V_LSHRREV_B32_e64 16, undef [[GLOBAL_LOAD_DWORD_SADDR]], implicit $exec
+ ; NOHAZARD-NEXT: [[SI_IF4:%[0-9]+]]:sreg_32 = SI_IF killed undef [[SI_IF]], %bb.2, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ ; NOHAZARD-NEXT: S_BRANCH %bb.1
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: bb.9:
+ ; NOHAZARD-NEXT: successors: %bb.10(0x80000000)
+ ; NOHAZARD-NEXT: {{ $}}
+ ; NOHAZARD-NEXT: bb.10:
+ ; NOHAZARD-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.7(0x40000000), %bb.8(0x40000000)
+ liveins: $vgpr0, $sgpr4_sgpr5, $sgpr6
+
+ %0:sreg_32 = IMPLICIT_DEF
+ %1:sreg_64_xexec_xnull = IMPLICIT_DEF
+ %2:vgpr_32 = IMPLICIT_DEF
+ %3:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR killed %1, %2, 0, 0, implicit $exec
+ %4:sreg_32 = SI_IF undef %0, %bb.8, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ S_BRANCH %bb.7
+
+ bb.1:
+ successors: %bb.2(0x80000000)
+
+ %5:vgpr_32 = V_AND_B32_e64 undef %6, 255, implicit $exec
+ %7:vgpr_32 = V_LSHLREV_B32_e64 2, killed undef %5, implicit $exec
+
+ bb.2:
+ successors: %bb.3(0x40000000), %bb.4(0x40000000)
+
+ %8:sreg_32 = SI_IF killed undef %9, %bb.4, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ S_BRANCH %bb.3
+
+ bb.3:
+ successors: %bb.4(0x80000000)
+
+ bb.4:
+ successors: %bb.5(0x40000000), %bb.6(0x40000000)
+
+ %10:sreg_32 = SI_IF killed undef %8, %bb.6, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ S_BRANCH %bb.5
+
+ bb.5:
+ successors: %bb.6(0x80000000)
+
+ bb.6:
+ successors: %bb.9(0x40000000), %bb.10(0x40000000)
+
+ %11:sreg_32 = SI_IF undef %0, %bb.10, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ S_BRANCH %bb.9
+
+ bb.7:
+ successors: %bb.8(0x80000000)
+
+ bb.8:
+ successors: %bb.1(0x40000000), %bb.2(0x40000000)
+
+ %6:vgpr_32 = V_LSHRREV_B32_e64 16, undef %3, implicit $exec
+ %9:sreg_32 = SI_IF killed undef %4, %bb.2, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+ S_BRANCH %bb.1
+
+ bb.9:
+ successors: %bb.10(0x80000000)
+
+ bb.10:
+ S_ENDPGM 0
+
+...
+
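For experimenting outside LLVM, the `combineSdwaSel` logic from this revision of the diff can be transcribed into a standalone C++17 snippet. The enum is a local stand-in for AMDGPU::SDWA::SdwaSel (numbering assumed), and the body follows the diff shown above; the final merged code may differ after later review rounds.

```cpp
#include <optional>

// Local stand-in for AMDGPU::SDWA::SdwaSel (numbering assumed).
enum SdwaSel : unsigned {
  BYTE_0 = 0, BYTE_1 = 1, BYTE_2 = 2, BYTE_3 = 3,
  WORD_0 = 4, WORD_1 = 5, DWORD = 6
};

// Transcribed from the diff: returns a single selection equivalent to
// ExistingOp Sel (FoldedOp Sel (%X)), or std::nullopt if there is none.
constexpr std::optional<SdwaSel> combineSdwaSel(SdwaSel ExistingOp,
                                                SdwaSel FoldedOp) {
  if (ExistingOp == SdwaSel::DWORD)
    return FoldedOp;
  if (FoldedOp == SdwaSel::DWORD)
    return ExistingOp;
  // FoldedOp is at most 16 bits wide here, so ExistingOp would read the
  // high half of its result; reject.
  if (ExistingOp == SdwaSel::WORD_1 || ExistingOp == SdwaSel::BYTE_2 ||
      ExistingOp == SdwaSel::BYTE_3)
    return std::nullopt;
  // Identical selections are kept as they are (as in this revision).
  if (ExistingOp == FoldedOp)
    return ExistingOp;
  if (FoldedOp == SdwaSel::WORD_0)
    return ExistingOp;
  if (FoldedOp == SdwaSel::WORD_1) {
    if (ExistingOp == SdwaSel::BYTE_0)
      return SdwaSel::BYTE_2;
    if (ExistingOp == SdwaSel::BYTE_1)
      return SdwaSel::BYTE_3;
    if (ExistingOp == SdwaSel::WORD_0)
      return SdwaSel::WORD_1;
  }
  return std::nullopt;
}

// The example from the doc comment: BYTE_0 Sel (WORD_1 Sel (%X)).
static_assert(combineSdwaSel(SdwaSel::BYTE_0, SdwaSel::WORD_1) ==
                  SdwaSel::BYTE_2,
              "BYTE_0 of WORD_1 is BYTE_2");
```

Returning `std::optional` lets the caller (`SDWASrcOperand::convertToSDWA` / `SDWADstOperand::convertToSDWA` in the diff) abandon the conversion when no single selection is equivalent, instead of silently writing a wrong one.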
arsenm reviewed Jan 17, 2025
frederik-h added a commit to frederik-h/llvm-project that referenced this pull request Jan 22, 2025
This is meant as a short-term workaround for an invalid conversion in this pass that occurs because existing SDWA selections are not correctly taken into account during the conversion. See the draft PR llvm#123221 for an attempt to fix the actual issue.
frederik-h added a commit that referenced this pull request Jan 23, 2025
This is meant as a short-term workaround for an invalid conversion in this pass that occurs because existing SDWA selections are not correctly taken into account during the conversion. See the draft PR #123221 for an attempt to fix the actual issue.

---------

Co-authored-by: Frederik Harwath
frederik-h added a commit to frederik-h/llvm-project that referenced this pull request Jan 23, 2025
…123942) This is meant as a short-term workaround for an invalid conversion in this pass that occurs because existing SDWA selections are not correctly taken into account during the conversion. See the draft PR llvm#123221 for an attempt to fix the actual issue.
The flag is not necessary since the relevant instructions can be detected by looking at the SrcSel->getImm().
Updated from 2c85e88 to b5aa73d
As per LLVM coding standards.
jrbyrnes reviewed Feb 21, 2025
arsenm reviewed Feb 26, 2025
Co-authored-by: Matt Arsenault
In contrast to SDWASrcOperand::canCombineSelections, SDWADstOperand::canCombineSelections does not need to figure out which operand it has. The same function could be used for both, but this would introduce a redundant check of whether the operand is the vdst operand.
... and adjust test name of existing source selection test.
arsenm approved these changes Mar 3, 2025
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
The si-peephole-sdwa pass adjusts the selections on SDWA instructions to the selections on their operands during its conversions. For instance, if an instruction selects `BYTE_0` and its operand selects `WORD_1`, the combined selection should be `BYTE_2`, i.e. "`BYTE_0` of `WORD_1`". The existing implementation does not handle this correctly in some complex situations involving instructions in different basic blocks, as demonstrated by the test cases included in this PR.

This PR adds an additional selection-combination step to the conversion to fix this issue. It reverts the changes made by PR llvm#123942, which had completely disabled the conversion of preexisting SDWA instructions as a quick fix.

---------

Co-authored-by: Jeffrey Byrnes
Co-authored-by: Matt Arsenault
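To make "`BYTE_0` of `WORD_1` is `BYTE_2`" concrete, the following C++17 sketch models each selection as an unsigned bit-field extract from a 32-bit value. The helper `extractSel`, the local enum, and the sample constant are hypothetical; sign extension (the sext modifier) is deliberately not modelled.

```cpp
#include <cstdint>

// Local stand-in for AMDGPU::SDWA::SdwaSel (numbering assumed).
enum SdwaSel : unsigned {
  BYTE_0 = 0, BYTE_1 = 1, BYTE_2 = 2, BYTE_3 = 3,
  WORD_0 = 4, WORD_1 = 5, DWORD = 6
};

// Hypothetical model of a selection: extract the selected bits and
// zero-extend them to 32 bits.
constexpr std::uint32_t extractSel(std::uint32_t X, SdwaSel Sel) {
  switch (Sel) {
  case BYTE_0: return X & 0xFFu;
  case BYTE_1: return (X >> 8) & 0xFFu;
  case BYTE_2: return (X >> 16) & 0xFFu;
  case BYTE_3: return (X >> 24) & 0xFFu;
  case WORD_0: return X & 0xFFFFu;
  case WORD_1: return (X >> 16) & 0xFFFFu;
  case DWORD:  return X;
  }
  return X; // not reached for valid selections
}

constexpr std::uint32_t Sample = 0xAABBCCDDu;

// The operand selects WORD_1 (0xAABB); the instruction then selects BYTE_0
// of that (0xBB), which is exactly BYTE_2 of the original value.
static_assert(extractSel(extractSel(Sample, WORD_1), BYTE_0) ==
                  extractSel(Sample, BYTE_2),
              "BYTE_0 of WORD_1 is BYTE_2");
```

Under this model the selections compose like nested bit ranges, which is why a correct combination step has to add offsets and keep the inner width rather than simply overwrite the instruction's selection with the operand's.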
The si-peephole-sdwa pass adjusts the selections on SDWA instructions to the selections on their operands during its conversions. For instance, if an instruction selects `BYTE_0` and its operand selects `WORD_1`, the combined selection should be `BYTE_2`, i.e. "`BYTE_0` of `WORD_1`". The existing implementation does not handle this correctly in some complex situations involving instructions in different basic blocks, as demonstrated by the test cases included in this PR.

This PR adds an additional selection-combination step to the conversion to fix this issue. It reverts the changes made by PR #123942, which had completely disabled the conversion of preexisting SDWA instructions as a quick fix.