Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] SIPeepholeSDWA: Disable on existing SDWA instructions #123942

Merged

Conversation

frederik-h
Copy link
Contributor

This is meant as a short-term workaround for an invalid conversion in this pass that occurs because existing SDWA selections are not correctly taken into account during the conversion.

See the draft PR #123221 for an attempt to fix the actual issue.

This is meant as a short-term workaround for an invalid conversion
in this pass that occurs because existing SDWA selections are not
correctly taken into account during the conversion.

See the draft PR llvm#123221 for an attempt to fix the actual issue.
@frederik-h frederik-h force-pushed the SIPeepholeSDWA-disable-on-existing-sdwa branch from 0e62aa5 to 7839be8 Compare January 23, 2025 10:12
@frederik-h frederik-h marked this pull request as ready for review January 23, 2025 10:13
@frederik-h frederik-h requested a review from jrbyrnes January 23, 2025 10:13
@frederik-h frederik-h requested a review from arsenm January 23, 2025 10:13
@llvmbot
Copy link
Member

llvmbot commented Jan 23, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Frederik Harwath (frederik-h)

Changes

This is meant as a short-term workaround for an invalid conversion in this pass that occurs because existing SDWA selections are not correctly taken into account during the conversion.

See the draft PR #123221 for an attempt to fix the actual issue.


Patch is 76.01 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/123942.diff

19 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll (+10-5)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll (+10-5)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll (+16-10)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll (+16-10)
  • (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll (+21-7)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fadd.ll (+36-12)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fsub.ll (+24-8)
  • (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll (+42-14)
  • (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fsub.ll (+24-8)
  • (modified) llvm/test/CodeGen/AMDGPU/idot4u.ll (+12-10)
  • (modified) llvm/test/CodeGen/AMDGPU/local-atomicrmw-fadd.ll (+12-4)
  • (modified) llvm/test/CodeGen/AMDGPU/local-atomicrmw-fsub.ll (+12-4)
  • (modified) llvm/test/CodeGen/AMDGPU/permute_i8.ll (+2-1)
  • (added) llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr-combine-sel.ll (+85)
  • (added) llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr-combine-sel.mir (+56)
  • (modified) llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr-gfx10.mir (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr.mir (+4-3)
  • (modified) llvm/test/CodeGen/AMDGPU/sdwa-preserve.mir (+9-6)
diff --git a/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp b/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
index 467f042892cebe..99d37b3b9f6036 100644
--- a/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
@@ -963,7 +963,7 @@ bool isConvertibleToSDWA(MachineInstr &MI,
   // Check if this is already an SDWA instruction
   unsigned Opc = MI.getOpcode();
   if (TII->isSDWA(Opc))
-    return true;
+    return false;
 
   // Check if this instruction has opcode that supports SDWA
   if (AMDGPU::getSDWAOp(Opc) == -1)
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
index e289ee759da158..2d9e8969fdbb52 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
@@ -280,8 +280,9 @@ define i16 @v_saddsat_v2i8(i16 %lhs.arg, i16 %rhs.arg) {
 ; GFX8-NEXT:    v_min_i16_e32 v1, v2, v1
 ; GFX8-NEXT:    v_add_u16_e32 v1, v3, v1
 ; GFX8-NEXT:    v_mov_b32_e32 v2, 0xff
+; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v1), v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_and_b32_sdwa v0, sext(v0), v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
-; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v1), v2 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -299,7 +300,8 @@ define i16 @v_saddsat_v2i8(i16 %lhs.arg, i16 %rhs.arg) {
 ; GFX9-NEXT:    v_pk_add_i16 v0, v0, v1 clamp
 ; GFX9-NEXT:    v_pk_ashrrev_i16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -439,7 +441,8 @@ define amdgpu_ps i16 @s_saddsat_v2i8(i16 inreg %lhs.arg, i16 inreg %rhs.arg) {
 ; GFX9-NEXT:    v_pk_add_i16 v0, s0, v0 clamp
 ; GFX9-NEXT:    v_pk_ashrrev_i16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    v_readfirstlane_b32 s0, v0
 ; GFX9-NEXT:    ; return to shader part epilog
@@ -609,9 +612,11 @@ define i32 @v_saddsat_v4i8(i32 %lhs.arg, i32 %rhs.arg) {
 ; GFX8-NEXT:    v_and_b32_sdwa v0, sext(v0), v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 8, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v2), v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v2), v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v3), v4 dst_sel:BYTE_3 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v3), v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
index 43ebe156eb2a28..a98b305c15f75c 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
@@ -281,8 +281,9 @@ define i16 @v_ssubsat_v2i8(i16 %lhs.arg, i16 %rhs.arg) {
 ; GFX8-NEXT:    v_min_i16_e32 v1, v1, v4
 ; GFX8-NEXT:    v_sub_u16_e32 v1, v3, v1
 ; GFX8-NEXT:    v_mov_b32_e32 v2, 0xff
+; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v1), v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_and_b32_sdwa v0, sext(v0), v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
-; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v1), v2 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -300,7 +301,8 @@ define i16 @v_ssubsat_v2i8(i16 %lhs.arg, i16 %rhs.arg) {
 ; GFX9-NEXT:    v_pk_sub_i16 v0, v0, v1 clamp
 ; GFX9-NEXT:    v_pk_ashrrev_i16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -440,7 +442,8 @@ define amdgpu_ps i16 @s_ssubsat_v2i8(i16 inreg %lhs.arg, i16 inreg %rhs.arg) {
 ; GFX9-NEXT:    v_pk_sub_i16 v0, s0, v0 clamp
 ; GFX9-NEXT:    v_pk_ashrrev_i16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    v_readfirstlane_b32 s0, v0
 ; GFX9-NEXT:    ; return to shader part epilog
@@ -610,9 +613,11 @@ define i32 @v_ssubsat_v4i8(i32 %lhs.arg, i32 %rhs.arg) {
 ; GFX8-NEXT:    v_and_b32_sdwa v0, sext(v0), v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 8, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v2), v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v2), v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v3), v4 dst_sel:BYTE_3 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v3), v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
index 788692c94b0cfa..3d7fec9a5986cd 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
@@ -224,7 +224,8 @@ define i16 @v_uaddsat_v2i8(i16 %lhs.arg, i16 %rhs.arg) {
 ; GFX9-NEXT:    v_pk_add_u16 v0, v0, v1 clamp
 ; GFX9-NEXT:    v_pk_lshrrev_b16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -329,7 +330,8 @@ define amdgpu_ps i16 @s_uaddsat_v2i8(i16 inreg %lhs.arg, i16 inreg %rhs.arg) {
 ; GFX9-NEXT:    v_pk_add_u16 v0, s0, v0 clamp
 ; GFX9-NEXT:    v_pk_lshrrev_b16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    v_readfirstlane_b32 s0, v0
 ; GFX9-NEXT:    ; return to shader part epilog
@@ -451,9 +453,11 @@ define i32 @v_uaddsat_v4i8(i32 %lhs.arg, i32 %rhs.arg) {
 ; GFX8-NEXT:    v_and_b32_sdwa v0, v0, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 8, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:BYTE_3 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -618,18 +622,20 @@ define amdgpu_ps i32 @s_uaddsat_v4i8(i32 inreg %lhs.arg, i32 inreg %rhs.arg) {
 ; GFX8-NEXT:    v_mov_b32_e32 v4, 0xff
 ; GFX8-NEXT:    s_lshl_b32 s0, s3, 8
 ; GFX8-NEXT:    v_mov_b32_e32 v2, s1
-; GFX8-NEXT:    s_lshl_b32 s1, s7, 8
 ; GFX8-NEXT:    v_and_b32_sdwa v1, v1, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_add_u16_e64 v2, s0, v2 clamp
-; GFX8-NEXT:    s_lshl_b32 s0, s4, 8
-; GFX8-NEXT:    v_mov_b32_e32 v3, s1
+; GFX8-NEXT:    s_lshl_b32 s1, s7, 8
 ; GFX8-NEXT:    v_and_b32_sdwa v0, v0, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 8, v1
-; GFX8-NEXT:    v_add_u16_e64 v3, s0, v3 clamp
+; GFX8-NEXT:    s_lshl_b32 s0, s4, 8
+; GFX8-NEXT:    v_mov_b32_e32 v3, s1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_add_u16_e64 v3, s0, v3 clamp
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:BYTE_3 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    v_readfirstlane_b32 s0, v0
 ; GFX8-NEXT:    ; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll
index 0042d34e235d17..0ab16d95b191d9 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll
@@ -218,7 +218,8 @@ define i16 @v_usubsat_v2i8(i16 %lhs.arg, i16 %rhs.arg) {
 ; GFX9-NEXT:    v_pk_sub_u16 v0, v0, v1 clamp
 ; GFX9-NEXT:    v_pk_lshrrev_b16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -321,7 +322,8 @@ define amdgpu_ps i16 @s_usubsat_v2i8(i16 inreg %lhs.arg, i16 inreg %rhs.arg) {
 ; GFX9-NEXT:    v_pk_sub_u16 v0, s0, v0 clamp
 ; GFX9-NEXT:    v_pk_lshrrev_b16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    v_readfirstlane_b32 s0, v0
 ; GFX9-NEXT:    ; return to shader part epilog
@@ -439,9 +441,11 @@ define i32 @v_usubsat_v4i8(i32 %lhs.arg, i32 %rhs.arg) {
 ; GFX8-NEXT:    v_and_b32_sdwa v0, v0, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 8, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:BYTE_3 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -602,18 +606,20 @@ define amdgpu_ps i32 @s_usubsat_v4i8(i32 inreg %lhs.arg, i32 inreg %rhs.arg) {
 ; GFX8-NEXT:    v_mov_b32_e32 v4, 0xff
 ; GFX8-NEXT:    s_lshl_b32 s0, s3, 8
 ; GFX8-NEXT:    v_mov_b32_e32 v2, s1
-; GFX8-NEXT:    s_lshl_b32 s1, s7, 8
 ; GFX8-NEXT:    v_and_b32_sdwa v1, v1, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_sub_u16_e64 v2, s0, v2 clamp
-; GFX8-NEXT:    s_lshl_b32 s0, s4, 8
-; GFX8-NEXT:    v_mov_b32_e32 v3, s1
+; GFX8-NEXT:    s_lshl_b32 s1, s7, 8
 ; GFX8-NEXT:    v_and_b32_sdwa v0, v0, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 8, v1
-; GFX8-NEXT:    v_sub_u16_e64 v3, s0, v3 clamp
+; GFX8-NEXT:    s_lshl_b32 s0, s4, 8
+; GFX8-NEXT:    v_mov_b32_e32 v3, s1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_sub_u16_e64 v3, s0, v3 clamp
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:BYTE_3 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    v_readfirstlane_b32 s0, v0
 ; GFX8-NEXT:    ; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll b/llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll
index e8f1619c5d418c..a969e3d4f4f79b 100644
--- a/llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll
+++ b/llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll
@@ -6398,8 +6398,10 @@ define <2 x half> @buffer_fat_ptr_agent_atomic_fadd_ret_v2f16__offset__amdgpu_no
 ; GFX8-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    v_mov_b32_e32 v5, v0
-; GFX8-NEXT:    v_add_f16_sdwa v0, v5, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v0, 16, v5
+; GFX8-NEXT:    v_add_f16_sdwa v0, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
 ; GFX8-NEXT:    v_add_f16_e32 v1, v5, v2
+; GFX8-NEXT:    v_lshlrev_b32_e32 v0, 16, v0
 ; GFX8-NEXT:    v_or_b32_e32 v4, v1, v0
 ; GFX8-NEXT:    v_mov_b32_e32 v0, v4
 ; GFX8-NEXT:    v_mov_b32_e32 v1, v5
@@ -6625,8 +6627,10 @@ define void @buffer_fat_ptr_agent_atomic_fadd_noret_v2f16__offset__amdgpu_no_fin
 ; GFX8-NEXT:  .LBB20_1: ; %atomicrmw.start
 ; GFX8-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
-; GFX8-NEXT:    v_add_f16_sdwa v1, v2, v0 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v1, 16, v2
+; GFX8-NEXT:    v_add_f16_sdwa v1, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
 ; GFX8-NEXT:    v_add_f16_e32 v4, v2, v0
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v1, v4, v1
 ; GFX8-NEXT:    v_mov_b32_e32 v5, v2
 ; GFX8-NEXT:    v_mov_b32_e32 v4, v1
@@ -7044,7 +7048,9 @@ define <2 x half> @buffer_fat_ptr_agent_atomic_fadd_ret_v2f16__offset__waterfall
 ; GFX8-NEXT:    ; =>This Loop Header: Depth=1
 ; GFX8-NEXT:    ; Child Loop BB21_4 Depth 2
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
-; GFX8-NEXT:    v_add_f16_sdwa v4, v8, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v4, 16, v8
+; GFX8-NEXT:    v_add_f16_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
+; GFX8-NEXT:    v_lshlrev_b32_e32 v4, 16, v4
 ; GFX8-NEXT:    v_add_f16_e32 v6, v8, v5
 ; GFX8-NEXT:    v_or_b32_e32 v7, v6, v4
 ; GFX8-NEXT:    v_mov_b32_e32 v6, v7
@@ -7390,8 +7396,10 @@ define <2 x half> @buffer_fat_ptr_agent_atomic_fadd_ret_v2f16__offset(ptr addrsp
 ; GFX8-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    v_mov_b32_e32 v5, v0
-; GFX8-NEXT:    v_add_f16_sdwa v0, v5, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v0, 16, v5
+; GFX8-NEXT:    v_add_f16_sdwa v0, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
 ; GFX8-NEXT:    v_add_f16_e32 v1, v5, v2
+; GFX8-NEXT:    v_lshlrev_b32_e32 v0, 16, v0
 ; GFX8-NEXT:    v_or_b32_e32 v4, v1, v0
 ; GFX8-NEXT:    v_mov_b32_e32 v0, v4
 ; GFX8-NEXT:    v_mov_b32_e32 v1, v5
@@ -7650,8 +7658,10 @@ define void @buffer_fat_ptr_agent_atomic_fadd_noret_v2f16__offset(ptr addrspace(
 ; GFX8-NEXT:  .LBB23_1: ; %atomicrmw.start
 ; GFX8-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
-; GFX8-NEXT:    v_add_f16_sdwa v1, v2, v0 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v1, 16, v2
+; GFX8-NEXT:    v_add_f16_sdwa v1, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
 ; GFX8-NEXT:    v_add_f16_e32 v4, v2, v0
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v1, v4, v1
 ; GFX8-NEXT:    v_mov_b32_e32 v5, v2
 ; GFX8-NEXT:    v_mov_b32_e32 v4, v1
@@ -7915,8 +7925,10 @@ define <2 x half> @buffer_fat_ptr_agent_atomic_fadd_ret_v2f16__offset__amdgpu_no
 ; GFX8-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    v_mov_b32_e32 v5, v0
-; GFX8-NEXT:    v_add_f16_sdwa v0, v5, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v0, 16, v5
+; GFX8-NEXT:    v_add_f16_sdwa v0, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
 ; GFX8-NEXT:    v_add_f16_e32 v1, v5, v2
+; GFX8-NEXT:    v_lshlrev_b32_e32 v0, 16, v0
 ; GFX8-NEXT:    v_or_b32_e32 v4, v1, v0
 ; GFX8-NEXT:    v_mov_b32_e32 v0, v4
 ; GFX8-NEXT:    v_mov_b32_e32 v1, v5
@@ -8175,8 +8187,10 @@ define void @buffer_fat_ptr_agent_atomic_fadd_noret_v2f16__offset__amdgpu_no_rem
 ; GFX8-NEXT:  .LBB25_1: ; %atomicrmw.start
 ; GFX8-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
-; GFX8-NEXT:    v_add_f16_sdwa v1, v2, v0 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v1, 16, v2
+; GFX8-NEXT:    v_add_f16_sdwa v1, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
 ; GFX8-NEXT:    v_add_f16_e32 v4, v2, v0
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e3...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jan 23, 2025

@llvm/pr-subscribers-llvm-globalisel

Author: Frederik Harwath (frederik-h)

Changes

This is meant as a short-term workaround for an invalid conversion in this pass that occurs because existing SDWA selections are not correctly taken into account during the conversion.

See the draft PR #123221 for an attempt to fix the actual issue.


Patch is 76.01 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/123942.diff

19 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll (+10-5)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll (+10-5)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll (+16-10)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll (+16-10)
  • (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll (+21-7)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fadd.ll (+36-12)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fsub.ll (+24-8)
  • (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll (+42-14)
  • (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fsub.ll (+24-8)
  • (modified) llvm/test/CodeGen/AMDGPU/idot4u.ll (+12-10)
  • (modified) llvm/test/CodeGen/AMDGPU/local-atomicrmw-fadd.ll (+12-4)
  • (modified) llvm/test/CodeGen/AMDGPU/local-atomicrmw-fsub.ll (+12-4)
  • (modified) llvm/test/CodeGen/AMDGPU/permute_i8.ll (+2-1)
  • (added) llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr-combine-sel.ll (+85)
  • (added) llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr-combine-sel.mir (+56)
  • (modified) llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr-gfx10.mir (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/sdwa-peephole-instr.mir (+4-3)
  • (modified) llvm/test/CodeGen/AMDGPU/sdwa-preserve.mir (+9-6)
diff --git a/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp b/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
index 467f042892cebe..99d37b3b9f6036 100644
--- a/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
@@ -963,7 +963,7 @@ bool isConvertibleToSDWA(MachineInstr &MI,
   // Check if this is already an SDWA instruction
   unsigned Opc = MI.getOpcode();
   if (TII->isSDWA(Opc))
-    return true;
+    return false;
 
   // Check if this instruction has opcode that supports SDWA
   if (AMDGPU::getSDWAOp(Opc) == -1)
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
index e289ee759da158..2d9e8969fdbb52 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
@@ -280,8 +280,9 @@ define i16 @v_saddsat_v2i8(i16 %lhs.arg, i16 %rhs.arg) {
 ; GFX8-NEXT:    v_min_i16_e32 v1, v2, v1
 ; GFX8-NEXT:    v_add_u16_e32 v1, v3, v1
 ; GFX8-NEXT:    v_mov_b32_e32 v2, 0xff
+; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v1), v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_and_b32_sdwa v0, sext(v0), v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
-; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v1), v2 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -299,7 +300,8 @@ define i16 @v_saddsat_v2i8(i16 %lhs.arg, i16 %rhs.arg) {
 ; GFX9-NEXT:    v_pk_add_i16 v0, v0, v1 clamp
 ; GFX9-NEXT:    v_pk_ashrrev_i16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -439,7 +441,8 @@ define amdgpu_ps i16 @s_saddsat_v2i8(i16 inreg %lhs.arg, i16 inreg %rhs.arg) {
 ; GFX9-NEXT:    v_pk_add_i16 v0, s0, v0 clamp
 ; GFX9-NEXT:    v_pk_ashrrev_i16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    v_readfirstlane_b32 s0, v0
 ; GFX9-NEXT:    ; return to shader part epilog
@@ -609,9 +612,11 @@ define i32 @v_saddsat_v4i8(i32 %lhs.arg, i32 %rhs.arg) {
 ; GFX8-NEXT:    v_and_b32_sdwa v0, sext(v0), v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 8, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v2), v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v2), v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v3), v4 dst_sel:BYTE_3 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v3), v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
index 43ebe156eb2a28..a98b305c15f75c 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
@@ -281,8 +281,9 @@ define i16 @v_ssubsat_v2i8(i16 %lhs.arg, i16 %rhs.arg) {
 ; GFX8-NEXT:    v_min_i16_e32 v1, v1, v4
 ; GFX8-NEXT:    v_sub_u16_e32 v1, v3, v1
 ; GFX8-NEXT:    v_mov_b32_e32 v2, 0xff
+; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v1), v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_and_b32_sdwa v0, sext(v0), v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
-; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v1), v2 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -300,7 +301,8 @@ define i16 @v_ssubsat_v2i8(i16 %lhs.arg, i16 %rhs.arg) {
 ; GFX9-NEXT:    v_pk_sub_i16 v0, v0, v1 clamp
 ; GFX9-NEXT:    v_pk_ashrrev_i16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -440,7 +442,8 @@ define amdgpu_ps i16 @s_ssubsat_v2i8(i16 inreg %lhs.arg, i16 inreg %rhs.arg) {
 ; GFX9-NEXT:    v_pk_sub_i16 v0, s0, v0 clamp
 ; GFX9-NEXT:    v_pk_ashrrev_i16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    v_readfirstlane_b32 s0, v0
 ; GFX9-NEXT:    ; return to shader part epilog
@@ -610,9 +613,11 @@ define i32 @v_ssubsat_v4i8(i32 %lhs.arg, i32 %rhs.arg) {
 ; GFX8-NEXT:    v_and_b32_sdwa v0, sext(v0), v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 8, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v2), v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v2), v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v3), v4 dst_sel:BYTE_3 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, sext(v3), v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
index 788692c94b0cfa..3d7fec9a5986cd 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
@@ -224,7 +224,8 @@ define i16 @v_uaddsat_v2i8(i16 %lhs.arg, i16 %rhs.arg) {
 ; GFX9-NEXT:    v_pk_add_u16 v0, v0, v1 clamp
 ; GFX9-NEXT:    v_pk_lshrrev_b16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -329,7 +330,8 @@ define amdgpu_ps i16 @s_uaddsat_v2i8(i16 inreg %lhs.arg, i16 inreg %rhs.arg) {
 ; GFX9-NEXT:    v_pk_add_u16 v0, s0, v0 clamp
 ; GFX9-NEXT:    v_pk_lshrrev_b16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    v_readfirstlane_b32 s0, v0
 ; GFX9-NEXT:    ; return to shader part epilog
@@ -451,9 +453,11 @@ define i32 @v_uaddsat_v4i8(i32 %lhs.arg, i32 %rhs.arg) {
 ; GFX8-NEXT:    v_and_b32_sdwa v0, v0, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 8, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:BYTE_3 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -618,18 +622,20 @@ define amdgpu_ps i32 @s_uaddsat_v4i8(i32 inreg %lhs.arg, i32 inreg %rhs.arg) {
 ; GFX8-NEXT:    v_mov_b32_e32 v4, 0xff
 ; GFX8-NEXT:    s_lshl_b32 s0, s3, 8
 ; GFX8-NEXT:    v_mov_b32_e32 v2, s1
-; GFX8-NEXT:    s_lshl_b32 s1, s7, 8
 ; GFX8-NEXT:    v_and_b32_sdwa v1, v1, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_add_u16_e64 v2, s0, v2 clamp
-; GFX8-NEXT:    s_lshl_b32 s0, s4, 8
-; GFX8-NEXT:    v_mov_b32_e32 v3, s1
+; GFX8-NEXT:    s_lshl_b32 s1, s7, 8
 ; GFX8-NEXT:    v_and_b32_sdwa v0, v0, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 8, v1
-; GFX8-NEXT:    v_add_u16_e64 v3, s0, v3 clamp
+; GFX8-NEXT:    s_lshl_b32 s0, s4, 8
+; GFX8-NEXT:    v_mov_b32_e32 v3, s1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_add_u16_e64 v3, s0, v3 clamp
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:BYTE_3 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    v_readfirstlane_b32 s0, v0
 ; GFX8-NEXT:    ; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll
index 0042d34e235d17..0ab16d95b191d9 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll
@@ -218,7 +218,8 @@ define i16 @v_usubsat_v2i8(i16 %lhs.arg, i16 %rhs.arg) {
 ; GFX9-NEXT:    v_pk_sub_u16 v0, v0, v1 clamp
 ; GFX9-NEXT:    v_pk_lshrrev_b16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -321,7 +322,8 @@ define amdgpu_ps i16 @s_usubsat_v2i8(i16 inreg %lhs.arg, i16 inreg %rhs.arg) {
 ; GFX9-NEXT:    v_pk_sub_u16 v0, s0, v0 clamp
 ; GFX9-NEXT:    v_pk_lshrrev_b16 v0, 8, v0 op_sel_hi:[0,1]
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0xff
-; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
 ; GFX9-NEXT:    v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX9-NEXT:    v_readfirstlane_b32 s0, v0
 ; GFX9-NEXT:    ; return to shader part epilog
@@ -439,9 +441,11 @@ define i32 @v_usubsat_v4i8(i32 %lhs.arg, i32 %rhs.arg) {
 ; GFX8-NEXT:    v_and_b32_sdwa v0, v0, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 8, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:BYTE_3 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -602,18 +606,20 @@ define amdgpu_ps i32 @s_usubsat_v4i8(i32 inreg %lhs.arg, i32 inreg %rhs.arg) {
 ; GFX8-NEXT:    v_mov_b32_e32 v4, 0xff
 ; GFX8-NEXT:    s_lshl_b32 s0, s3, 8
 ; GFX8-NEXT:    v_mov_b32_e32 v2, s1
-; GFX8-NEXT:    s_lshl_b32 s1, s7, 8
 ; GFX8-NEXT:    v_and_b32_sdwa v1, v1, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_sub_u16_e64 v2, s0, v2 clamp
-; GFX8-NEXT:    s_lshl_b32 s0, s4, 8
-; GFX8-NEXT:    v_mov_b32_e32 v3, s1
+; GFX8-NEXT:    s_lshl_b32 s1, s7, 8
 ; GFX8-NEXT:    v_and_b32_sdwa v0, v0, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
 ; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 8, v1
-; GFX8-NEXT:    v_sub_u16_e64 v3, s0, v3 clamp
+; GFX8-NEXT:    s_lshl_b32 s0, s4, 8
+; GFX8-NEXT:    v_mov_b32_e32 v3, s1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_sub_u16_e64 v3, s0, v3 clamp
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:BYTE_3 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_and_b32_sdwa v1, v3, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
 ; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-NEXT:    v_readfirstlane_b32 s0, v0
 ; GFX8-NEXT:    ; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll b/llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll
index e8f1619c5d418c..a969e3d4f4f79b 100644
--- a/llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll
+++ b/llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll
@@ -6398,8 +6398,10 @@ define <2 x half> @buffer_fat_ptr_agent_atomic_fadd_ret_v2f16__offset__amdgpu_no
 ; GFX8-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    v_mov_b32_e32 v5, v0
-; GFX8-NEXT:    v_add_f16_sdwa v0, v5, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v0, 16, v5
+; GFX8-NEXT:    v_add_f16_sdwa v0, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
 ; GFX8-NEXT:    v_add_f16_e32 v1, v5, v2
+; GFX8-NEXT:    v_lshlrev_b32_e32 v0, 16, v0
 ; GFX8-NEXT:    v_or_b32_e32 v4, v1, v0
 ; GFX8-NEXT:    v_mov_b32_e32 v0, v4
 ; GFX8-NEXT:    v_mov_b32_e32 v1, v5
@@ -6625,8 +6627,10 @@ define void @buffer_fat_ptr_agent_atomic_fadd_noret_v2f16__offset__amdgpu_no_fin
 ; GFX8-NEXT:  .LBB20_1: ; %atomicrmw.start
 ; GFX8-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
-; GFX8-NEXT:    v_add_f16_sdwa v1, v2, v0 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v1, 16, v2
+; GFX8-NEXT:    v_add_f16_sdwa v1, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
 ; GFX8-NEXT:    v_add_f16_e32 v4, v2, v0
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v1, v4, v1
 ; GFX8-NEXT:    v_mov_b32_e32 v5, v2
 ; GFX8-NEXT:    v_mov_b32_e32 v4, v1
@@ -7044,7 +7048,9 @@ define <2 x half> @buffer_fat_ptr_agent_atomic_fadd_ret_v2f16__offset__waterfall
 ; GFX8-NEXT:    ; =>This Loop Header: Depth=1
 ; GFX8-NEXT:    ; Child Loop BB21_4 Depth 2
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
-; GFX8-NEXT:    v_add_f16_sdwa v4, v8, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v4, 16, v8
+; GFX8-NEXT:    v_add_f16_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
+; GFX8-NEXT:    v_lshlrev_b32_e32 v4, 16, v4
 ; GFX8-NEXT:    v_add_f16_e32 v6, v8, v5
 ; GFX8-NEXT:    v_or_b32_e32 v7, v6, v4
 ; GFX8-NEXT:    v_mov_b32_e32 v6, v7
@@ -7390,8 +7396,10 @@ define <2 x half> @buffer_fat_ptr_agent_atomic_fadd_ret_v2f16__offset(ptr addrsp
 ; GFX8-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    v_mov_b32_e32 v5, v0
-; GFX8-NEXT:    v_add_f16_sdwa v0, v5, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v0, 16, v5
+; GFX8-NEXT:    v_add_f16_sdwa v0, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
 ; GFX8-NEXT:    v_add_f16_e32 v1, v5, v2
+; GFX8-NEXT:    v_lshlrev_b32_e32 v0, 16, v0
 ; GFX8-NEXT:    v_or_b32_e32 v4, v1, v0
 ; GFX8-NEXT:    v_mov_b32_e32 v0, v4
 ; GFX8-NEXT:    v_mov_b32_e32 v1, v5
@@ -7650,8 +7658,10 @@ define void @buffer_fat_ptr_agent_atomic_fadd_noret_v2f16__offset(ptr addrspace(
 ; GFX8-NEXT:  .LBB23_1: ; %atomicrmw.start
 ; GFX8-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
-; GFX8-NEXT:    v_add_f16_sdwa v1, v2, v0 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v1, 16, v2
+; GFX8-NEXT:    v_add_f16_sdwa v1, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
 ; GFX8-NEXT:    v_add_f16_e32 v4, v2, v0
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e32 v1, v4, v1
 ; GFX8-NEXT:    v_mov_b32_e32 v5, v2
 ; GFX8-NEXT:    v_mov_b32_e32 v4, v1
@@ -7915,8 +7925,10 @@ define <2 x half> @buffer_fat_ptr_agent_atomic_fadd_ret_v2f16__offset__amdgpu_no
 ; GFX8-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    v_mov_b32_e32 v5, v0
-; GFX8-NEXT:    v_add_f16_sdwa v0, v5, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v0, 16, v5
+; GFX8-NEXT:    v_add_f16_sdwa v0, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
 ; GFX8-NEXT:    v_add_f16_e32 v1, v5, v2
+; GFX8-NEXT:    v_lshlrev_b32_e32 v0, 16, v0
 ; GFX8-NEXT:    v_or_b32_e32 v4, v1, v0
 ; GFX8-NEXT:    v_mov_b32_e32 v0, v4
 ; GFX8-NEXT:    v_mov_b32_e32 v1, v5
@@ -8175,8 +8187,10 @@ define void @buffer_fat_ptr_agent_atomic_fadd_noret_v2f16__offset__amdgpu_no_rem
 ; GFX8-NEXT:  .LBB25_1: ; %atomicrmw.start
 ; GFX8-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
-; GFX8-NEXT:    v_add_f16_sdwa v1, v2, v0 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_lshrrev_b32_e32 v1, 16, v2
+; GFX8-NEXT:    v_add_f16_sdwa v1, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
 ; GFX8-NEXT:    v_add_f16_e32 v4, v2, v0
+; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_e3...
[truncated]

@frederik-h frederik-h merged commit 6fdaaaf into llvm:main Jan 23, 2025
8 checks passed
@frederik-h frederik-h deleted the SIPeepholeSDWA-disable-on-existing-sdwa branch January 23, 2025 13:32
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 23, 2025

LLVM Buildbot has detected a new failure on builder ml-opt-dev-x86-64 running on ml-opt-dev-x86-64-b2 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/12126

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/v_sat_pk_u8_i16.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-VI /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-VI /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 3: /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs
RUN: at line 4: /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs
RUN: at line 5: /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 7: /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GISEL-VI /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GISEL-VI /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel
RUN: at line 8: /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel
/b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1233:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:544:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:545:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^
/b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1348:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:589:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:590:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^

Input file: <stdin>
Check file: /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 23, 2025

LLVM Buildbot has detected a new failure on builder ml-opt-devrel-x86-64 running on ml-opt-devrel-x86-64-b1 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/11982

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/v_sat_pk_u8_i16.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-VI /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-VI /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 3: /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 4: /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs
RUN: at line 5: /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 7: /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GISEL-VI /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GISEL-VI /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel
RUN: at line 8: /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel
/b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1233:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:544:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:545:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^
/b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1348:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:589:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:590:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^

Input file: <stdin>
Check file: /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 23, 2025

LLVM Buildbot has detected a new failure on builder ml-opt-rel-x86-64 running on ml-opt-rel-x86-64-b2 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/11978

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/v_sat_pk_u8_i16.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-VI /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-VI /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 3: /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 4: /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs
RUN: at line 5: /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs
RUN: at line 7: /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GISEL-VI /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GISEL-VI /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel
RUN: at line 8: /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1233:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:544:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:545:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1348:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:589:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:590:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^

Input file: <stdin>
Check file: /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
...

nico added a commit that referenced this pull request Jan 23, 2025
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 23, 2025
frederik-h added a commit to frederik-h/llvm-project that referenced this pull request Jan 23, 2025
…123942)

This is meant as a short-term workaround for an invalid conversion in
this pass that occurs because existing SDWA selections are not correctly
taken into account during the conversion.

See the draft PR llvm#123221 for an attempt to fix the actual issue.
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 23, 2025

LLVM Buildbot has detected a new failure on builder lld-x86_64-ubuntu-fast running on as-builder-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/10119

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/v_sat_pk_u8_i16.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=SDAG-VI /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=SDAG-VI /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs
RUN: at line 3: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 4: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 5: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs
RUN: at line 7: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GISEL-VI /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GISEL-VI /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 8: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1233:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:544:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:545:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1348:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:589:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:590:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^

Input file: <stdin>
Check file: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 23, 2025

LLVM Buildbot has detected a new failure on builder llvm-x86_64-debian-dylib running on gribozavr4 while building llvm at step 7 "test-build-unified-tree-check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/17778

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-llvm) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/v_sat_pk_u8_i16.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=SDAG-VI /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=SDAG-VI /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 3: /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs
RUN: at line 4: /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs
RUN: at line 5: /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs
RUN: at line 7: /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GISEL-VI /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GISEL-VI /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel
RUN: at line 8: /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1233:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:544:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:545:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^
/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1348:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:589:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:590:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^

Input file: <stdin>
Check file: /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 23, 2025

LLVM Buildbot has detected a new failure on builder clang-x86_64-debian-fast running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/16974

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/v_sat_pk_u8_i16.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs < /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=SDAG-VI /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=SDAG-VI /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs
RUN: at line 3: /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=SDAG-GFX9 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=SDAG-GFX9 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs
RUN: at line 4: /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs < /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 5: /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs < /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=SDAG-GFX12 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=SDAG-GFX12 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs
RUN: at line 7: /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel < /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GISEL-VI /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GISEL-VI /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel
RUN: at line 8: /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel < /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GISEL-GFX9 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GISEL-GFX9 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel
/b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1233:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:544:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:545:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^
/b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1348:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:589:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:590:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^

Input file: <stdin>
Check file: /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 23, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/12537

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/v_sat_pk_u8_i16.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=SDAG-VI /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=SDAG-VI /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs
RUN: at line 3: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs
RUN: at line 4: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs
RUN: at line 5: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs
RUN: at line 7: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GISEL-VI /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GISEL-VI /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 8: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1233:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:544:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:545:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1348:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:589:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:590:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^

Input file: <stdin>
Check file: /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 23, 2025

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/20665

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/v_sat_pk_u8_i16.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=SDAG-VI /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=SDAG-VI /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 3: /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=SDAG-GFX9 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 4: /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX11,SDAG-GFX11 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 5: /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=SDAG-GFX12 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 7: /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GISEL-VI /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs -global-isel
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GISEL-VI /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
RUN: at line 8: /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GISEL-GFX9 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -global-isel
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1233:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:544:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:545:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll:1348:20: error: GISEL-GFX9-NEXT: expected string not found in input
; GISEL-GFX9-NEXT: v_and_b32_sdwa v1, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
                   ^
<stdin>:589:24: note: scanning from here
 v_mov_b32_e32 v1, 0xff
                       ^
<stdin>:590:2: note: possible intended match here
 v_and_b32_sdwa v1, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ^

Input file: <stdin>
Check file: /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
...

frederik-h added a commit that referenced this pull request Jan 24, 2025
This PR reapplies the changes from PR #123942 which had to be reverted
because of a test failure. The test has been adjusted.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Feb 10, 2025
…124131)

This PR reapplies the changes from PR llvm#123942 which had to be reverted
because of a test failure. The test has been adjusted.

(cherry picked from commit bfd9bc2)
frederik-h added a commit that referenced this pull request Mar 3, 2025
The si-peephole-sdwa pass adjusts the selections on sdwa instructions to
the selections on their operands during its conversions. For instance,
if an instruction selects `BYTE_0` and its operand selects `WORD_1`, the
combined selection should be `BYTE_2`, i.e. "`BYTE_0` of `WORD_1`". The
existing implementation does not always handle this correctly in some
complex situations with instructions across different basic blocks as
demonstrated by the test cases included in this PR.

This PR adds an additional selection combination step to the conversion
to fix this issue. It reverts the changes made by PR #123942 which had
disabled the conversion of preexisting SDWA instructions completely as a
quick fix.

---------

Co-authored-by: Jeffrey Byrnes <[email protected]>
Co-authored-by: Matt Arsenault <[email protected]>
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
The si-peephole-sdwa pass adjusts the selections on sdwa instructions to
the selections on their operands during its conversions. For instance,
if an instruction selects `BYTE_0` and its operand selects `WORD_1`, the
combined selection should be `BYTE_2`, i.e. "`BYTE_0` of `WORD_1`". The
existing implementation does not always handle this correctly in some
complex situations with instructions across different basic blocks as
demonstrated by the test cases included in this PR.

This PR adds an additional selection combination step to the conversion
to fix this issue. It reverts the changes made by PR llvm#123942 which had
disabled the conversion of preexisting SDWA instructions completely as a
quick fix.

---------

Co-authored-by: Jeffrey Byrnes <[email protected]>
Co-authored-by: Matt Arsenault <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants