[llvm] Don't combine repeated subnormal divisors #149333

ashermancinelli · 2025-07-17T15:16:02Z

DAGCombiner performs this rewrite:
(a / D; b / D;) -> (recip = 1.0 / D; a * recip; b * recip)

However, when D is subnormal, 1.0 / D can overflow to infinity, so this produces a*inf and b*inf. With fast-math flags enabled, the infinities become poison values that break the rewritten consumers. Guard this transformation with a check for subnormal constant divisors.
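
To make the failure concrete, here is a minimal standalone C++ sketch of the two forms in plain IEEE-754 single precision. It is not DAGCombiner code: the function names are hypothetical, and the `volatile` is only there to keep the intermediate reciprocal in `float` precision. It assumes default floating-point settings (no fast-math, no flush-to-zero).

```cpp
#include <cmath>
#include <utility>

// Original form: two independent divisions by the same divisor.
std::pair<float, float> divideDirect(float a, float b, float d) {
  return {a / d, b / d};
}

// Rewritten form: one shared reciprocal, then two multiplications.
// (Hypothetical model of the combine, not the LLVM implementation.)
std::pair<float, float> divideViaReciprocal(float a, float b, float d) {
  volatile float recip = 1.0f / d; // overflows to +inf for small subnormal d
  return {a * recip, b * recip};
}
```

With d = 1.95915678e-39f (the subnormal constant shown in the test's CHECK line) and small positive numerators such as 1.0e-10f, the direct form stays finite while the reciprocal form returns infinities.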

llvmbot · 2025-07-17T15:16:27Z

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Asher Mancinelli (ashermancinelli)

Changes

DAGCombiner performs this rewrite:
(a / D; b / D;) -> (recip = 1.0 / D; a * recip; b * recip)

However, when D is subnormal, this produces a*inf and b*inf. With fast-math flags enabled, this creates poisons that break the rewritten consumers. Guard this transformation with checks for subnormal operands.

Full diff: https://github.com/llvm/llvm-project/pull/149333.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+10)
(added) llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll (+22)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 0e8e4c9618bb2..2c810c2b885d3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -18235,6 +18235,16 @@ SDValue DAGCombiner::combineRepeatedFPDivisors(SDNode *N) {
   if (N0CFP && (N0CFP->isExactlyValue(1.0) || N0CFP->isExactlyValue(-1.0)))
     return SDValue();
 
+  // Skip if we have subnormals, multiplying with the reciprocal will introduce
+  // infinities.
+  ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true);
+  if (N1CFP) {
+    FPClassTest FPClass = N1CFP->getValueAPF().classify();
+    if (FPClass == fcPosSubnormal || FPClass == fcNegSubnormal) {
+      return SDValue();
+    }
+  }
+
   // Exit early if the target does not want this transform or if there can't
   // possibly be enough uses of the divisor to make the transform worthwhile.
   unsigned MinUses = TLI.combineRepeatedFPDivisors();
diff --git a/llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll b/llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll
new file mode 100644
index 0000000000000..59246068b3597
--- /dev/null
+++ b/llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll
@@ -0,0 +1,22 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=x86_64 -verify-machineinstrs < %s | FileCheck %s
+
+; Negative test: repeated FP divisor transform should bail out when the rewrite
+; would introduce infinities because of subnormal constant divisors.
+define void @two_denorm_fdivs(float %a0, float %a1, float %a2, ptr %res) {
+; CHECK-LABEL: two_denorm_fdivs:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [1.95915678E-39,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    divss %xmm0, %xmm1
+; CHECK-NEXT:    movss %xmm1, (%rdi)
+; CHECK-NEXT:    divss %xmm0, %xmm2
+; CHECK-NEXT:    movss %xmm2, 4(%rdi)
+; CHECK-NEXT:    retq
+entry:
+  %div0 = fdiv ninf float %a1, 0x37E5555500000000
+  store float %div0, ptr %res
+  %ptr1 = getelementptr inbounds float, ptr %res, i64 1
+  %div1 = fdiv ninf float %a2, 0x37E5555500000000
+  store float %div1, ptr %ptr1
+  ret void
+}

andykaylor · 2025-07-17T17:11:00Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ -18235,6 +18235,16 @@ SDValue DAGCombiner::combineRepeatedFPDivisors(SDNode *N) {
  if (N0CFP && (N0CFP->isExactlyValue(1.0) || N0CFP->isExactlyValue(-1.0)))
    return SDValue();

+  // Skip if we have subnormals, multiplying with the reciprocal will introduce
+  // infinities.
+  ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true);


I kind of think the arcp flag allows this. I know it's not what the user would want, but that often happens with fast math.

This change still lets the operation happen if the divisor isn't constant but is dynamically subnormal, right? It seems like if we really want to stop this, the transformation should be happening somewhere that we can call computeKnownFPClass and check for possible subnormals, which would likely disable the transformation in most cases.

The check on line 18229 above seems to be skipping the arcp check after the legalize phase. I'm not sure why that would be the case.

Right, there's no way to consistently disable the transform for subnormals without completely disabling arcp for non-constant denominators. I don't really want to deal with the bug report that SimplifyCFG is illegally transforming floating-point code...

Thank you for the review 😄 In my reading of langref "arcp: Allows division to be treated as a multiplication by a reciprocal", arcp doesn't seem to necessitate that the compiler perform the div -> mul+recip, especially when we know at compile time that something nasty is going on (a divide by subnormal).

I see this in visitFDIV:

  // Only do the transform if the reciprocal is a legal fp immediate that
  // isn't too nasty (eg NaN, denormal, ...).
  if (((st == APFloat::opOK && !Recip.isDenormal()) ||

I defer to your judgement, but if we can be a little more friendly in our constant folding, I don't see how that's a bad thing.

To your second question: yes, this doesn't do anything for preventing the same dynamically subnormal transform. I can dig around for example uses of computeKnownFPClass if you think that's a profitable direction. Thanks!

No, I definitely wouldn't say that it's a profitable direction, just that it's the only way to handle this completely. I think computeKnownFPClass will report that any non-constant value could possibly be a subnormal unless there has been an explicit check in the code to prove that it isn't, so in almost all cases it would just end up disabling the optimization.

I agree that the arcp flag doesn't require the compiler to perform the optimization. There is some merit to blocking the optimization in cases where we can be reasonably certain the user wouldn't want it even if the semantics of arcp do allow it. It just doesn't sit right with me to allow the optimization on variables but not on constants, because then it depends on how well we've done constant propagation to get to this point, and you can get things like a bug in the user's program that disappears if we don't inline a certain function, for instance, which could be maddening to track down.

This bug was already quite maddening to track down 😄. Full context is that this xform on a denormal created an infinity, which became a poisoned vector lane <poison, x, x, x>. That became a broadcast of x with mattr=+avx2 and a move of <0, x, x, x> with mattr=-avx2, so the test passed with avx2 disabled (the poison became a zero) and failed with avx2 or newer (the poisoned lane took on the value of x). At least with a dynamically subnormal value, the transform does not end up poisoning anything and the fast-math chaos stops there. It's the pessimistic poisoning that was the most frustrating part of this.

Oh! I've seen cases before where fast-math introduced an infinity and something else declared it to be poison (https://discourse.llvm.org/t/nnan-ninf-and-poison/56506). That is no fun.

I guess I'd have to agree that it makes sense to avoid introducing an infinity when we know we're doing so, especially if ninf is in effect.

We don't have a DAG version of computeKnownFPClass, but should have one (it will also always do a significantly worse job than the IR version)

efriedma-quic · 2025-07-17T17:48:04Z

llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll

+; CHECK-NEXT:    movss %xmm2, 4(%rdi)
+; CHECK-NEXT:    retq
+entry:
+  %div0 = fdiv ninf float %a1, 0x37E5555500000000


It looks like you added the wrong test? At least, I can't reproduce the issue without additional flags.

I think you're right, I reduced this a bit too far and it doesn't repro the bug. I'll need to update this if folks are amenable to this change.

ninf should have been arcp, sorry about that. Should be updated. Thank you for the review!

efriedma-quic · 2025-07-17T17:53:07Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ -18235,6 +18235,16 @@ SDValue DAGCombiner::combineRepeatedFPDivisors(SDNode *N) {
  if (N0CFP && (N0CFP->isExactlyValue(1.0) || N0CFP->isExactlyValue(-1.0)))
    return SDValue();

+  // Skip if we have subnormals, multiplying with the reciprocal will introduce
+  // infinities.
+  ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true);


Right, there's no way to consistently disable the transform for subnormals without completely disabling arcp for non-constant denominators. I don't really want to deal with the bug report that SimplifyCFG is illegally transforming floating-point code...

andykaylor · 2025-07-17T23:02:46Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

+  // Skip if we have subnormals, multiplying with the reciprocal will introduce
+  // infinities.
+  ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true);
+  if (N1CFP) {


You might also want to block the transformation in cases where N1CFP is a non-splat constant vector and one of the elements is subnormal.

https://godbolt.org/z/zzPT4dTP5
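
As a rough illustration of the per-element check suggested here, the following standalone C++ sketch scans every lane of a constant vector for a subnormal value. It models the idea only: `anyLaneSubnormal` is a hypothetical helper, not an LLVM API, and a real patch would have to walk the constant vector's operands inside DAGCombiner instead.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not an LLVM API): returns true if any lane of a
// constant vector divisor is subnormal, in which case the repeated-divisor
// rewrite would be skipped for the whole vector.
bool anyLaneSubnormal(const std::vector<float> &lanes) {
  for (float lane : lanes)
    if (std::fpclassify(lane) == FP_SUBNORMAL)
      return true;
  return false;
}
```

A splat-only check sees a single value, whereas this form also rejects a vector such as <1.0, 1.95915678e-39, 2.0, 4.0>, where only one element is subnormal.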

arsenm

To clarify, does this issue only occur when using DAZ? If you're using DAZ, the testcase here doesn't have the correct denormal-fp-math mode applied.

llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll

ashermancinelli · 2025-07-18T00:50:05Z

> To clarify, does this issue only occur when using DAZ? If you're using DAZ, the testcase here doesn't have the correct denormal-fp-math mode applied.

No, it occurs with DAZ and without. The failure mode of the test program is that a relative error calculation is (roughly) folded like so:

Where relerr, actual, expect, are all len4 vectors:

relerr = abs((actual-expect)/expect)

=> transformed by repeated fp divisors
relerr = abs((actual-expect)*(1/expect))

=> one element of expect is subnormal, so 1/expect -> inf
relerr = abs((actual-expect)*<INF, x, x, x>)

=> The ninf FMF causes INF -> poison
relerr = abs((actual-expect)*<poison, x, x, x>)

=> (actual-expect)*x is folded to y for the latter 3 elements of the vector
relerr = abs(<poison, y, y, y>)

=> with avx2, that poison is optimized to y so a broadcast can be used
relerr = abs(broadcast(y))

=> at runtime, the value of y in the first lane causes a failure

When the poison makes it all the way to codegen, it becomes a 0, which is fine for the purposes of the error calculation (relerr = abs(0)). But when the broadcast optimization triggers, the lane takes on the value of y, which is invalid, so the test fails. This is why the floating-point control register doesn't really help us: the incorrect relative error calculation is already in the data section.
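
The first two steps of the chain above can be reproduced in a standalone C++ sketch under plain IEEE-754 semantics. It cannot model poison or the AVX2 broadcast, which exist only inside the compiler; it only shows that the reciprocal form turns a finite relative error into an infinity in the lane whose `expect` value is subnormal. The type alias and function names are assumptions made for illustration.

```cpp
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;

// relerr = abs((actual - expect) / expect), one division per lane.
Vec4 relErrDirect(const Vec4 &actual, const Vec4 &expect) {
  Vec4 out{};
  for (int i = 0; i < 4; ++i)
    out[i] = std::fabs((actual[i] - expect[i]) / expect[i]);
  return out;
}

// The same formula after the rewrite: multiply by a reciprocal instead.
Vec4 relErrViaReciprocal(const Vec4 &actual, const Vec4 &expect) {
  Vec4 out{};
  for (int i = 0; i < 4; ++i) {
    volatile float recip = 1.0f / expect[i]; // +inf for a small subnormal
    out[i] = std::fabs((actual[i] - expect[i]) * recip);
  }
  return out;
}
```

For expect = {1.95915678e-39f, 2, 2, 2} and actual = {2.0e-39f, 2.5, 2.5, 2.5}, lane 0 is finite in the direct form and infinite in the reciprocal form, while the other lanes are 0.25 in both.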

Does this answer your question? Apologies if I misunderstood!

arsenm

I don't think this should special case constant values. The problem still exists for the non-constant case

arsenm · 2025-07-18T01:37:46Z

I also don't think arcp should give license to introduce new infinities; perhaps this also should require reassoc

ashermancinelli · 2025-07-18T01:59:31Z

> I don't think this should special case constant values. The problem still exists for the non-constant case

The problem of poison propagation still exists for the non-constant case?

arsenm · 2025-07-18T03:23:24Z

> > I don't think this should special case constant values. The problem still exists for the non-constant case
>
> The problem of poison propagation still exists for the non-constant case?

Yes, the same situation will still occur at runtime. The infinity is still poison, even if not known at compile time

andykaylor · 2025-07-18T16:48:25Z

> I also don't think arcp should give license to introduce new infinities; perhaps this also should require reassoc

This is a fundamental problem with mixing arcp with ninf. Any non-constant value could be a subnormal value whose reciprocal will be infinity. So we have to make a choice, either arcp gives implicit permission for a transformation that could potentially introduce an infinity where none existed previously or arcp only allows reciprocal math if the result can be proven not to be infinity. The latter would render the flag nearly useless.
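
A small standalone C++ sketch of the runtime case: whether a reciprocal overflows depends only on the value that shows up at run time. `reciprocalOverflows` is a hypothetical helper written for this illustration, and the sketch assumes default IEEE-754 single-precision behaviour. One detail it makes visible is that only the smaller subnormals overflow: 1.0e-38f is subnormal, yet its reciprocal is still finite.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: true if 1/x overflows to infinity for a finite,
// nonzero float x. Zero and non-finite inputs are excluded because their
// reciprocals are not an overflow introduced by the rewrite.
bool reciprocalOverflows(float x) {
  if (x == 0.0f || !std::isfinite(x))
    return false;
  volatile float recip = 1.0f / x; // keep the result in float precision
  return std::isinf(recip);
}
```

Under these assumptions, the helper returns true for `std::numeric_limits<float>::denorm_min()` and for 1.95915678e-39f, and false for 1.0e-38f and for the smallest normal value `std::numeric_limits<float>::min()`.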

> I don't think this should special case constant values. The problem still exists for the non-constant case

That was my initial reaction as well, but I've come around to what @ashermancinelli is trying to achieve here. If we say that arcp allows performing a reciprocal operation that could introduce infinity, then by using it you are accepting the risk of introducing a value that is dynamically poison, but the optimizer will never know that the value is poison, and so it won't optimize based on it being poison. You may still get incorrect results, but less arbitrarily so. In the constant case, however, we know that the transformation is introducing a poison value (if the ninf flag is set on any of the operations involved), so even if it is permitted by the semantics of the IR for the optimizer to do so, we should avoid it.

If we take the alternative approach and say that we should not perform this operation in any case where it might introduce an infinity, we need to remove the optimization entirely. The same would be true of almost any arcp-based transformation.

ashermancinelli requested a review from mcinally July 17, 2025 15:16

ashermancinelli self-assigned this Jul 17, 2025

ashermancinelli added the llvm:SelectionDAG label Jul 17, 2025

llvmbot added the backend:X86 label Jul 17, 2025

mcinally requested review from rotateright, efriedma-quic, andykaylor and arsenm July 17, 2025 15:19

andykaylor reviewed Jul 17, 2025

andykaylor requested a review from jcranmer-intel July 17, 2025 17:20

efriedma-quic reviewed Jul 17, 2025

Add arcp fmf to test case

0963938

andykaylor reviewed Jul 17, 2025

arsenm reviewed Jul 18, 2025

remove instr verification

b08bd7a

arsenm added the floating-point label Jul 18, 2025

arsenm reviewed Jul 18, 2025
