Skip to content

[llvm] Don't combine repeated subnormal divisors #149333

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ashermancinelli
Copy link
Contributor

DAGCombiner performs this rewrite:
(a / D; b / D;) -> (recip = 1.0 / D; a * recip; b * recip)

However, when D is subnormal, this produces a*inf and b*inf. With fast-math flags enabled, this creates poisons that break the rewritten consumers. Guard this transformation with checks for subnormal operands.

DAGCombiner performs this rewrite:
(a / D; b / D;) -> (recip = 1.0 / D; a * recip; b * recip)

However, when D is subnormal, this produces a*inf and b*inf.
With fast-math flags enabled, this creates poisons that break
the rewritten consumers. Guard this transformation with checks
for subnormal operands.
@ashermancinelli ashermancinelli requested a review from mcinally July 17, 2025 15:16
@ashermancinelli ashermancinelli self-assigned this Jul 17, 2025
@ashermancinelli ashermancinelli added the llvm:SelectionDAG SelectionDAGISel as well label Jul 17, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 17, 2025

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Asher Mancinelli (ashermancinelli)

Changes

DAGCombiner performs this rewrite:
(a / D; b / D;) -> (recip = 1.0 / D; a * recip; b * recip)

However, when D is subnormal, this produces a*inf and b*inf. With fast-math flags enabled, this creates poisons that break the rewritten consumers. Guard this transformation with checks for subnormal operands.


Full diff: https://github.com/llvm/llvm-project/pull/149333.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+10)
  • (added) llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll (+22)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 0e8e4c9618bb2..2c810c2b885d3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -18235,6 +18235,16 @@ SDValue DAGCombiner::combineRepeatedFPDivisors(SDNode *N) {
   if (N0CFP && (N0CFP->isExactlyValue(1.0) || N0CFP->isExactlyValue(-1.0)))
     return SDValue();
 
+  // Skip if we have subnormals, multiplying with the reciprocal will introduce
+  // infinities.
+  ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true);
+  if (N1CFP) {
+    FPClassTest FPClass = N1CFP->getValueAPF().classify();
+    if (FPClass == fcPosSubnormal || FPClass == fcNegSubnormal) {
+      return SDValue();
+    }
+  }
+
   // Exit early if the target does not want this transform or if there can't
   // possibly be enough uses of the divisor to make the transform worthwhile.
   unsigned MinUses = TLI.combineRepeatedFPDivisors();
diff --git a/llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll b/llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll
new file mode 100644
index 0000000000000..59246068b3597
--- /dev/null
+++ b/llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll
@@ -0,0 +1,22 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=x86_64 -verify-machineinstrs < %s | FileCheck %s
+
+; Negative test: repeated FP divisor transform should bail out when the rewrite
+; would introduce infinities because of subnormal constant divisors.
+define void @two_denorm_fdivs(float %a0, float %a1, float %a2, ptr %res) {
+; CHECK-LABEL: two_denorm_fdivs:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [1.95915678E-39,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    divss %xmm0, %xmm1
+; CHECK-NEXT:    movss %xmm1, (%rdi)
+; CHECK-NEXT:    divss %xmm0, %xmm2
+; CHECK-NEXT:    movss %xmm2, 4(%rdi)
+; CHECK-NEXT:    retq
+entry:
+  %div0 = fdiv ninf float %a1, 0x37E5555500000000
+  store float %div0, ptr %res
+  %ptr1 = getelementptr inbounds float, ptr %res, i64 1
+  %div1 = fdiv ninf float %a2, 0x37E5555500000000
+  store float %div1, ptr %ptr1
+  ret void
+}

@@ -18235,6 +18235,16 @@ SDValue DAGCombiner::combineRepeatedFPDivisors(SDNode *N) {
if (N0CFP && (N0CFP->isExactlyValue(1.0) || N0CFP->isExactlyValue(-1.0)))
return SDValue();

// Skip if we have subnormals, multiplying with the reciprocal will introduce
// infinities.
ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of think the arcp flag allows this. I know it's not what the user would want, but that often happens with fast math.

This change still lets the operation happen if the divisor isn't constant but is dynamically subnormal, right? It seems like if we really want to stop this, the transformation should be happening somewhere that we can call computeKnownFPClass and check for possible subnormals, which would likely disable the transformation in most cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check on line 18229 above seems to be skipping the arcp check after the legalize phase. I'm not sure why that would be the case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, there's no way to consistently disable the transform for subnormals without completely disabling arcp for non-constant denominators. I don't really want to deal with the bug report that SimplifyCFG is illegally transforming floating-point code...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review 😄 In my reading of langref "arcp: Allows division to be treated as a multiplication by a reciprocal", arcp doesn't seem to necessitate that the compiler perform the div -> mul+recip, especially when we know at compile time that something nasty is going on (a divide by subnormal).

I see this in visitFDIV:

    // Only do the transform if the reciprocal is a legal fp immediate that
    // isn't too nasty (eg NaN, denormal, ...).
    if (((st == APFloat::opOK && !Recip.isDenormal()) ||

I defer to your judgement, but if we can be a little more friendly in our constant folding, I don't see how that's a bad thing.

To your second question: yes, this doesn't do anything for preventing the same dynamically subnormal transform. I can dig around for example uses of computeKnownFPClass if you think that's a profitable direction. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I definitely wouldn't say that it's a profitable direction, just that it's the only way to handle this completely. I think computeKnownFPClass will report that any non-constant value could possibly be a subnormal unless there has been an explicit check in the code to prove that it isn't, so in almost all case it would just end up disabling the optimization.

I agree that the arcp flag doesn't require the compiler to perform the optimization. There is some merit to blocking the optimization in cases where we can be reasonably certain the user wouldn't want it even if the semantics of arcp do allow it. It just doesn't sit right with me to allow the optimization on variables but not on constants, because then it depends on how well we've done constant propagation to get to this point, and you can get things like a bug in the user's program that disappears if we don't inline a certain function, for instance, which could be maddening to track down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug was already quite maddening to track down 😄. Full context is that this xform on a denormal created an infinity, which became a poisoned vector lane <poison, x, x, x>, which became a broadcast x with mattr=+avx2 and a move <0, x, x, x> on mattr=-avx2, which caused the test to pass with avx2 disabled because the poison became a zero without avx2 and it took on the value of x with >=avx2. At least with a dynamically subnormal value, the transform does not end up poisoning anything and the fast-math chaos stops there. It's the pessimistic poisoning that was the most frustrating part of this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I've seen cases before where fast-math introduced an infinity and something else declared it to be poison (https://discourse.llvm.org/t/nnan-ninf-and-poison/56506). That is no fun.

I guess I'd have to agree that it makes sense to avoid introducing an infinity when we know we're doing so, especially if ninf is in effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a DAG version of computeKnownFPClass, but should have one (it will also always do a significantly worse job than the IR version)

; CHECK-NEXT: movss %xmm2, 4(%rdi)
; CHECK-NEXT: retq
entry:
%div0 = fdiv ninf float %a1, 0x37E5555500000000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you added the wrong test? At least, I can't reproduce the issue without additional flags.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, I reduced this a bit too far and it doesn't repro the bug. I'll need to update this if folks are amenable to this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ninf should have been arcp, sorry about that. Should be updated. Thank you for the review!

@@ -18235,6 +18235,16 @@ SDValue DAGCombiner::combineRepeatedFPDivisors(SDNode *N) {
if (N0CFP && (N0CFP->isExactlyValue(1.0) || N0CFP->isExactlyValue(-1.0)))
return SDValue();

// Skip if we have subnormals, multiplying with the reciprocal will introduce
// infinities.
ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, there's no way to consistently disable the transform for subnormals without completely disabling arcp for non-constant denominators. I don't really want to deal with the bug report that SimplifyCFG is illegally transforming floating-point code...

// Skip if we have subnormals, multiplying with the reciprocal will introduce
// infinities.
ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true);
if (N1CFP) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might also want to block the transformation in cases where N1CFP is a non-splat constant vector and one of the elements is subnormal.

https://godbolt.org/z/zzPT4dTP5

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, does this issue only occur when using DAZ? If you're using DAZ, the testcase here doesn't have the correct denormal-fp-math mode applied.

@ashermancinelli
Copy link
Contributor Author

To clarify, does this issue only occur when using DAZ? If you're using DAZ, the testcase here doesn't have the correct denormal-fp-math mode applied.

No, it occurs with DAZ and without. The failure mode of test the program is that a relative error calculation is (roughly) folded like so:

Where relerr, actual, expect, are all len4 vectors:

relerr = abs((actual-expect)/expect)

=> transformed by repeated fp divisors
relerr = abs((actual-expect)*(1/expect))

=> one element of expect is subnormal, so 1/expect -> inf
relerr = abs((actual-expect)*<INF, x, x, x>)

=> The ninf FMF causes INF -> poison
relerr = abs((actual-expect)*<poison, x, x, x>)

=> (actual-expect)*x is folded to y for the latter 3 elements of the vector
relerr = abs(<poison, y, y, y>)

=> with avx2, that poison is optimized to y so a broadcast can be used
relerr = abs(broadcast(y))

=> at runtime, the value of y in the first lane causes a failure

When the poison makes it all the way to codegen, it becomes a 0 which is fine for the purposes of the error calculation (relerr = abs(0), which is fine) but when the broadcast optimization triggers it takes on the value of y, which is invalid, so the test fails. This is why the floating point control register doesn't really help us; the incorrect relative error calculation is in the data section already.

Does this answer your question? Apologies if I misunderstood!

@arsenm arsenm added the floating-point Floating-point math label Jul 18, 2025
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should special case constant values. The problem still exists for the non-constant case

@arsenm
Copy link
Contributor

arsenm commented Jul 18, 2025

I also don't think arcp should give license to introduce new infinities; perhaps this also should require reassoc

@ashermancinelli
Copy link
Contributor Author

I don't think this should special case constant values. The problem still exists for the non-constant case

The problem of poison propagation still exists for the non-constant case?

@arsenm
Copy link
Contributor

arsenm commented Jul 18, 2025

I don't think this should special case constant values. The problem still exists for the non-constant case

The problem of poison propagation still exists for the non-constant case?

Yes, the same situation will still occur at runtime. The infinity is still poison, even if not known at compile time

@andykaylor
Copy link
Contributor

I also don't think arcp should give license to introduce new infinities; perhaps this also should require reassoc

This is a fundamental problem with mixing arcp with ninf. Any non-constant value could be a subnormal value whose reciprocal will be infinity. So we have to make a choice, either arcp gives implicit permission for a transformation that could potentially introduce an infinity where none existed previously or arcp only allows reciprocal math if the result can be proven not to be infinity. The latter would render the flag nearly useless.

I don't think this should special case constant values. The problem still exists for the non-constant case

That was my initial reaction as well, but I've come around to what @ashermancinelli is trying to achieve here. If we say that arcp allows performing a reciprocal operation that could introduce infinity, then by using it you are accepting the risk of introducing a value that is dynamically poison, but the optimizer will never know that the value is poison, and so it won't optimize based on it being poison. You may still get incorrect results, but less arbitrarily so. In the constant case, however, we know that the transformation is introducing a poison value (if the ninf flag is set on any of the operations involved), so even if it is permitted by the semantics of the IR for the optimizer to do so, we should avoid it.

If we take the alternative approach and say that we should not perform this operation in any case where it might introduce an infinity, we need to remove the optimization entirely. The same would be true of almost any arcp-based transformation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 floating-point Floating-point math llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants