-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[llvm] Don't combine repeated subnormal divisors #149333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[llvm] Don't combine repeated subnormal divisors #149333
Conversation
DAGCombiner performs this rewrite: (a / D; b / D;) -> (recip = 1.0 / D; a * recip; b * recip) However, when D is subnormal, this produces a*inf and b*inf. With fast-math flags enabled, this creates poisons that break the rewritten consumers. Guard this transformation with checks for subnormal operands.
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-llvm-selectiondag Author: Asher Mancinelli (ashermancinelli) ChangesDAGCombiner performs this rewrite: However, when D is subnormal, this produces Full diff: https://github.com/llvm/llvm-project/pull/149333.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 0e8e4c9618bb2..2c810c2b885d3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -18235,6 +18235,16 @@ SDValue DAGCombiner::combineRepeatedFPDivisors(SDNode *N) {
if (N0CFP && (N0CFP->isExactlyValue(1.0) || N0CFP->isExactlyValue(-1.0)))
return SDValue();
+ // Skip if we have subnormals, multiplying with the reciprocal will introduce
+ // infinities.
+ ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true);
+ if (N1CFP) {
+ FPClassTest FPClass = N1CFP->getValueAPF().classify();
+ if (FPClass == fcPosSubnormal || FPClass == fcNegSubnormal) {
+ return SDValue();
+ }
+ }
+
// Exit early if the target does not want this transform or if there can't
// possibly be enough uses of the divisor to make the transform worthwhile.
unsigned MinUses = TLI.combineRepeatedFPDivisors();
diff --git a/llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll b/llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll
new file mode 100644
index 0000000000000..59246068b3597
--- /dev/null
+++ b/llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll
@@ -0,0 +1,22 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=x86_64 -verify-machineinstrs < %s | FileCheck %s
+
+; Negative test: repeated FP divisor transform should bail out when the rewrite
+; would introduce infinities because of subnormal constant divisors.
+define void @two_denorm_fdivs(float %a0, float %a1, float %a2, ptr %res) {
+; CHECK-LABEL: two_denorm_fdivs:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: movss {{.*#+}} xmm0 = [1.95915678E-39,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT: divss %xmm0, %xmm1
+; CHECK-NEXT: movss %xmm1, (%rdi)
+; CHECK-NEXT: divss %xmm0, %xmm2
+; CHECK-NEXT: movss %xmm2, 4(%rdi)
+; CHECK-NEXT: retq
+entry:
+ %div0 = fdiv ninf float %a1, 0x37E5555500000000
+ store float %div0, ptr %res
+ %ptr1 = getelementptr inbounds float, ptr %res, i64 1
+ %div1 = fdiv ninf float %a2, 0x37E5555500000000
+ store float %div1, ptr %ptr1
+ ret void
+}
|
@@ -18235,6 +18235,16 @@ SDValue DAGCombiner::combineRepeatedFPDivisors(SDNode *N) { | |||
if (N0CFP && (N0CFP->isExactlyValue(1.0) || N0CFP->isExactlyValue(-1.0))) | |||
return SDValue(); | |||
|
|||
// Skip if we have subnormals, multiplying with the reciprocal will introduce | |||
// infinities. | |||
ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of think the arcp
flag allows this. I know it's not what the user would want, but that often happens with fast math.
This change still lets the operation happen if the divisor isn't constant but is dynamically subnormal, right? It seems like if we really want to stop this, the transformation should be happening somewhere that we can call computeKnownFPClass
and check for possible subnormals, which would likely disable the transformation in most cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check on line 18229 above seems to be skipping the arcp
check after the legalize phase. I'm not sure why that would be the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, there's no way to consistently disable the transform for subnormals without completely disabling arcp for non-constant denominators. I don't really want to deal with the bug report that SimplifyCFG is illegally transforming floating-point code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review 😄 In my reading of langref "arcp: Allows division to be treated as a multiplication by a reciprocal", arcp
doesn't seem to necessitate that the compiler perform the div -> mul+recip, especially when we know at compile time that something nasty is going on (a divide by subnormal).
I see this in visitFDIV
:
// Only do the transform if the reciprocal is a legal fp immediate that
// isn't too nasty (eg NaN, denormal, ...).
if (((st == APFloat::opOK && !Recip.isDenormal()) ||
I defer to your judgement, but if we can be a little more friendly in our constant folding, I don't see how that's a bad thing.
To your second question: yes, this doesn't do anything for preventing the same dynamically subnormal transform. I can dig around for example uses of computeKnownFPClass
if you think that's a profitable direction. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I definitely wouldn't say that it's a profitable direction, just that it's the only way to handle this completely. I think computeKnownFPClass
will report that any non-constant value could possibly be a subnormal unless there has been an explicit check in the code to prove that it isn't, so in almost all case it would just end up disabling the optimization.
I agree that the arcp
flag doesn't require the compiler to perform the optimization. There is some merit to blocking the optimization in cases where we can be reasonably certain the user wouldn't want it even if the semantics of arcp
do allow it. It just doesn't sit right with me to allow the optimization on variables but not on constants, because then it depends on how well we've done constant propagation to get to this point, and you can get things like a bug in the user's program that disappears if we don't inline a certain function, for instance, which could be maddening to track down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bug was already quite maddening to track down 😄. Full context is that this xform on a denormal created an infinity, which became a poisoned vector lane <poison, x, x, x>
, which became a broadcast x
with mattr=+avx2 and a move <0, x, x, x>
on mattr=-avx2, which caused the test to pass with avx2 disabled because the poison became a zero without avx2 and it took on the value of x
with >=avx2. At least with a dynamically subnormal value, the transform does not end up poisoning anything and the fast-math chaos stops there. It's the pessimistic poisoning that was the most frustrating part of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! I've seen cases before where fast-math introduced an infinity and something else declared it to be poison (https://discourse.llvm.org/t/nnan-ninf-and-poison/56506). That is no fun.
I guess I'd have to agree that it makes sense to avoid introducing an infinity when we know we're doing so, especially if ninf
is in effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a DAG version of computeKnownFPClass, but should have one (it will also always do a significantly worse job than the IR version)
; CHECK-NEXT: movss %xmm2, 4(%rdi) | ||
; CHECK-NEXT: retq | ||
entry: | ||
%div0 = fdiv ninf float %a1, 0x37E5555500000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you added the wrong test? At least, I can't reproduce the issue without additional flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right, I reduced this a bit too far and it doesn't repro the bug. I'll need to update this if folks are amenable to this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ninf should have been arcp, sorry about that. Should be updated. Thank you for the review!
@@ -18235,6 +18235,16 @@ SDValue DAGCombiner::combineRepeatedFPDivisors(SDNode *N) { | |||
if (N0CFP && (N0CFP->isExactlyValue(1.0) || N0CFP->isExactlyValue(-1.0))) | |||
return SDValue(); | |||
|
|||
// Skip if we have subnormals, multiplying with the reciprocal will introduce | |||
// infinities. | |||
ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, there's no way to consistently disable the transform for subnormals without completely disabling arcp for non-constant denominators. I don't really want to deal with the bug report that SimplifyCFG is illegally transforming floating-point code...
// Skip if we have subnormals, multiplying with the reciprocal will introduce | ||
// infinities. | ||
ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true); | ||
if (N1CFP) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might also want to block the transformation in cases where N1CFP is a non-splat constant vector and one of the elements is subnormal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, does this issue only occur when using DAZ? If you're using DAZ, the testcase here doesn't have the correct denormal-fp-math mode applied.
No, it occurs with DAZ and without. The failure mode of test the program is that a relative error calculation is (roughly) folded like so:
When the poison makes it all the way to codegen, it becomes a 0 which is fine for the purposes of the error calculation ( Does this answer your question? Apologies if I misunderstood! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should special case constant values. The problem still exists for the non-constant case
I also don't think arcp should give license to introduce new infinities; perhaps this also should require reassoc |
The problem of poison propagation still exists for the non-constant case? |
Yes, the same situation will still occur at runtime. The infinity is still poison, even if not known at compile time |
This is a fundamental problem with mixing
That was my initial reaction as well, but I've come around to what @ashermancinelli is trying to achieve here. If we say that If we take the alternative approach and say that we should not perform this operation in any case where it might introduce an infinity, we need to remove the optimization entirely. The same would be true of almost any arcp-based transformation. |
DAGCombiner performs this rewrite:
(a / D; b / D;) -> (recip = 1.0 / D; a * recip; b * recip)
However, when D is subnormal, this produces
a*inf
andb*inf
. With fast-math flags enabled, this creates poisons that break the rewritten consumers. Guard this transformation with checks for subnormal operands.