Description
[JIT, x64, Tier1+PGO] Wrong opcode emitted (jb instead of jl) after test reg,reg on sign-extended int32, in a chained || comparison
TL;DR
On .NET 10.0.8 x64, in a Tier-1 + Synthesized-PGO compilation of a large method
(~3.9 KB, 113 PGO-inlined callees), RyuJIT emits test rcx, rcx; jb SHORT L
where the correct opcode is jl SHORT L. The value in rcx is a sign-extended
int32 (movsxd rcx, ecx) being compared against zero as a signed long. The
jb (unsigned-LT) branch is provably never taken after test r,r
(CF is cleared by test), so the fall-through always executes, producing a
deterministic wrong result.
Setting DOTNET_TieredPGO=0 (or "System.Runtime.TieredPGO": false in
runtimeconfig.json) is a working mitigation.
Environment
| Item | Value |
|---|---|
| Runtime version | 10.0.8 |
| Process arch | x64 |
| OS | Windows 10/11 (also affects Server, repro'd locally on Win 11 22631) |
| Tiering / PGO | Tier1 with Synthesized PGO (also Dynamic PGO) |
| JIT name in disasm header | BLENDED_CODE for generic X64 + VEX + EVEX on Windows |
| Method category | Large pre-codegen'd calc method translated from a tax-calculation DSL |
| Method size | IL size = 179, code size = 3869 bytes |
| Inlining context | 113 inlinees with PGO data; 229 single-block inlinees; 24 inlinees without PGO data |
| fgCalledCount | 32 |
| Reproducer status | Production binary reliably reproduces; |
Affected method
After source-to-C# translation from the upstream tax DSL, the method is
plain straight-line C# that performs four IndexOfAny lookups, wraps each
result in a pooled TaxNumericCell.fValue (a long), and OR-chains the
Mark.Compare(0) >= 0 predicates:
public TaxBooleanCell local_ValidateQuotationMark(TaxStringCell InQuotationMark)
{
    int _BP = _B.ReserveItems(3);
    TaxBooleanCell __functionResult;
    TaxNumericCell Mark1, Mark2, Mark3, Mark4;
    __functionResult = BCell(false);
    Mark1 = ICell(FindChar(InQuotationMark.Value, "«"));
    Mark2 = ICell(FindChar(InQuotationMark.Value, "»"));
    Mark3 = ICell(FindChar(InQuotationMark.Value, "\u201C"));
    Mark4 = ICell(FindChar(InQuotationMark.Value, "\u201D"));
    if ((Mark1.Compare(0) >= 0)
        || (Mark2.Compare(0) >= 0)
        || (Mark3.Compare(0) >= 0)
        || (Mark4.Compare(0) >= 0))
    {
        __functionResult = BCell(true);
    }
    _B.ReleaseItems(3);
    return __functionResult;
}
Helpers (all inlined by Tier1+PGO):
// ICell stores a 1-based "found" index or -1 into a pooled numeric cell.
private TaxNumericCell ICell(int idx)
{
    int slot = _N.ReserveItems(1);
    TaxNumericCell c = (TaxNumericCell)_N[slot];
    c.fValue = idx >= 0 ? idx + 1 : -1; // <-- key: int-domain cmovl, then store to long field
    return c;
}

private static int FindChar(string s, string needle) =>
    s.IndexOfAny(needle.ToCharArray(), 0, s.Length);

// Compare is virtual; PGO-guarded devirt selects the leaf TaxIntegerCell.
public override int Compare(long other) // on TaxIntegerCell
{
    long delta = fValue - other;
    if (delta < 0) return -1;
    if (delta > 0) return 1;
    return 0;
}
After the OR-chain rewrite that the JIT applies to Compare(0) >= 0,
each predicate collapses to fValue >= 0 (signed long).
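To make that collapse concrete, here is a minimal standalone sketch that replays Compare and the ICell value mapping outside the cell hierarchy. The names CompareModel and CellValue, and passing fValue as a parameter, are assumptions made for illustration; the production types are not reproduced.

```csharp
// Model of TaxIntegerCell.Compare from this report, with fValue passed in
// explicitly (an assumption; the real method reads an instance field).
static int CompareModel(long fValue, long other)
{
    long delta = fValue - other;
    if (delta < 0) return -1;
    if (delta > 0) return 1;
    return 0;
}

// Same mapping ICell applies: 1-based index when found, -1 otherwise.
static long CellValue(int idx) => idx >= 0 ? idx + 1 : -1;

// For every stored value, Compare(0) >= 0 agrees with the plain signed test.
foreach (int idx in new[] { -1, 0, 1, 41 })
{
    long fValue = CellValue(idx);
    bool viaCompare = CompareModel(fValue, 0) >= 0;
    bool direct = fValue >= 0;
    System.Console.WriteLine(
        $"idx={idx} fValue={fValue} viaCompare={viaCompare} direct={direct}");
}
```

Both columns agree for every input, so the only question at the branch site is whether fValue is negative as a signed long.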
Buggy disassembly (verbatim excerpt from JitDisasm)
The full Tier1+PGO listing covers 3869 bytes of code. The relevant region is IG82
(Mark4 computation, inline of FindChar → IndexOfAny, plus ICell),
followed by IG83–IG87 (Mark1/2/3 from stack spills, all using the
correct signed branch), then IG88 with the wrong opcode for Mark4:
G_M000_IG82:
mov r9d, dword ptr [rsi+0x08]
mov rcx, rsi
xor r8d, r8d
call [System.String:IndexOfAny(char[],int,int):int:this]
lea ecx, [rax+0x01] ; ecx = rax + 1
mov edx, -1
test eax, eax
cmovl ecx, edx ; ecx = (rax<0) ? -1 : rax+1
movsxd rcx, ecx ; rcx = sign-extend(ecx) -- Mark4.fValue
cmp qword ptr [rsp+0xF0], 0 ; Mark1.fValue (stack spill)
je SHORT G_M000_IG89
G_M000_IG83:
cmp qword ptr [rsp+0xF0], 0
jg SHORT G_M000_IG89 ; correct: jg (signed)
G_M000_IG84:
mov rax, qword ptr [rsp+0xB0] ; Mark2.fValue
test rax, rax
je SHORT G_M000_IG89
G_M000_IG85:
test rax, rax
jg SHORT G_M000_IG89 ; correct: jg (signed)
G_M000_IG86:
mov rax, qword ptr [rsp+0x70] ; Mark3.fValue
test rax, rax
je SHORT G_M000_IG89
G_M000_IG87:
test rax, rax
jg SHORT G_M000_IG89 ; correct: jg (signed)
G_M000_IG88:
test rcx, rcx ; Mark4 still live in rcx from IG82
jb SHORT G_M000_IG90 ; *** BUG: jb (unsigned-LT) ***
; expected: jl (signed-LT), skipping IG89 when Mark4.fValue < 0
G_M000_IG89:
mov rcx, 0xD1FFAB1E
call CORINFO_HELP_NEWSFAST ; allocate TaxBooleanCell(true)
...
call [Cch.Data.TaxBooleanCell:Assign(bool):this] ; result = true
G_M000_IG90:
cmp gword ptr [rbx+0x30], 0
...
Why this is wrong
- test rcx, rcx clears CF unconditionally (Intel SDM, vol. 2, TEST).
  Therefore jb (which branches on CF=1) is never taken.
- The predicate being lowered is Mark4.fValue >= 0. The signed-LT branch
  (jl) over the "set result = true" block is the standard lowering. The JIT
  picked the unsigned-LT opcode instead.
- Consequence: whenever Mark1, Mark2, Mark3 all evaluate to false
  (the overwhelmingly hot path: strings without quotation marks), control
  falls through into IG89 regardless of Mark4's actual value, and
  result = true is unconditionally assigned.
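The flag argument above can be mirrored in C# as a small model. This is only a sketch of the branch semantics, not JIT code: JlTaken, JbTaken and ResultAfterIG88 are hypothetical names, and the unsigned comparison stands in for what jb encodes after a compare against zero.

```csharp
// After "test r, r": CF = 0 and OF = 0, and SF is the sign bit of r.
// jl branches when SF != OF, i.e. when the value is negative as a signed long.
static bool JlTaken(long v) => v < 0;

// jb branches when CF == 1, which never holds after test. The unsigned
// comparison it encodes says the same thing: no ulong is below zero.
static bool JbTaken(long v) => unchecked((ulong)v) < 0UL;

// Model of IG88..IG90 once Mark1..Mark3 have all failed: a taken branch skips
// IG89 (result = true); a branch that is not taken falls through into it.
static bool ResultAfterIG88(long mark4, bool useJb) =>
    !(useJb ? JbTaken(mark4) : JlTaken(mark4));

long notFound = -1; // Mark4.fValue when IndexOfAny finds nothing
System.Console.WriteLine($"jl: {ResultAfterIG88(notFound, useJb: false)}");
System.Console.WriteLine($"jb: {ResultAfterIG88(notFound, useJb: true)}");
```

For the not-found value the jl model yields False and the jb model yields True, which matches the wrong result described in this report.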
Why this only affects Mark4 (and not Mark1/2/3)
Mark1/2/3 went through a stack spill (mov [rsp+offset], rax upstream;
mov rax, [rsp+offset]; test rax, rax; jg). After the memory round-trip
the JIT correctly emits the signed branch.
Mark4 stays register-resident in rcx from the movsxd rcx, ecx to the test.
Our hypothesis is that when the JIT decides the operand type at the
test r,r; jcc site, it consults a stale per-tree type tag that says
"this value is a sign-extended int32 whose sign was tested", and from that
incorrectly selects jb (the unsigned form). After a store/load the tag
is reset and a signed branch (jg for Mark1/2/3) is emitted correctly.
Impact
This affects a Canadian corporate tax (T2) calculation product shipping on
.NET 10. The method involved is a hot validator called once per quotation-mark
text field per return; the wrong-result manifests as spurious validation
errors on every return processed after the method tiers up to Tier1+PGO.
We are shipping the TieredPGO=0 mitigation in our runtimeconfig.json as
a stopgap.
Categorization hints
- Area: area-CodeGen-coreclr
- Component: opcode selection / branch lowering for GT_LT / GT_GE over
  sign-extended int32 → long values in PGO-driven Tier1 codegen.
Reproduction Steps
Reproduction
What we have today
- The full assembly listing (DOTNET_JitDisasm=local_ValidateQuotationMark,
  DOTNET_JitDisasmSummary=1) covers 3869 bytes of code and contains the
  buggy IG88 block shown above, every run, on .NET 10.0.8 x64 with PGO enabled.
- Functional repro: we were unable to create a reliable minimal app that
  reproduces the issue.
- Observed result: the method returns true (wrong; expected false).
- DOTNET_TieredPGO=0 makes the bug disappear (only the Tier1-without-PGO
  codegen is produced, which uses jl).
- DOTNET_JitNoInline=1 also makes the bug disappear, which indicates the
  miscompilation is inliner-driven: the wrong opcode only emerges once
  the JIT has inlined ICell + FindChar + Compare into
  local_ValidateQuotationMark, producing the cmovl + movsxd →
  register-resident-through-OR-chain sequence shown at IG82–IG88. With
  inlining disabled, each helper is a real call, the value round-trips
  through memory, and the correct signed branch is emitted.
Minimal repro status — help wanted
We have attempted a minimal synthetic harness that mirrors the production
shape (4-level abstract Cell hierarchy, pooled allocator, PGO-driven
guarded devirt of Compare, 5M-call warmup, identical OR-chain), but
have not yet been able to trigger the bad codegen in isolation. In the
synthetic harness, the JIT spills Mark4 to the stack just like the other
three marks, and emits jg correctly.
It appears the production trigger requires the full inlining context
(113 inlinees, ~3.9 KB body) — specifically, enough register pressure for
the JIT to keep Mark4 register-resident from movsxd through the entire
OR-chain, and enough non-escaping cell allocations for the JIT to elide
the fValue store.
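For readers who want a starting point, the following is a rough differential harness in the same spirit. It is a sketch under stated assumptions, not our synthetic harness and not a reproducer: it drops the pooled cell hierarchy, the iteration count is arbitrary, and Validate, ValidateReference and CountMismatches are hypothetical names. It compares an optimizable validator against a reference marked NoOptimization so that any divergence after tier-up would show up as a mismatch.

```csharp
// Simplified validator: same FindChar + ICell mapping + OR-chain as in this
// report, without the pooled cell hierarchy (an intentional simplification).
static int FindChar(string s, string needle) =>
    s.IndexOfAny(needle.ToCharArray(), 0, s.Length);

static long CellValue(int idx) => idx >= 0 ? idx + 1 : -1;

static bool Validate(string s)
{
    long m1 = CellValue(FindChar(s, "«"));
    long m2 = CellValue(FindChar(s, "»"));
    long m3 = CellValue(FindChar(s, "\u201C"));
    long m4 = CellValue(FindChar(s, "\u201D"));
    return m1 >= 0 || m2 >= 0 || m3 >= 0 || m4 >= 0;
}

// Reference kept away from the optimizer so it cannot share a miscompile.
[System.Runtime.CompilerServices.MethodImpl(
    System.Runtime.CompilerServices.MethodImplOptions.NoOptimization |
    System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
static bool ValidateReference(string s) =>
    s.IndexOfAny(new[] { '«', '»', '\u201C', '\u201D' }) >= 0;

// Call both implementations repeatedly and count disagreements.
static int CountMismatches(string[] inputs, int iterations)
{
    int mismatches = 0;
    for (int i = 0; i < iterations; i++)
    {
        string s = inputs[i % inputs.Length];
        if (Validate(s) != ValidateReference(s)) mismatches++;
    }
    return mismatches;
}

string[] samples = { "plain text", "«quoted»", "\u201Ccurly\u201D", "" };
System.Console.WriteLine($"mismatches: {CountMismatches(samples, 200_000)}");
```

A harness of this size is not expected to trigger the bug, since it lacks the inlining context described above; its value is as a correctness oracle that can be dropped next to a larger candidate repro.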
We are happy to:
- Provide the complete production disassembly of the method
  (~28 KB text excerpt from DOTNET_JitDisasm).
- Provide the complete jitDump (with
  DOTNET_JitDump=local_ValidateQuotationMark).
- Run any specific DOTNET_* instrumentation env vars you'd like and ship
  the output back.
If the team needs a directly buildable repro, we can investigate whether the
proprietary calc binary can be shared under NDA, or whether a redistributable
extract of the generated Utils.cs file is sufficient.
Expected behavior
test rcx, rcx; jl SHORT G_M000_IG90 (or any signed-LT equivalent) at IG88,
matching the codegen produced for Mark1/2/3 at IG83/IG85/IG87.
Actual behavior
test rcx, rcx; jb SHORT G_M000_IG90 — opcode is unsigned-LT, branch is
provably dead, fall-through unconditionally executes the
__functionResult = BCell(true) block.
Regression?
No response
Known Workarounds
Workaround (confirmed)
Either set the environment variable DOTNET_TieredPGO=0, or in runtimeconfig.json:
{
"runtimeOptions": {
"configProperties": {
"System.Runtime.TieredPGO": false
}
}
}
Disabling tiering altogether (DOTNET_TieredCompilation=0) also works.
Static PGO (PGO-instrumented IL with PgoEnabled=true) was not tested.
DOTNET_JitNoInline=1 also made the issue disappear.
Pending a fix, we elected to disable TieredPGO via the runtimeconfig.template.json included with our applications.
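Because the stopgap depends on deployment configuration, a startup check along the following lines may help confirm it is in place. This is a sketch, assuming that runtimeconfig.json configProperties are visible through AppContext; TieredPgoMitigationConfigured is a hypothetical helper name, and the check only reads configuration rather than observing what the JIT did.

```csharp
// Reports whether either mitigation from this report appears to be configured.
// It inspects configuration only; it does not observe the JIT's behavior.
static bool TieredPgoMitigationConfigured()
{
    // runtimeconfig.json "configProperties" entries are surfaced via AppContext
    // (assumption stated in the lead-in).
    if (System.AppContext.TryGetSwitch("System.Runtime.TieredPGO", out bool pgoEnabled)
        && !pgoEnabled)
    {
        return true;
    }

    // Environment-variable form of the same setting.
    var env = System.Environment.GetEnvironmentVariable("DOTNET_TieredPGO");
    return env == "0";
}

System.Console.WriteLine(
    $"TieredPGO mitigation configured: {TieredPgoMitigationConfigured()}");
```

Logging this value once at startup makes it easier to tell, from a customer log, whether a spurious validation error came from a process running without the mitigation.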
Configuration
No response
Other information
The attached zip file contains: output from an optimized run that exhibited the issue (disasm-tier1-pgo.txt); output from a comparison run with PGO disabled (disasm-tier1-no-pgo.txt, pgo=0); the installed .NET environment information (dotnet-info 1.txt); the application's runtimeconfig (T2Txp.runtimeconfig.json); an excerpt of the C# function that triggered the issue (Utils.cs-excerpt.txt); and a raw jitDump with PGO enabled (jitdump-raw 1.txt).
[Evidence.zip](https://github.com/user-attachments/files/28509422/Evidence.zip)