Resolve merge/04d4be501dc83fe411193a46c10e898898552731 stable 21.x #11029

Open
wants to merge 18 commits into base: stable/21.x

jkorous-apple
No description provided.

tru and others added 18 commits July 15, 2025 15:59
This analysis currently crashes when applied to a graph region that
has a use-def cycle. This PR fixes that by keeping track of the
operations the DFS has already visited while following use-def edges
and stopping as soon as it reaches an operation a second time.
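As a minimal sketch of the idea in this commit message (not the actual patch), a use-def traversal can carry a visited set so that a cycle ends the walk instead of recursing forever. The function name `collect_defs` and the dictionary-based graph are hypothetical and chosen only for illustration.

```python
def collect_defs(start, operand_defs):
    """Return every operation reachable from `start` along use-def edges.

    `operand_defs` is a hypothetical mapping from an operation to the
    operations that define its operands.
    """
    visited = set()
    order = []
    stack = [start]
    while stack:
        op = stack.pop()
        if op in visited:
            # Already seen: stop following this edge so cycles terminate.
            continue
        visited.add(op)
        order.append(op)
        stack.extend(operand_defs.get(op, ()))
    return order


# A use-def cycle a -> b -> c -> a no longer causes unbounded traversal.
cyclic_graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
reachable = collect_defs("a", cyclic_graph)
```

Each operation is recorded at most once, so the traversal is bounded by the number of operations in the region even when the region contains a cycle.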
Commit a629322 forced the register class of ZPR[24]StridedOrContiguous
for spills/fills of ZPR2 and ZPR4. This can cause issues when the
regclass for the fill is ZPR2/ZPR4, because the register allocator is
then allowed to pick `z1_z2`, which is not a supported register for
ZPR2StridedOrContiguous. That class only supports tuples of the form
(strided) `z0_z8`, `z1_z9` or (contiguous, starting at a multiple of 2)
`z0_z1`, `z2_z3`. For spills we could add a new register class that
supports any of the tuple forms, but I've decided to use two pseudos,
similar to the fills, for consistency.
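The tuple forms listed above can be sketched as a small predicate. This is an illustration only: the helper names are hypothetical, the stride of 8 is inferred from the examples `z0_z8` and `z1_z9`, and the sketch does not model any further restriction on which base registers the real register class accepts.

```python
def is_contiguous_pair(first, second):
    # Contiguous form: consecutive registers starting at a multiple of 2,
    # e.g. z0_z1 or z2_z3.
    return second == first + 1 and first % 2 == 0


def is_strided_pair(first, second):
    # Strided form: second register is 8 above the first, e.g. z0_z8, z1_z9.
    # The stride of 8 is an assumption read off the examples in the text.
    return second == first + 8


def allowed_in_strided_or_contiguous(first, second):
    return is_contiguous_pair(first, second) or is_strided_pair(first, second)


# z1_z2 is consecutive but starts at an odd register, so it matches neither form.
z1_z2_allowed = allowed_in_strided_or_contiguous(1, 2)
```

Under these assumptions `z1_z2` is rejected, which is the mismatch the commit message describes.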

Fixes llvm#148655
llvm#148824)

By finalizing the bundle _after_ copying over the implicit-ops, it also
adds any implicit-defs to the BUNDLE.

Fixes llvm#148645
This sets the cache line size to 64 for the Neoverse V2 and V3. I've
tested this with loop-interchange: it doesn't increase compile time,
but it does enable a lot more interchange.
…ddrRegImm9. (llvm#148779)

To fold a FrameIndex, we need to teach eliminateFrameIndex to respect
the uimm9 range.
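A minimal sketch of the range check implied here, assuming `uimm9` denotes an unsigned 9-bit immediate and ignoring any scaling the real addressing mode may apply. The helper name `fits_uimm9` is hypothetical and is not part of LLVM.

```python
UIMM9_BITS = 9  # assumption: uimm9 is an unsigned 9-bit immediate


def fits_uimm9(offset):
    # An unsigned 9-bit field can encode 0 through 2**9 - 1 inclusive.
    return 0 <= offset < (1 << UIMM9_BITS)


# A frame offset outside this range cannot be folded directly into the
# immediate field and would need to be materialized some other way.
foldable = [off for off in (0, 256, 511, 512, -8) if fits_uimm9(off)]
```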

(cherry picked from commit 63d099a)
The transformation done in llvm#147349 was incorrect: we were not
passing the input node of the `OR` instruction to the `QC.INSBI`
instruction, so the generated instruction did the wrong thing.
To fix this, we first needed to mark the output register of
`QC.INSBI` as both an input and an output.

The code produced after the above fix needs a copy (mv) to preserve
the register input to the OR instruction if it has more than one use,
making the transformation net neutral (`6-byte QC.E.ORI/ORAI` vs
`2-byte C.MV + 4-byte QC.INSBI`). Avoid doing the transformation if
there is more than one use of the input register to the OR instruction.
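The size argument can be written out as a short sketch. The byte counts come from the commit message; the function names and the use-count parameter are hypothetical and do not reflect the actual implementation.

```python
ORI_BYTES = 6    # QC.E.ORI / QC.E.ORAI
MV_BYTES = 2     # C.MV
INSBI_BYTES = 4  # QC.INSBI


def transformed_size(input_use_count):
    # With more than one use, the OR input must be copied first so the
    # other users still see the original value.
    needs_copy = input_use_count > 1
    return INSBI_BYTES + (MV_BYTES if needs_copy else 0)


def should_transform(input_use_count):
    # Only transform when the result is strictly smaller than the original.
    return transformed_size(input_use_count) < ORI_BYTES


single_use = should_transform(1)  # 4 bytes vs 6 bytes
multi_use = should_transform(2)   # 2 + 4 = 6 bytes vs 6 bytes
```

With a single use the replacement saves two bytes; with more than one use the sizes are equal, so the transformation is skipped.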

(cherry picked from commit d67d91a)
Happened to spot this while looking at libclang.map for other reasons.
clang_visitCXXMethods was added in LLVM 21, not LLVM 20.

(cherry picked from commit 116110e)
9 participants