[MCLinker] Add MCLinker for parallel llvm compilation linking. #1

weiweichen · 2025-03-25T20:28:47Z

No description provided.

…d A520 (llvm#132246) Inefficient SVE codegen occurs on at least two in-order cores, Cortex-A510 and Cortex-A520. For example, a simple vector add

```
void foo(float *a, float *b, float *dst, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    dst[i] = a[i] + b[i];
}
```

vectorizes the inner loop into the following interleaved sequence of instructions.

```
add x12, x1, x10
ld1b { z0.b }, p0/z, [x1, x10]
add x13, x2, x10
ld1b { z1.b }, p0/z, [x2, x10]
ldr z2, [x12, #1, mul vl]
ldr z3, [x13, #1, mul vl]
dech x11
add x12, x0, x10
fadd z0.s, z1.s, z0.s
fadd z1.s, z3.s, z2.s
st1b { z0.b }, p0, [x0, x10]
addvl x10, x10, #2
str z1, [x12, #1, mul vl]
```

By adjusting the target features to prefer fixed over scalable vectorization when the cost is equal, we get the following vectorized loop,

```
ldp q0, q3, [x11, #-16]
subs x13, x13, #8
ldp q1, q2, [x10, #-16]
add x10, x10, #32
add x11, x11, #32
fadd v0.4s, v1.4s, v0.4s
fadd v1.4s, v2.4s, v3.4s
stp q0, q1, [x12, #-16]
add x12, x12, #32
```

which is more efficient.

This patch introduces SelectionDAGGenTargetInfo and SDNodeInfo classes, which provide methods for accessing the generated SDNode descriptions. Pull Request: llvm#125358 Draft PR: llvm#119709 RFC: https://discourse.llvm.org/t/rfc-tablegen-erating-sdnode-descriptions

…m#104705) For 'p' the added wording matches the implementation. For 'i', 'f', 'v' the implementation also allows 0 for `<pref>` component, making 'i16:16:0' valid, for example. 'Fi0', 'Fn0' and 'S0' are also currently accepted. This is likely unintentional. There are no tests in the codebase that rely on this behavior, so the added wording prohibits zero alignments for these specifications. For 'a', the implementation currently allows 0 for both `<abi>` and `<pref>` components. The added wording prohibits specifying zero for `<pref>` with the same justification as above. Zero `<abi>` is used in tests, and the example at the end of the section suggests that this is valid, so that's left unchanged. This effectively prohibits zero alignments everywhere except for the `<abi>` component of aggregate specification.
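The zero-alignment rule summarized above can be sketched as a small predicate. This is only an illustration of the wording, not LLVM's data layout parser: the function name `zeroAlignmentRuleAllows` is hypothetical, it covers only the 'i', 'f', 'v' and 'a' specifications, and it assumes both `<abi>` and `<pref>` were written explicitly (how an omitted `<pref>` is handled is outside this sketch).

```cpp
#include <cassert>

// Hypothetical helper (not part of LLVM): encodes only the zero-alignment
// rule described in the commit message. `spec` is the specification letter.
bool zeroAlignmentRuleAllows(char spec, unsigned abi, unsigned pref) {
  assert((spec == 'i' || spec == 'f' || spec == 'v' || spec == 'a') &&
         "sketch only covers the i, f, v and a specifications");
  // A zero <pref> component is prohibited for every specification.
  if (pref == 0)
    return false;
  // A zero <abi> component is allowed only for the aggregate specification.
  if (abi == 0)
    return spec == 'a';
  return true;
}
```

Under this sketch, `zeroAlignmentRuleAllows('i', 16, 0)` rejects the 'i16:16:0' case mentioned above, while an aggregate with a zero `<abi>` and a non-zero `<pref>` is still accepted.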

…ndling Mach-O's ARM and X86 writers use MCExpr's `SectionAddrMap *Addrs` argument to compute label differences, which was a bit of a hack. The AArch64MachObjectWriter does this better by using `getSymbolAddress` in its `recordRelocation` function. This commit: 1. Moves the `SectionAddrMap` logic into the Mach-O code, removing the workaround. 2. Fixes a bug in `MachObjectWriter::getSymbolAddress` where it failed to subtract the `SymB` value. This bug has been present since commit b200f93 (2011).

…#134517) Deprecates the methods and schedules them for removal in the future as the overloads taking LLVMContext are preferred, as the pointee type has no meaning in opaque pointers. From what my clangd can tell, there are no usages left in the monorepo. Part of llvm#123569

…ck: add support of `bind` functions. (llvm#132635) Improve `bugprone-capturing-this-in-member-variable` check: Added support of `bind`-like functions that capture and store `this` pointer in class member. Closes llvm#131220.

…rrent when inside cuf kernel directive. (llvm#134467) Delete duplicated creation of hlfir.declare op of do concurrent induction variables when inside cuf kernel directive. Obtain the correct hlfir.declare op generated from bindSymbol, and add it to ivValues.

We get a lot of issues that basically boil down to "I passed malformed LLVM IR to clang and it crashed". Clang does not perform IR verification by default in (non-assertion-enabled) release builds, and that's sensible for IR that Clang itself produces, which is expected to always be valid. However, if people pass in their own handwritten IR, we should report if it is malformed, instead of crashing. We should also report it in a way that does not produce a crash trace and ask for a bug report, as currently happens in assertions-enabled builds. This aligns the behavior with how opt/llc work.

The component diagnostic headers (i.e. `DiagnosticAST.h` and friends) all follow the same format, and there’s enough of them (and in them) to where updating all of them has become rather tedious (at least it was for me while working on llvm#132348), so this patch instead generates all of them (or rather their contents) via Tablegen. Also, it seems that `%enum_select` currently wouldn’t work in `DiagnosticCommonKinds.td` because the infrastructure for that was missing from `DiagnosticIDs.h`; this patch should fix that as well.

…vm#134364) If an attribute is not defined earlier in the same file but is referenced directly from its dialect, the wrong check is currently emitted. For `#toy.shape<[1, 2, 3]>`, the emitted check was previously `// CHECK: #[['?']]<[1, 2, 3]>`; it is now `// CHECK: #toy.shape<[1, 2, 3]>`.

…lvm#134676) Currently in Reassociate we may create a set of new instructions when optimizing an `add`, but we do not set DebugLocs on the new instructions; this patch propagates the add's DebugLoc to the new instructions. Found using llvm#107279.

Following PR llvm#132569 (RISC-V), which added `parseDataExpr` for parsing expressions in data directives (e.g., `.word`), this PR migrates AArch64 `@plt`, `@gotpcrel`, and `@AUTH` from the `parsePrimaryExpr` workaround to `parseDataExpr`. The goal is to align with the GNU assembler model, where relocation specifiers apply to the entire operand rather than individual terms, reducing complexity, which is especially evident in `@AUTH` parsing. Note: AArch64 ELF lacks an official syntax for data directives (llvm#132570). A prefix notation might be a preferable future direction. I recommend `%specifier(expr)`. AsmParser's `@specifier` parsing is suboptimal, necessitating lexer workarounds. `@` might appear multiple times in an operand. We should not use `@` beyond the existing AArch64 Mach-O instruction operands. In the test elf-reloc-ptrauth.s, many errors are now reported at parse time. Pull Request: llvm#134202

…lvm#111964) This macro isn't required if we define all the functions inline. In fact, quite a few of the marked functions have already been inlined. This patch basically only moves code around and adds `_LIBCPP_HIDE_FROM_ABI` to the places where it's been missing so far. This also removes inlining hints, since it drops `inline` in some places, but that shouldn't make much of a difference. The functions tend to be either really small, so should be inlined anyway, or are big enough that they shouldn't be inlined even with an inlinehint.

TestCases/Linux/asan_rt_confict_test-2.cpp started failing in https://lab.llvm.org/buildbot/#/builders/66/builds/12265/steps/9/logs/stdio The only change is "[LLD][ELF] Allow merging XO and RX sections, and add --[no-]xosegment flag (llvm#132412)" (llvm@2c1bdd4). Based on the test case (which deliberately tries to mix static and dynamically linked ASan), I suspect it's actually the test case that needs to be fixed (probably with a different error message check). This patch disables TestCases/Linux/asan_rt_confict_test-2.cpp to make the buildbots green while I investigate.

…ector shuffles on AVX2+ targets (llvm#134849) When combining 2 x 128-bit subvectors, don't assume that if the node is already a X86ISD::VPERM2X128 node then there's nothing to do. Fix issue where if we'd somehow combined to X86ISD::VPERM2X128 (typically if the 2 operands had then simplified to a common operand), we can't canonicalise back to X86ISD::VPERMI on AVX2+ targets. This matches the v4f64/v4i64 shuffle lowering preference for VPERMQ/PD over VPERM2F128/I128.

llvm#134679) As part of RemoveFactorFromExpression, we attempt to remove a factor from a mul/fmul expression; this may involve generating new instructions, e.g. to negate the result if the factor was negative in the original expression. When this happens, the new instructions should have a DebugLoc set from the instruction that the factored expression is being used to compute. Found using llvm#107279.

llvm#129307) I recently received an internal error report that LLDB was OOM'ing when creating a Minidump. In my 64b refactor we made a decision to acquire buffers the size of the largest memory region so we could read all of the contents in one call. This made error handling very simple (and simpler coding for me!) but had the trade-off of large allocations if huge pages were enabled. This patch is one I've had on the back burner for a while, but we can read and write the Minidump memory sections in discrete chunks, which we already do for writing to disk. I had to refactor the error handling a bit, but it remains the same. We make a best-effort attempt to read as much of the memory region as possible, but fail immediately if we receive an error writing to disk. I did not add new tests for this because our existing test suite is quite good, but I did manually verify that a few Minidumps couldn't read beyond the red_zone.

```
(lldb) reg read $sp
     rsp = 0x00007fffffffc3b0
(lldb) p/x 0x00007fffffffc3b0 - 128
(long) 0x00007fffffffc330
(lldb) memory read 0x00007fffffffc330
0x7fffffffc330: 60 c3 ff ff ff 7f 00 00 60 cd ff ff ff 7f 00 00  `.......`.......
0x7fffffffc340: 60 c3 ff ff ff 7f 00 00 65 e6 26 00 00 00 00 00  `.......e.&.....
(lldb) memory read 0x00007fffffffc329
error: could not parse memory info (Success!)
```

I'm not sure how to quantify the memory improvement other than that we would previously allocate the largest region size regardless of how much we actually read. So a 2gb unreadable region would cause a 2gb allocation even if we were reading 4096 kb. Now we will take the range size or the max chunk size of 128 mb, whichever is smaller.
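The chunking strategy described above can be sketched roughly as follows. This is not LLDB's code: `copyRegionInChunks`, `ReadFn`, `WriteFn` and `CopyResult` are hypothetical names, and the callbacks stand in for reading process memory and writing to the Minidump file. The sketch only shows the three properties the message mentions: the buffer is the smaller of the range size and the maximum chunk size, a failed read ends the copy but keeps what was already copied, and a failed write is reported immediately.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical callbacks. The reader returns the number of bytes read, or a
// value <= 0 on failure. The writer returns false on failure.
using ReadFn =
    std::function<long long(std::uint64_t addr, std::uint8_t *buf, std::size_t len)>;
using WriteFn = std::function<bool(const std::uint8_t *buf, std::size_t len)>;

struct CopyResult {
  std::uint64_t bytes_copied; // bytes that were both read and written
  bool write_failed;          // true if the writer reported an error
};

CopyResult copyRegionInChunks(std::uint64_t start, std::uint64_t size,
                              std::size_t max_chunk, const ReadFn &read,
                              const WriteFn &write) {
  assert(max_chunk > 0 && "chunk size must be positive");
  // Allocate min(range size, max chunk size) instead of the whole region.
  std::vector<std::uint8_t> buf(
      static_cast<std::size_t>(std::min<std::uint64_t>(size, max_chunk)));
  CopyResult result{0, false};
  while (result.bytes_copied < size) {
    const std::size_t want = static_cast<std::size_t>(
        std::min<std::uint64_t>(size - result.bytes_copied, buf.size()));
    const long long got = read(start + result.bytes_copied, buf.data(), want);
    if (got <= 0)
      break; // best effort: keep whatever was copied before the failed read
    if (!write(buf.data(), static_cast<std::size_t>(got))) {
      result.write_failed = true; // write errors end the copy immediately
      break;
    }
    result.bytes_copied += static_cast<std::uint64_t>(got);
  }
  return result;
}
```

With a 128 MB `max_chunk`, a 2 GB region would need only a 128 MB buffer in this sketch, and a small region would allocate only its own size.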

This patch adds support for comparison operators with ClangIR, both integral and floating point.

Co-authored-by: Morris Hafner
Co-authored-by: Henrich Lauko
Co-authored-by: Andy Kaylor

This is the first of a few patches that will do infrastructure work to enable the OpenACC lowering via the OpenACC dialect. At the moment this just gets the various function calls that will end up generating OpenACC, plus some tests to validate that we're doing the diagnostics in OpenACC specific locations. Additionally, this adds Stmt and Decl files for CIRGen.

weiweichen changed the title ~~Weiweic/mclinker~~ [MCLinker] Add MCLinker for parallel llvm compilation linking. Mar 25, 2025

s-barannikov and others added 28 commits April 6, 2025 13:14

[mlir][inliner] Move callback types from InlinerConfig -> InlinerInte…

3e08dcd

…rface. NFC. The proper layering here is that Inliner depends on InlinerUtils, and not the other way round. Maybe it's time to give InliningUtils a less terrible file name.

Reapply "[LV] Don't add blocks to loop in GeneratedRTChecks (NFC)."

283a78a

This reverts commit 46a2f41. Recommits 2fd6f8f with corresponding VPlan change to ensure LoopInfo is updated for all blocks during VPlan execution if needed.

[EarlyCSE] Re-generate checks for intrinsics.ll.

ba3fa39

[AArch64] Avoid unused variable warnings in release builds

0defd83

This used to be under !NDEBUG before 0a17427, so just put that back. The code only consists of assertions.

[LV] Remove more DT updates from legacy code path (NFCI).

449e2f5

Remove some legacy DT updates. Those should already be handled when updating the DT during VPlan execution.

[Matrix] Properly set Changed status when optimizing transposes.

48441cb

Currently Changed is not updated properly when transposes are optimized, causing missing analysis invalidation. Update optimizeTransposes to indicate if changes have been made.
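The general pattern behind this fix can be illustrated with a toy example. It is a sketch under loud assumptions, not the Matrix pass: `removeDoubleTransposes` and `Pipeline` are hypothetical, and a vector of characters stands in for IR. The point is only that a transform must report whether it modified anything, so that the caller knows to invalidate cached analyses.

```cpp
#include <cassert>
#include <vector>

// Toy transform (hypothetical): cancels adjacent pairs of 'T' (transpose)
// markers and reports whether anything was modified.
bool removeDoubleTransposes(std::vector<char> &ops) {
  bool changed = false;
  std::vector<char> out;
  for (char op : ops) {
    if (op == 'T' && !out.empty() && out.back() == 'T') {
      out.pop_back(); // transpose of a transpose cancels out
      changed = true;
    } else {
      out.push_back(op);
    }
  }
  assert(out.size() <= ops.size() && "the transform only removes operations");
  ops = out;
  return changed;
}

// Toy driver (hypothetical): a cached analysis stays valid only while the
// transform reports that nothing changed.
struct Pipeline {
  std::vector<char> ops;
  bool analysisValid = true;

  void run() {
    if (removeDoubleTransposes(ops))
      analysisValid = false; // must be recomputed before the next use
  }
};
```

If the transform returned `false` after modifying `ops`, the driver would keep using a stale analysis, which is the kind of missing invalidation the commit describes.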

MCExpr: Remove unused SectionAddrMap workaround

b90a926

Mach-O's ARM and X86 writers use MCExpr's `SectionAddrMap *Addrs` argument to compute label differences, which was a bit of a hack. The hack has been cleaned up by commit 1b7759de8e6979dda2d949b1ba1c742922e5c366.

[LV] Add test for mis-compile when narrowing interleave groups.

12a377e

Add test case showing mis-compile due to unrolling vector-pointer recipes after 6b98134.

MCSymbolRefExpr: Remove HasSubsectionsViaSymbolsBit

768ccf6

This information is only needed at assembly time and we can get it with Asm->getContext().getAsmInfo()->hasSubsectionsViaSymbols().

[VPlan] Don't narrow interleave groups if there are vector pointers.

464286b

Do not narrow interleave groups if there are VectorPointer recipes and the plan was unrolled. The recipe implicitly uses VF from VPTransformState.

[X86][AVX10] Make warning message more informative, NFCI (llvm#134528)

f2987f2

[clang] fix serialization of SubstNonTypeTemplateParmExpr (llvm#134560)

aef000d

This fixes a couple of mistakes introduced when merging llvm#132748 Fixes msan failure reported here: llvm#132748 (comment)

[OCaml] Fix test with invalid usage of #dbg_declare (llvm#134508)

c9497a2

Even though #dbg_declare can only describe pointers, one of the OCaml tests tried to add a #dbg_declare to an i32 argument. The change introduced in ecd4c08 caught this incorrect usage.

[clang] NFC: clean trailing whitespaces in clang/test/CXX/drs/cwg15xx…

6ce0fd7

….cpp

IR: Fix typo in unreachable message

0d68bad

NaryReassociate: Remove redundant run lines

e90d40a

These only differed in whether the passes argument was quoted. There is further redundancy in some of these tests, but they split the invocation across multiple opt runs.

[mlir][llvm] Respect call noinline attr in inliner (llvm#134493)

d9ccfd7

This commit extends the LLVM dialect inliner interface to respect the call op's noinline attribute. This is a follow-up to llvm#133726 which added the noinline attribute to the LLVM dialect call op.

[CSKY] Simplify shouldForceRelocation with MCValue::Specifier

f280d60

[clang][analyzer] Fix a possible crash in CastSizeChecker (llvm#134387)

31ef7ac

IR: Use poison in dropDroppableUse (llvm#134576)

7b3b4a5

Value: Remove redundant removeFromList in dropDroppableUse (llvm#134580)

4a5ff3e

arsenm and others added 30 commits April 8, 2025 22:12

Attributor: Propagate align to atomicrmw instructions (llvm#134837)

66f0343

Partially fixes llvm#134480

Attributor: Propagate align to cmpxchg instructions (llvm#134838)

34e8f00

Fixes llvm#134480

[llvm][bazel] Fix BUILD after 5615061.

4e9cfcf

[clang-tidy][NFC] update test name and config for bugprone-unintended…

bd49d27

…-char-ostream-output (llvm#134868)

Inline: Propagate callsite nofpclass attribute

b0cb672

(llvm#134800) Fixes llvm#134070

[CI] adjust the undef warning regex so it doesn't catch %undef in .ll…

b416e7f

… files

Revert "[AMDGPU] Add buffer.fat.ptr.load.lds intrinsic wrapping raw r…

4a7b34d

…src version (llvm#133015)" (llvm#134871) This reverts commit d1a0572. There was further discussion on the PR about whether the intrinsics should exist in this form.

[gn] port 6c74fe9

bb7ff13

Revert "Inline: Propagate callsite nofpclass attribute"

3f38cd0

This reverts commit b0cb672. Breaks bot

Rename F_no_mmap to F_mmap (llvm#134787)

d6c8e89

The `F_no_mmap` flag was introduced by llvm@6814232

Revert "[dsymutil] Avoid copying binary swiftmodules built from textual"

2721d50

This reverts commit 39ace8a while investigating Linux bot failures.

[MSan] Change overflow_size_tls type to IntPtrTy (llvm#117689)

2713998

As discussed in llvm#109284 (comment): Changed `__msan_va_arg_overflow_size_tls` type from `Int64Ty` to `IntPtrTy`.

[SLP][NFC]Extract TryToFindDuplicates lambda into a separate function…

02a708b

…, NFC Reviewers: RKSimon, hiraditya Reviewed By: hiraditya, RKSimon Pull Request: llvm#134873

[SLP][NFC]Extract a check for strided loads into separate function, NFC

edcbd4a

Reviewers: hiraditya, RKSimon Reviewed By: RKSimon Pull Request: llvm#134876

Test runs.

05a775a

Merge branch 'main' of https://github.com/llvm/llvm-project into weiw…

db57704

…eic/mclinker

minor update.

4b7caf2

Initialize all.

0633348

Fix format.

9ae2718
