LLVM and SPIRV-LLVM-Translator pulldown (WW16 2025) #18105

Draft: wants to merge 2,250 commits into base: sycl

@iclsrc iclsrc commented Apr 21, 2025

gbossu and others added 30 commits April 4, 2025 16:27
This is a mostly straightforward replacement of the previous
`std::pair<int, std::set<std::pair<...>>>` data structure used in
`SLPVectorizerPass::vectorizeStores()` with slightly more readable
alternatives.

I had done that change in my local tree to help me better understand the
code. It’s not very invasive, so I thought I’d create a PR for it.
Add function and subroutine forms of FSEEK and FTELL as intrinsic
procedures. Accept common aliases from legacy compilers as well.
    
A separate patch to llvm-test-suite will enable tests for these
procedures once this patch has merged.
    
Depends on llvm/llvm-project#132423; CI builds
will likely fail until that patch is merged and this PR is rebased.
A function or subroutine can allow an object of the same name to appear
in its scope, so long as the name is not used. This is similar to the
case of a name being imported from multiple distinct modules, and
implemented by the same representation.

It's not clear whether this is conforming behavior or a common
extension.
Fortran::runtime::Descriptor::BytesFor() only works for Fortran
intrinsic types for which a C++ type counterpart exists, so it crashes
on some types that are legitimate Fortran types like REAL(2). Move some
logic from Evaluate into a new header in flang/Common, then use it to
avoid this needless dependence on C++.
The RUNTIME_CHECK in question doesn't allow for the possibility that an
allocatable or pointer component could be processed by defined I/O.
Remove it in favor of a dynamic allocation check.
…34270)

The optional second argument to IEEE_SUPPORT_FLAG (and related functions
from the intrinsic IEEE_ARITHMETIC module) is needed only for its type,
not its value. Restrictions on local objects as arguments to function
references in specification expressions shouldn't apply to it.

Define a new attribute for dummy data object characteristics to
distinguish such arguments, set it for the appropriate intrinsic
function references, and test it during specification expression
validation.
…#134149)

When a compiler directive continuation line starts with keyword macro
names that have empty expansions, skip them.
…ers (#134302)

The preprocessor can perform macro replacement within identifiers when
they are split up with Fortran line continuation, but is failing to do
macro replacement on a continued identifier when none of its parts are
replaced.
…921)

There were some remaining headers that were not guarded with
_LIBCPP_HAS_LOCALIZATION, leading to errors when trying to use modules
on platforms that don't support localization (since all the headers get
pulled in when building the 'std' module). This patch brings these
headers in line with what we do for every other header that depends on
localization.

This patch also requires including <picolibc.h> from
<__configuration/platform.h> in order to define _NEWLIB_VERSION. In the
long term, we should use a better approach for doing that, such as
defining a macro in the __config_site header.
…race.

This way we emit the error message that explains the full syntax
for a register list.

parseZcmpStackAdj had to be modified to not assume the previous
operand had been successfully parsed as a register list.
The try-compile mechanism requires that `CMAKE_REQUIRED_FLAGS` is a
space-separated string instead of a list of flags. The original code
expanded `BUILTIN_FLAGS` into `CMAKE_REQUIRED_FLAGS` as a
space-separated string and then would overwrite `CMAKE_REQUIRED_FLAGS`
with `TARGET_${arch}_CFLAGS` prepended to the unexpanded
`BUILTIN_CFLAGS_${arch}`. This resulted in the first two arguments being
passed into the try-compile invocation, but dropping the other arguments
listed in `BUILTIN_CFLAGS_${arch}`.

This patch appends `TARGET_${arch}_CFLAGS` and `BUILTIN_CFLAGS_${arch}` to
`CMAKE_REQUIRED_FLAGS` before expanding CMAKE_REQUIRED_FLAGS as a
space-separated string. This passes any pre-set required flags, in addition to
all of the builtin and target flags to the Float16 detection.
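
The append-then-expand order described above can be sketched roughly as follows. This is a Python model rather than the actual CMake code: the function names and the flag values are hypothetical, and a Python list stands in for a CMake list.

```python
def expand_as_space_separated(flags: list[str]) -> str:
    """Join a list of flags into the space-separated form try-compile expects."""
    return " ".join(flags)


def required_flags(preset, target_cflags, builtin_cflags):
    """Append target and builtin flags to the pre-set ones, then expand once."""
    combined = list(preset) + list(target_cflags) + list(builtin_cflags)
    return expand_as_space_separated(combined)


# Hypothetical flag values, for illustration only.
flags = required_flags(["-Werror"], ["--target=x", "-march=y"], ["-fno-builtin", "-O2"])
print(flags)
```

Expanding only once, after everything has been appended, keeps every flag in the final string and leaves no list separator behind.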
After llvm/llvm-project#133220 we had some empty
complex literals (`tensor<0xcomplex<f32>>`) failing to parse.

This was largely due to the ambiguity between `shape.empty()` meaning
splat (`dense<1>`) or empty literal (`dense<>`). Used type's numel to
disambiguate during verification.
When a pending relocation is created it is also marked whether it is
optional or not. It can be optional when such relocation is added as
part of an optimization (i.e., `scanExternalRefs`).

When BOLT tries to `flushPendingRelocations`, it safely skips any
optional relocations that cannot be encoded due to being out of
range. A prerequisite for that is the use of the `-force-patch`
flag. Otherwise, BOLT will bail out with a relevant message.

Background:
BOLT, as part of `scanExternalRefs`, identifies external references from
calls and creates pending relocations for them. When flushed, these
relocations update the references to point to the optimized functions.
This optimization can be disabled using `--no-scan`.

BOLT can assert if any of these pending relocations cannot be encoded.

This patch does not disable this optimization but instead selectively
applies it given that a pending relocation is optional and `-force-patch`
was enabled.
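
The decision described above can be sketched as follows. This is a simplified Python model, not BOLT's implementation: `PendingRelocation`, `fits_signed`, the `bits` width, and the exception type are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PendingRelocation:
    offset: int
    value: int
    optional: bool  # True when added by an optimization such as scanExternalRefs


def fits_signed(value: int, bits: int) -> bool:
    """Return True if value is encodable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def flush_pending_relocations(relocs, bits, force_patch):
    """Return (applied, skipped); raise if an unencodable relocation cannot be skipped."""
    applied, skipped = [], []
    for rel in relocs:
        if fits_signed(rel.value, bits):
            applied.append(rel)
        elif rel.optional and force_patch:
            skipped.append(rel)  # the reference is simply left un-updated
        else:
            raise ValueError(f"relocation at {rel.offset:#x} is out of range")
    return applied, skipped
```

In this model, a required relocation that is out of range always fails, and an optional one fails only when `force_patch` is off.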
Compute the result types and bail out before modifying any IR. That is
more efficient when type conversion fails, because no modifications
must be rolled back.

Note: This is in preparation for the One-Shot Dialect Conversion
refactoring.
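
The "convert first, mutate afterwards" ordering can be illustrated with a small Python sketch. It is a generic model and not the MLIR code: the dictionary-based `op`, `convert_result_types`, and `rewrite_op` are hypothetical names.

```python
from typing import Callable, Optional


def convert_result_types(
    result_types: list[str],
    convert: Callable[[str], Optional[str]],
) -> Optional[list[str]]:
    """Convert every type up front; return None if any conversion fails."""
    converted = []
    for ty in result_types:
        new_ty = convert(ty)
        if new_ty is None:
            return None
        converted.append(new_ty)
    return converted


def rewrite_op(op: dict, convert) -> bool:
    """Rewrite op in place only after all result types converted successfully."""
    new_types = convert_result_types(op["result_types"], convert)
    if new_types is None:
        return False  # nothing was modified, so nothing needs rolling back
    op["result_types"] = new_types
    return True
```

Because the failure path returns before any mutation, the caller never has to undo a partial rewrite.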
…(#129960)

This patch introduces a new option `-preserve-merged-debug-info` to
preserve an arbitrary but deterministic version of debug information
when DILocations are merged. This is intended to be used in production
environments from which sample based profiles are derived such as
AutoFDO and MemProf.

With this patch we have seen a 0.2% improvement on an internal workload
at Google when generating AutoFDO profiles. It also significantly
improves the ability for MemProf by preserving debug info for merged
call instructions used in the contextual profile.

---------

Co-authored-by: Krzysztof Pszeniczny <[email protected]>
This makes it more obvious what the R means. I've kept rlist in
places that refer to the encoding.
No longer require -fopenmp or -fopenacc with -E, unless specific version
number options are also required for predefined macros. This means that
most source can be preprocessed with -E and then later compiled with
-fopenmp, -fopenacc, or neither.

This means that OpenMP conditional compilation lines (!$) are also
passed through to -E output. The tricky part of this patch was dealing
with the fact that those conditional lines can also contain regular
Fortran line continuation, and that now has to be deferred when !$ lines
are interspersed.
…#134397)

Currently, these breakpoints are being accumulated every time a new
process is created (e.g. through a `run`). Depending on the
circumstances, the old breakpoints are even left enabled, interfering
with subsequent processes. This is addressed by removing the breakpoints
in `ProcessGDBRemote::Clear`.

Note that these breakpoints are more of a PlatformDarwin thing, so in
the future we should look into moving them there.
This PR makes it so that `CompilerInvocation` is the sole owner of the
`PreprocessorOptions` instance.
This is crucial when recovering from fatal loader errors. Without it,
the `Lexer` keeps yielding more tokens and the compiler may access
invalid `ASTReader` state.

rdar://133388373
...and add missing TargetsToBuild dep.
The signature was changed from void(char *, char *) to void(void *, void
*) to match GCC's signature for the same builtin.

Fixes #47833
Trap checks fail at most once (when the program crashes).
InstCombine will combine this zext of an icmp where the source has a
single bit set to a lshr plus trunc
(`InstCombinerImpl::transformZExtICmp`):

```llvm
define <vscale x 1 x i8> @f(<vscale x 1 x i64> %x) {
  %1 = and <vscale x 1 x i64> %x, splat (i64 8)
  %2 = icmp ne <vscale x 1 x i64> %1, splat (i64 0)
  %3 = zext <vscale x 1 x i1> %2 to <vscale x 1 x i8>
  ret <vscale x 1 x i8> %3
}
```

```llvm
define <vscale x 1 x i8> @reverse_zexticmp_i64(<vscale x 1 x i64> %x) {
  %1 = trunc <vscale x 1 x i64> %x to <vscale x 1 x i8>
  %2 = lshr <vscale x 1 x i8> %1, splat (i8 2)
  %3 = and <vscale x 1 x i8> %2, splat (i8 1)
  ret <vscale x 1 x i8> %3
}
```
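
The two forms agree whenever the mask has a single bit set and that bit survives the truncation. A minimal Python sketch of that arithmetic on one scalar lane follows; the function names are hypothetical and this is not the InstCombine code.

```python
def zext_icmp(x: int, mask: int) -> int:
    """Original form: zext(icmp ne (and x, mask), 0), yielding 0 or 1."""
    return int((x & mask) != 0)


def trunc_lshr_and(x: int, mask: int, narrow_bits: int = 8) -> int:
    """Combined form: and(lshr(trunc(x), log2(mask)), 1) on a narrow lane."""
    shift = mask.bit_length() - 1              # log2 of a single-bit mask
    truncated = x & ((1 << narrow_bits) - 1)   # trunc keeps only the low bits
    return (truncated >> shift) & 1


# Check every low byte, with a few different high parts, for each single-bit
# mask that still fits in the 8-bit result type.
for bit in range(8):
    mask = 1 << bit
    for low in range(256):
        for high in (0, 0xDEADBEEF00, 0xFFFFFFFFFFFFFF00):
            x = high | low
            assert zext_icmp(x, mask) == trunc_lshr_and(x, mask)
```

The shift amount is the index of the mask's set bit, so a mask of 8 gives a shift of 3.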

In a loop, this ends up being unprofitable for RISC-V because the
codegen now goes from:

```asm
f:                                      # @f
	.cfi_startproc
# %bb.0:
	vsetvli	a0, zero, e64, m1, ta, ma
	vand.vi	v8, v8, 8
	vmsne.vi	v0, v8, 0
	vsetvli	zero, zero, e8, mf8, ta, ma
	vmv.v.i	v8, 0
	vmerge.vim	v8, v8, 1, v0
	ret
```

To a series of narrowing vnsrl.wis:

```asm
f:                                      # @f
	.cfi_startproc
# %bb.0:
	vsetvli	a0, zero, e64, m1, ta, ma
	vand.vi	v8, v8, 8
	vsetvli	zero, zero, e32, mf2, ta, ma
	vnsrl.wi	v8, v8, 3
	vsetvli	zero, zero, e16, mf4, ta, ma
	vnsrl.wi	v8, v8, 0
	vsetvli	zero, zero, e8, mf8, ta, ma
	vnsrl.wi	v8, v8, 0
	ret
```

In the original form, the vmv.v.i is loop invariant and is hoisted out,
and the vmerge.vim usually gets folded away into a masked instruction,
so you usually just end up with a vsetvli + vmsne.vi.

The truncate requires multiple instructions and introduces a vtype
toggle for each one, and is measurably slower on the BPI-F3.

This reverses the transform in RISCVISelLowering for truncations greater
than twice the bitwidth, i.e. it keeps single vnsrl.wis.

Fixes #132245
Add operations for `nvvm.vote.all.sync` and `nvvm.vote.any.sync`
intrinsics similar to `nvvm.vote.ballot.sync`.
@jsji jsji temporarily deployed to WindowsCILock April 28, 2025 16:41 — with GitHub Actions Inactive
@jsji jsji force-pushed the llvmspirv_pulldown branch from 2fe74fc to 7e2deec Compare April 28, 2025 17:07
@jsji jsji force-pushed the llvmspirv_pulldown branch from 7e2deec to 69fa059 Compare April 28, 2025 19:29
@jsji jsji closed this Apr 28, 2025
@jsji jsji reopened this Apr 28, 2025
jsji commented Apr 29, 2025

This is ready for review.
The arc failures are shared with other post-commit runs. The Dev CI failures are also a known limitation due to an outdated JSON file.

The remaining changes are all cherry-picks of downstream fixes and fixes from code owners.

Labels
disable-lint Skip linter check step and proceed with build jobs