[SYCL RTC] Introduce `--auto-pch` support #20226

aelovikov-intel · 2025-09-26T20:35:10Z

Compiling `#include <sycl/sycl.hpp>` is slow, which is especially problematic for SYCL RTC (run-time compilation). One way to address this is fine-grained includes, which are being pursued separately. Another is clang's precompiled header (PCH) support, which is what this PR uses. The two approaches can be combined, and this PR adds `test-e2e/PerformanceTests/KernelCompiler/auto-pch.cpp` to give some idea of the PCH impact. The test shows the PCH benefit when compiling some of the fine-grained includes on top of the absolute minimum required to compile SYCL RTC's "Hello world". From one of the CI runs:

| Extra Headers | Without PCH | With auto-PCH |
|---|---|---|
| (none) | 176ms 137ms 136ms 136ms 136ms | 226ms 64ms 64ms 64ms 64ms |
| `sycl/half_type.hpp` | 165ms 165ms 165ms 165ms 165ms | 267ms 71ms 72ms 72ms 72ms |
| `sycl/ext/oneapi/bfloat16.hpp` | 174ms 173ms 173ms 173ms 173ms | 279ms 76ms 73ms 73ms 74ms |
| `sycl/marray.hpp` | 142ms 143ms 142ms 142ms 143ms | 235ms 66ms 66ms 66ms 66ms |
| `sycl/vector.hpp` | 296ms 290ms 290ms 290ms 290ms | 487ms 124ms 125ms 125ms 125ms |
| `sycl/multi_ptr.hpp` | 278ms 278ms 276ms 275ms 274ms | 441ms 125ms 125ms 125ms 125ms |
| `sycl/builtins.hpp` | 537ms 533ms 531ms 531ms 531ms | 883ms 218ms 218ms 219ms 218ms |

The table has no `sycl/sycl.hpp` row because that header currently crashes the front end (FE) when it reads the generated PCH; the crash is being investigated and fixed separately.
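To make the trade-off in the first table row (no extra headers) concrete, here is a small sketch that sums the measured times and finds the number of compilations after which auto-PCH comes out ahead. The function and constant names are made up for illustration; only the millisecond values come from the CI run quoted above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Timings in milliseconds from the first table row (no extra headers).
const std::array<int, 5> kWithoutPch{176, 137, 136, 136, 136};
const std::array<int, 5> kWithAutoPch{226, 64, 64, 64, 64};

// Total time spent on the first `n` compilations.
inline int cumulative_ms(const std::array<int, 5> &runs, std::size_t n) {
  assert(n <= runs.size() && "only five runs were measured");
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += runs[i];
  return total;
}

// Smallest number of compilations after which auto-PCH is cheaper overall,
// or 0 if it never pays off within the measured runs.
inline std::size_t break_even_runs() {
  for (std::size_t n = 1; n <= kWithoutPch.size(); ++n)
    if (cumulative_ms(kWithAutoPch, n) < cumulative_ms(kWithoutPch, n))
      return n;
  return 0;
}
```

For this row the first compilation with auto-PCH is slower (226ms vs 176ms) because it also generates the PCH, but the second compilation already recovers the cost: 290ms cumulative with auto-PCH against 313ms without.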

Implementation-wise, I'm reusing the existing upstream `clang::PrecompiledPreamble` with one minor modification. `PrecompiledPreamble` seems to be used mainly by tools like clangd, so it ignores errors in the code. I've modified it so that such errors break PCH generation the same way they would break a normal compilation. I'm not sure we'd want to keep this approach long-term, because making such an "auto-pch" persistent would deviate from the upstream version of `PrecompiledPreamble` even more. I can imagine that in the near future we'd need to "fork" it into a separate utility. Still, it seems fine as a first step.
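As a rough illustration of the idea (not of the actual implementation), the sketch below caches a "precompiled" preamble in memory, keyed by the preamble text, and lets a build failure propagate instead of swallowing it. Everything here is hypothetical: `FakePch`, `preamble_length`, and `InMemoryPreambleCache` are invented names, no clang API is used, and the preamble detection is a deliberate simplification of what clang's lexer does.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled preamble; a real PCH is a serialized AST.
struct FakePch {
  std::string preamble;
};

// Simplified preamble detection (assumption): the longest prefix made only of
// blank lines and lines whose first non-blank character is '#'.
inline std::size_t preamble_length(const std::string &source) {
  std::size_t pos = 0;
  while (pos < source.size()) {
    std::size_t eol = source.find('\n', pos);
    std::size_t end = (eol == std::string::npos) ? source.size() : eol + 1;
    std::size_t first = source.find_first_not_of(" \t\r", pos);
    bool blank = first == std::string::npos || source[first] == '\n';
    if (!blank && source[first] != '#')
      break; // first line of real code ends the preamble
    pos = end;
  }
  return pos;
}

class InMemoryPreambleCache {
public:
  using Builder = std::function<FakePch(const std::string &)>;

  explicit InMemoryPreambleCache(Builder build) : build_(std::move(build)) {
    assert(build_ && "builder must be callable");
  }

  // Returns the cached PCH for the source's preamble, building it on a miss.
  // An exception thrown by the builder propagates and nothing is cached, so a
  // broken preamble fails the compilation instead of being ignored.
  const FakePch &get(const std::string &source) {
    std::string preamble = source.substr(0, preamble_length(source));
    auto it = cache_.find(preamble);
    if (it == cache_.end()) {
      ++build_attempts_;
      it = cache_.emplace(preamble, build_(preamble)).first;
    }
    return it->second;
  }

  int build_attempts() const { return build_attempts_; }

private:
  Builder build_;
  std::map<std::string, FakePch> cache_;
  int build_attempts_ = 0;
};
```

Two sources that share the same leading `#include` lines hit the same cache entry, which is where the repeated-compilation savings in the table would come from under this simplified model.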

The driver modifications add support for the `--auto-pch` option, which should only be available on the SYCL RTC path and not for regular clang invocations from the command line. I'm relatively confident those will stay in the future.

sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc

premanandrao · 2025-10-08T15:28:31Z

Is it safe to merge this PR with the expectation that the FE will be fixed before the next release? Or, should we hold this PR until there is a fix?

I think this is a reasonable expectation to have. Please don't hold this PR for that fix.

Built on top of `--auto-pch` (in-memory) introduced in intel#20226. The most significant technical decision was how to implement the filesystem cache. I've looked into the following options:

* `sycl/source/detail/persistent_device_code_cache.hpp`

  Also, see `sycl/doc/design/KernelProgramCache.md`. It seems to be tailored to very specific usage scenarios, and it would be very resource-consuming to split it into a generic data structure that would then be used for two different use cases. This cache is disabled by default and I'm not sure how well-tested it is. Also, its use of plain ".lock" files for "advisory locking" instead of the native filesystem mechanisms (e.g., locking APIs in `fcntl`/`flock`/`CreateFile`/`LockFileEx`) made me question whether it's worth generalizing and how much work would be necessary there.

* `llvm/include/llvm/Support/Caching.h`

  Originally implemented as part of ThinLTO, it was later moved into `LLVMSupport` with the following commit message:

  > We would like to move ThinLTO’s battle-tested file caching
  > mechanism to the LLVM Support library so that we can use it
  > elsewhere in LLVM.

  The API is rather unexpected, so my research didn't stop here.

* `lldb/include/lldb/Core/DataFileCache.h`

  Uses `LLVMSupport`'s caching from the previous bullet under the hood, but provides an easier-to-grasp API. If we were developing upstream, I think uplifting that abstraction into the `LLVMSupport` library and then using it in both `lldb` and `libsycl` would probably be the choice I'd vote for. However, doing that downstream was too much effort, so I ultimately decided not to go with this approach. That cache also has a `std::mutex` on the "hot" `DataFileCache::GetCachedData` path, I presume to avoid creating the same entry from multiple threads.

In the end, I've chosen to use `LLVMSupport`'s quirky (or maybe I just haven't grown enough to appreciate it) caching API directly, and that's what is done in this PR.

Unlike `lldb`'s cache, I decided to accept possible duplicate work of building the preamble on a cache miss from concurrent threads in exchange for no inter-thread synchronization on the cache hit path (not profiled/measured though) and implementation simplicity.
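The trade-off described here (no synchronization on a hit, possible duplicate work on a miss) can be sketched with nothing but the standard library. This is an illustration under stated assumptions, not the `LLVMSupport` caching API and not the code in the PR: `read_entry` and `get_or_build` are hypothetical names, and the write-to-temporary-then-rename step is one common way to keep readers from ever seeing a partially written entry.

```cpp
#include <atomic>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Reads a whole cache entry; returns nullopt if the file cannot be opened.
inline std::optional<std::string> read_entry(const fs::path &file) {
  std::ifstream in(file, std::ios::binary);
  if (!in)
    return std::nullopt;
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Hit path: a plain file read, no locks or mutexes.
// Miss path: build, write to a uniquely named temporary file, then rename it
// onto the final name. Threads that miss at the same time each build their
// own copy (duplicate work) and the last rename wins.
inline std::string get_or_build(const fs::path &dir, const std::string &key,
                                const std::function<std::string()> &build) {
  assert(!key.empty() && "cache key must not be empty");
  const fs::path entry = dir / key;
  if (auto hit = read_entry(entry))
    return *hit;

  std::string data = build();

  // Per-process counter keeps temporary names distinct between threads; a
  // real cache shared between processes would need process-unique names too.
  static std::atomic<unsigned> counter{0};
  const fs::path tmp = dir / (key + ".tmp" + std::to_string(counter++));
  {
    std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
  }

  std::error_code ec;
  fs::rename(tmp, entry, ec); // replaces an existing entry atomically on POSIX
  if (ec)
    fs::remove(tmp, ec); // caching is best-effort; the built data is still returned
  return data;
}
```

Calling `get_or_build(dir, key, build)` twice with the same key runs `build` only once in the single-threaded case; under contention it may run more than once, which is the cost accepted in exchange for a lock-free hit path.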


aelovikov-intel force-pushed the sycl-rtc-auto-pch branch from 92d3176 to fdabb7f Compare September 26, 2025 20:40

aelovikov-intel temporarily deployed to WindowsCILock September 26, 2025 20:41 — with GitHub Actions Inactive

aelovikov-intel had a problem deploying to WindowsCILock September 26, 2025 21:30 — with GitHub Actions Error

aelovikov-intel temporarily deployed to WindowsCILock September 26, 2025 21:30 — with GitHub Actions Inactive

aelovikov-intel force-pushed the sycl-rtc-auto-pch branch from fdabb7f to 91a06f2 Compare September 26, 2025 21:53

aelovikov-intel temporarily deployed to WindowsCILock September 26, 2025 21:53 — with GitHub Actions Inactive

aelovikov-intel temporarily deployed to WindowsCILock September 26, 2025 22:27 — with GitHub Actions Inactive

aelovikov-intel requested a review from gmlueck September 29, 2025 15:16

aelovikov-intel force-pushed the sycl-rtc-auto-pch branch from 91a06f2 to d1ff0c3 Compare September 30, 2025 16:02

aelovikov-intel had a problem deploying to WindowsCILock September 30, 2025 16:02 — with GitHub Actions Error

aelovikov-intel force-pushed the sycl-rtc-auto-pch branch from d1ff0c3 to 73d087b Compare September 30, 2025 16:16

aelovikov-intel temporarily deployed to WindowsCILock September 30, 2025 16:17 — with GitHub Actions Inactive

aelovikov-intel temporarily deployed to WindowsCILock September 30, 2025 16:59 — with GitHub Actions Inactive

aelovikov-intel force-pushed the sycl-rtc-auto-pch branch from 73d087b to 14cc5df Compare September 30, 2025 17:51

aelovikov-intel had a problem deploying to WindowsCILock September 30, 2025 17:51 — with GitHub Actions Failure

aelovikov-intel force-pushed the sycl-rtc-auto-pch branch from 14cc5df to aa66cf5 Compare September 30, 2025 18:43

aelovikov-intel had a problem deploying to WindowsCILock September 30, 2025 18:43 — with GitHub Actions Failure

aelovikov-intel force-pushed the sycl-rtc-auto-pch branch from aa66cf5 to 193d044 Compare September 30, 2025 19:03

aelovikov-intel had a problem deploying to WindowsCILock September 30, 2025 19:03 — with GitHub Actions Failure

aelovikov-intel temporarily deployed to WindowsCILock September 30, 2025 19:35 — with GitHub Actions Inactive

aelovikov-intel force-pushed the sycl-rtc-auto-pch branch from 193d044 to 960f581 Compare September 30, 2025 19:44

aelovikov-intel had a problem deploying to WindowsCILock September 30, 2025 19:45 — with GitHub Actions Error

aelovikov-intel changed the title ~~[DRAFT][SYCL RTC] Implement --auto-pch support~~ [SYCL RTC] Introduce --auto-pch support Sep 30, 2025

[SYCL RTC] Implement --auto-pch support

ae02915

aelovikov-intel force-pushed the sycl-rtc-auto-pch branch from 960f581 to ae02915 Compare September 30, 2025 19:58

aelovikov-intel temporarily deployed to WindowsCILock September 30, 2025 19:58 — with GitHub Actions Inactive

aelovikov-intel temporarily deployed to WindowsCILock September 30, 2025 20:43 — with GitHub Actions Inactive

aelovikov-intel temporarily deployed to WindowsCILock October 2, 2025 19:54 — with GitHub Actions Inactive

add __TIME__ test

869be70

aelovikov-intel had a problem deploying to WindowsCILock October 3, 2025 15:38 — with GitHub Actions Error

Extend errors test

3d45e6a

aelovikov-intel temporarily deployed to WindowsCILock October 3, 2025 15:54 — with GitHub Actions Inactive

aelovikov-intel had a problem deploying to WindowsCILock October 3, 2025 16:29 — with GitHub Actions Failure

Try to make the test work on Win

7317f86

aelovikov-intel temporarily deployed to WindowsCILock October 3, 2025 16:40 — with GitHub Actions Inactive

aelovikov-intel had a problem deploying to WindowsCILock October 3, 2025 17:13 — with GitHub Actions Failure

aelovikov-intel temporarily deployed to WindowsCILock October 3, 2025 17:13 — with GitHub Actions Inactive

aelovikov-intel temporarily deployed to WindowsCILock October 3, 2025 17:54 — with GitHub Actions Inactive

gmlueck reviewed Oct 3, 2025


sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc (outdated, resolved)

Apply doc suggestion

ba43750

aelovikov-intel temporarily deployed to WindowsCILock October 3, 2025 18:40 — with GitHub Actions Inactive

aelovikov-intel temporarily deployed to WindowsCILock October 3, 2025 19:12 — with GitHub Actions Inactive

gmlueck approved these changes Oct 8, 2025


Merge remote-tracking branch 'origin/sycl' into HEAD

ebb81cb

aelovikov-intel temporarily deployed to WindowsCILock October 8, 2025 15:50 — with GitHub Actions Inactive

hchilama approved these changes Oct 8, 2025


aelovikov-intel temporarily deployed to WindowsCILock October 8, 2025 16:47 — with GitHub Actions Inactive

aelovikov-intel merged commit a44f116 into intel:sycl Oct 8, 2025
53 of 54 checks passed

aelovikov-intel deleted the sycl-rtc-auto-pch branch October 8, 2025 17:57

aelovikov-intel mentioned this pull request Oct 15, 2025

[SYCL RTC] Introduce --persistent-auto-pch support #20374

Merged


8 participants
