Ubershaders and pipeline pre-compilation (and dedicated transfer queues). #90400

DarioSamo · 2024-04-08T17:40:46Z

This is a big PR with quite a bit of history. It should be reviewed very thoroughly to decide where we want to make concessions and how to mitigate the side effects as much as possible. However, the benefits are essential to shipping games with the engine and making the final experience for users much better. To read further on @reduz's notes about the topic, you can check out these documents (Part 1 and Part 2).

Due to the complexity of this PR and how 4.3 is currently in feature freeze, I'd definitely not consider this PR until 4.3 is out. If you want the TL;DR: skip ahead to the two videos with the TPS demo to see the immediate difference.

NOTE: These improvements will only affect the Forward+ and Mobile renderers. No changes are expected for Compatibility.

Transfer queues

First of all, this PR supersedes the transfer queues PR and effectively uses it as its base. This PR relies so heavily on unlocking parts of RenderingDevice's behavior to make it multithread-friendly that keeping both PRs separate was no longer practical. As mentioned in that previous PR, merging it as is will cause a small performance regression unless #86333 is merged first.

Pipeline compilation

Modern APIs like Vulkan and D3D12 have made rendering pipeline management very explicit: their creation is no longer hidden behind the current rendering state and handled on demand by the driver. Instead, the developer must create the entire pipeline ahead of time and wait on a blocking operation that can take a significant amount of time depending on the complexity of the shader and the speed of the hardware. This has seen some improvements recently with the introduction of new extensions like VK_EXT_graphics_pipeline_library, but as always, Godot must engineer solutions aimed towards resolving the problem for as much hardware as possible and use such features optionally for optimization in the future.

Godot has the responsibility to perform as fast as possible for the end user, which leaves it no choice but to generate pipelines with as little code and as few requirements as possible. The engine achieves this through the use of shader compilation macros (shader variants) and the use of specialization constants to optimize code for a particular pipeline (pipeline variants). While Godot resolves shader variant compilation and can even ship the shader cache to skip the step altogether, before this PR it couldn't resolve pipeline variant compilation ahead of time at all.

If you're familiar with the "stutters when playing the game for the first time" phenomenon that has plagued all games shipped with Godot 4's RD-based renderers, this is pretty much the entire root of the problem. This is not a problem exclusive to Godot as it's been very evident in lots of commercial releases that include very extensive shader pre-compilation steps the first time a game starts or a driver update happens. The issue is so prevalent even Digital Foundry points it out as the #1 problem plaguing PC game releases in this article and they never fail to mention the existence of the problem on any new game that suffers from it.

Ubershaders for Godot 4

The exciting part about this PR is that an effective solution was developed to address this problem completely without the need to introduce extensive shader pre-compilation steps or any input from the game developer whatsoever. Instead, attempts have been made to make pipeline compilation a part of loading assets as much as possible. Not only does this mean most pipeline compilation is no longer resolved at drawing time, it can even be done in background threads and presented as part of a regular loading screen. The game is no longer at the mercy of the renderer introducing these stutters when it needs to draw; the behavior becomes much more predictable and can be handled as part of a loading process.

The main improvement this PR makes is the introduction of ubershaders once more to the engine, but these are quite different from what was previously done in Godot 3. Unlike the previous version of the engine, these shaders do not correspond to generating text shaders with specializations and compiling them in the background, which could lead to a lot of CPU usage that'd take lots of time on weaker systems. Instead, ubershaders are mostly still very similar to the current shaders the engine already has, with a key difference: specialization constants are pulled from push constants instead. This means that the engine already has a version of the shader that can be used for drawing immediately while the specialized version is generated in the background. Pipeline variants are much faster to generate like this instead of relying on runtime shader compilation to insert the constants as part of the shader text, as they work directly on the SPIR-V and skip the need to compile the shader from text again.

Specialization constants are a big part of how Godot optimizes pipelines, but they've been limited by parts of the design as to how many can actually be used. Any additional constant implied an explosion of variants that led to the pipeline cache structure getting even bigger (160 KB in just pointers in Forward+ for any single material in master at the moment!), and every new addition meant that if the state is very dynamic, stutters would occur due to extra pipeline compilation. This was quite evident in the Mobile renderer, which uses a specialization constant to disable lights if they're not used: as soon as a light popped up, then stutters due to pipeline compilation were inevitable.
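To give a rough sense of why every additional constant hurts, here is a minimal Python sketch. It assumes each specialization constant is an independent boolean and that a cache slot is reserved for every combination; both are simplifications for illustration, and the constant names are hypothetical, not the engine's real ones.

```python
from itertools import product


def count_pipeline_variants(num_bool_constants: int) -> int:
    """Number of combinations when every constant is an independent boolean."""
    return 2 ** num_bool_constants


def enumerate_variants(names):
    """List every on/off combination for the given (hypothetical) constant names."""
    return [
        dict(zip(names, values))
        for values in product((False, True), repeat=len(names))
    ]


if __name__ == "__main__":
    # Each extra boolean constant doubles the number of possible variants.
    for n in range(1, 6):
        print(n, "constants ->", count_pipeline_variants(n), "variants")
```

Under those assumptions, going from 4 to 5 constants doubles the number of slots from 16 to 32, which is why a fixed table indexed by every combination grows so quickly.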

With this change, a new simple hashing system for pipeline caching is introduced instead:

The required pipeline is requested in its specialized form from the cache.
If the pipeline is not available yet, compilation is started in the background without stalling the main thread.
The ubershader pipeline (which has been compiled at loading time) is used instead. The specialization constants are pushed as part of the push constant instead along with some other rasterization state parameters (like backface culling).
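The three steps above can be sketched roughly as follows. This is a simplified Python model and not the engine's C++ implementation: `PipelineCache`, `compile_fn` and the tuple-based key are all hypothetical names, and pipelines are plain Python objects standing in for real GPU pipelines.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PipelineCache:
    """Hash-keyed cache that falls back to the ubershader while compiling."""

    def __init__(self, compile_fn, ubershader_pipeline, max_workers=2):
        self._compile_fn = compile_fn          # hypothetical: key -> specialized pipeline
        self._ubershader = ubershader_pipeline # assumed to be compiled at loading time
        self._ready = {}                       # key -> specialized pipeline
        self._pending = {}                     # key -> Future for in-flight compilations
        self._lock = threading.Lock()
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    def get(self, key):
        """Return (pipeline, push_constants) without blocking the caller."""
        with self._lock:
            pipeline = self._ready.get(key)
            if pipeline is not None:
                # Step 1: the specialized pipeline is already available.
                return pipeline, None
            if key not in self._pending:
                # Step 2: start compiling in the background, only once per key.
                self._pending[key] = self._executor.submit(self._compile, key)
        # Step 3: draw with the ubershader; the constants travel as push constants.
        return self._ubershader, key

    def _compile(self, key):
        pipeline = self._compile_fn(key)
        with self._lock:
            self._ready[key] = pipeline
            self._pending.pop(key, None)
        return pipeline

    def wait_idle(self):
        """Block until all in-flight compilations finish (useful for tests)."""
        with self._lock:
            futures = list(self._pending.values())
        for future in futures:
            future.result()
```

The first `get()` for a key returns the ubershader together with the key to push as constants; once the background job finishes, later calls return the specialized pipeline instead.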

Pipeline compilation at loading time

The other key part behind the PR is the introduction of pipeline compilation of the ubershaders in two extra steps.

During the creation of the surface cache in the renderer (stutter on scene setup).
During loading of ArrayMeshes (can be pushed to a background thread).
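As a rough illustration of the second step, the Python sketch below pre-compiles the ubershader pipelines for a batch of meshes on worker threads, compiling each distinct key only once. The mesh dictionaries, the `surfaces` field and `compile_ubershader` are hypothetical stand-ins, not engine structures.

```python
from concurrent.futures import ThreadPoolExecutor


def precompile_on_load(meshes, compile_ubershader, max_workers=4):
    """Compile one ubershader pipeline per unique surface key, off the main thread."""
    # Collect unique keys in first-seen order so shared materials compile once.
    keys = []
    seen = set()
    for mesh in meshes:
        for surface_key in mesh["surfaces"]:
            if surface_key not in seen:
                seen.add(surface_key)
                keys.append(surface_key)

    # Compile in worker threads; a loading screen could keep animating meanwhile.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pipelines = list(pool.map(compile_ubershader, keys))
    return dict(zip(keys, pipelines))
```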

The difference in how both of these changes work together is pretty evident on the TPS demo by simulating a clean run as an end user would see the first time they run the game. A big chunk of the stutters are gone, especially the one that happens the first time the character shoots, which is a typical case of a stutter that only happened at drawing time despite the effect being loaded in the scene tree already.

Both of these videos have pipeline caching disabled and the driver cache deleted between each run.

master (dc91479)

Godot.Third-Person.Shooter.Demo.DEBUG.2024-04-08.13-26-46-00.00.03.950-00.00.25.336.mp4

transfer_and_pipelines

2024-04-08.13-28-29-00.00.07.952-00.00.21.936.mp4

It's also worth noting how the loading screen animation actually plays out more of the time instead of having one big stutter at the end due to the initial pipeline compilation at drawing time. These loading times are also significantly shortened by making multiple improvements to the behavior of both the shader and pipeline compiler, allowing it to multi-thread more effectively and use more of the system's resources.

The negatives (and how we can mitigate them)

As was expected, these benefits do not come for free. But there are multiple ways we can attempt to mitigate most of the extra cost, and this is an area I'm open to feedback on and that we can further optimize in future versions as well.

An extra shader variant had to be introduced. @reduz has diligently always recommended against adding shader variants as it leads to a combinatorial explosion, but this is one sacrifice that had to be made to allow ubershaders to exist. However, there's potential for reducing some of the existing shader variants to dynamic paths in the ubershader if possible. This is also significantly mitigated by improvements to the shader compiler's multithreading and it should be a non-issue for games that ship the SPIR-V shader cache.
Loading times are bound to be longer as pipeline compilation is pushed here instead. This is the intended effect and it's paying a cost upfront that would happen at drawing time otherwise (which is less preferable). That said, pipeline caching always plays a part here and it'll speed up loading times in later runs as it should.
Higher memory consumption from extra pipelines and shader variants being compiled that might go unused. This is sadly one cost that must be paid no matter what and can hopefully be mitigated by implementing better detection of features in use.

The biggest reason behind these negatives is the engine's flexibility. Features can be turned on and off without explicit operations from the user at a global level: a scene can be instanced to use VoxelGI while another one might use Lightmaps instead. As a matter of fact, this is exactly what the TPS demo does, so any run of the game must pre-compile the Lightmap variants because it can't know ahead of time which method the user has chosen without looking at the scene's contents, which is yet to be instanced during mesh loading.

One of the things I hope to improve while this PR is in progress is reducing the amount of variants that are pre-compiled as much as possible. Therefore it'd be great to gather feedback on which of these methods are most effective and how to implement them:

Detecting features that aren't used at the project level would help significantly. If a user never uses VoxelGI, we shouldn't be pre-compiling variants for it. The biggest culprits here that I could identify are features like separate specular and motion vectors. Adding some form of tracking somewhere at a global level so the engine can know ahead of time without having to instance scenes would be very helpful here.
Assuming features aren't enabled by default and going back to compile them if they are: this is actually something that's partially implemented with the 'advanced' shader groups already. Upgrading this to a per-feature detection could help a lot towards reducing pre-compilation and delegating it to the surface cache setup. If a developer wishes to properly delegate the pre-compilation during mesh loading, all they need to do is just instance scenes first with the appropriate features.
Allowing developers to opt in or out of variants to pre-compile at a global level instead. This is likely a very good solution for more experienced developers to fine-tune their game if it actually has a significant amount of shaders and materials that need it. This would likely be a simple set of toggles indicating which features shouldn't be compiled as the developer knows they'll never make use of them.

It's worth noting that under the current implementation, a wrong guess from any of these methods will not make the engine misbehave: at worst, it just causes the drawing-time stutters the current version already has.

Testing methodology

As of Add toggle for enabling or disabling RenderingDevice's pipeline cache. #90271 (which is currently merged), you can now disable the pipeline cache at the engine level (if the API supports it). It is essential to turn this setting off to do any comparisons and to simulate the experience of a clean run of a game.

Delete the driver pipeline cache. This is the last barrier of defense the driver has if the application doesn't implement a pipeline cache of its own. This heavily depends on the IHV and the platform (e.g. on Windows & NVIDIA it's located at %LocalAppData%/NVIDIA/GLCache). No test should be considered valid without deleting this cache first and foremost.

Run the game!
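The driver cache step is easy to forget between runs, so here is a small Python helper as a sketch. It only covers NVIDIA: the Windows location is the one mentioned above, and the Linux location is the one used in the hyperfine commands later in this thread. Other vendors and platforms store their caches elsewhere, so treat the paths as assumptions to verify on your machine.

```python
import shutil
from pathlib import Path


def nvidia_driver_cache_dir(platform: str, env: dict) -> Path:
    """Return the NVIDIA GLCache directory for the given sys.platform value."""
    if platform == "win32":
        # %LocalAppData%/NVIDIA/GLCache, as mentioned in the steps above.
        return Path(env["LOCALAPPDATA"]) / "NVIDIA" / "GLCache"
    # ~/.cache/nvidia/GLCache, as used in the Linux benchmark in this thread.
    return Path(env["HOME"]) / ".cache" / "nvidia" / "GLCache"


def delete_driver_cache(cache_dir: Path) -> bool:
    """Delete the cache directory if it exists; return True if it was removed."""
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False
```

A typical call before each test run would be `delete_driver_cache(nvidia_driver_cache_dir(sys.platform, os.environ))`, which requires importing `sys` and `os`.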

Trying to measure the results can be a bit tricky, as they depend heavily on the behavior you see in a project. The benefits are easiest to see visually, as in the videos: the effects of pipeline compilation at drawing time are hard to measure because they present themselves as stutters that happen all throughout the game instead of in one particular scenario.

New performance monitors

Some new statistics have been added to the performance monitors which should help verify without a shadow of a doubt if the pipeline pre-compilation is working as intended. There are four different pipeline compilation sources that are identified, and they should help towards understanding where an extended loading time or stutter comes from.

Quoted from the documentation added by this PR:

RENDERING_INFO_PIPELINE_COMPILATIONS_MESH: Number of pipeline compilations that were triggered by loading meshes. These compilations will show up as longer loading times the first time a user runs the game and the pipeline is required.
RENDERING_INFO_PIPELINE_COMPILATIONS_SURFACE: Number of pipeline compilations that were triggered by building the surface cache before rendering the scene. These compilations will show up as a stutter when loading scenes the first time a user runs the game and the pipeline is required.
RENDERING_INFO_PIPELINE_COMPILATIONS_DRAW: Number of pipeline compilations that were triggered while drawing the scene. These compilations will show up as stutters during gameplay the first time a user runs the game and the pipeline is required.
RENDERING_INFO_PIPELINE_COMPILATIONS_SPECIALIZATION: Number of pipeline compilations that were triggered to optimize the current scene. These compilations are done in the background and should not cause any stutters whatsoever.
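One way to read these four counters together is sketched below in Python. The counter names are the ones documented above; the `counts` dictionary and both helper functions are hypothetical, and how the values are fetched from the engine is left out on purpose.

```python
MESH = "RENDERING_INFO_PIPELINE_COMPILATIONS_MESH"
SURFACE = "RENDERING_INFO_PIPELINE_COMPILATIONS_SURFACE"
DRAW = "RENDERING_INFO_PIPELINE_COMPILATIONS_DRAW"
SPECIALIZATION = "RENDERING_INFO_PIPELINE_COMPILATIONS_SPECIALIZATION"

# Expected user-visible symptom for each compilation source, per the docs above.
SYMPTOMS = {
    MESH: "longer loading times",
    SURFACE: "stutter when loading scenes",
    DRAW: "stutters during gameplay",
    SPECIALIZATION: "none (compiled in the background)",
}


def diagnose(counts: dict) -> list:
    """Return (source, count, symptom) for every source with compilations."""
    return [
        (source, counts[source], SYMPTOMS[source])
        for source in (MESH, SURFACE, DRAW, SPECIALIZATION)
        if counts.get(source, 0) > 0
    ]


def precompilation_working(counts: dict) -> bool:
    """Pre-compilation did its job if nothing had to be compiled while drawing."""
    return counts.get(DRAW, 0) == 0
```

A non-zero draw counter is the one to watch for: it means some pipeline was missed by both pre-compilation steps.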

bugsquad edit: Fixes #61233

TODO

Pass CI and address compatibility breakage (if there's any).
Make sure compatibility renderer hasn't had any regressions from modifying the common classes.
Address any multi-threading issues that can possibly arise from both transfer queues and this PR.
Test D3D12 for any regressions or new issues introduced from multithreading.
Evaluate further ways to reduce the amount of pipelines being pre-compiled.
Evaluate any possible CPU-time regressions during drawing and how to mitigate them.
Evaluate adding this improvement to Canvas Renderer as well.

Contributed by W4 Games. 🍀

Calinou · 2024-04-08T21:59:45Z

Allowing developers to opt in or out of variants to pre-compile at a global level instead. This is likely a very good solution for more experienced developers to fine-tune their game if it actually has a significant amount of shaders and materials that need it. This would likely be a simple set of toggles indicating which features shouldn't be compiled as the developer knows they'll never make use of them.

This resembles godotengine/godot-proposals#5229 and godotengine/godot-proposals#6497 a lot, although I haven't proposed it for VoxelGI and LightmapGI yet as these are not Environment or CameraEffects properties.

If such a setting is disabled, we can assume the user is OK with having runtime shader compilation occur the first time the setting is enabled (since they'll probably be in an options menu while doing so).

DarioSamo · 2024-04-11T17:53:39Z

Assuming features aren't enabled by default and going back to compile them if they are: this is actually something that's partially implemented with the 'advanced' shader groups already. Upgrading this to a per-feature detection could help a lot towards reducing pre-compilation and delegating it to the surface cache setup. If a developer wishes to properly delegate the pre-compilation during mesh loading, all they need to do is just instance scenes first with the appropriate features.

I gave this a shot and got pretty successful results. The current caveat is that pipeline compilation will be less likely to be triggered for resources loaded through a background thread in a loading screen unless the game features a scene first with the feature used in-place. If not, then it must defer the loading to the surface cache creation instead.

However the results are pretty good. The pre-compilation on the TPS demo has gone down significantly:

That's around 300 pipelines down from 650+ pipelines in the OP, pretty much doubling the speed of the initial load in the demo that I showcased in the video, and there are still no pipeline stutters during drawing. I haven't detected any regressions from implementing this yet, but it's still worth looking for edge cases.

Godot.Third-Person.Shooter.Demo.DEBUG.2024-04-11.13-19-04-00.00.04.468-00.00.11.535.mp4

I still think we could use some global settings to fine-tune the behavior (e.g. automatically detect, always pre-compile, never pre-compile), but this gets us much closer to an ideal level of pre-compilations that I wanted to see from the start.
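A three-way setting like that could look roughly like the Python sketch below. Nothing here exists in the engine: `PrecompilePolicy` and `should_precompile` are hypothetical names used only to show how the three modes would combine with the per-feature detection described in this comment.

```python
from enum import Enum


class PrecompilePolicy(Enum):
    """Hypothetical per-feature setting (e.g. for VoxelGI or lightmaps)."""
    AUTO = "automatically detect"
    ALWAYS = "always pre-compile"
    NEVER = "never pre-compile"


def should_precompile(policy: PrecompilePolicy, feature_seen: bool) -> bool:
    """Decide whether variants for a feature get pre-compiled at load time.

    feature_seen is True once some instanced scene has used the feature.
    """
    if policy is PrecompilePolicy.ALWAYS:
        return True
    if policy is PrecompilePolicy.NEVER:
        return False
    # AUTO: assume the feature is off until a scene proves otherwise.
    return feature_seen
```

With `NEVER`, a feature that ends up being used anyway would simply fall back to compiling at drawing time.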

DarioSamo · 2024-04-22T17:47:01Z

I investigated Canvas Renderer support and the potential problems we'd have to fix to fully take advantage of it.

First off, Canvas Renderer does suffer from the exact same problem: pipelines are compiled at drawing time if necessary. However, the total amount of pipelines that this happens on is fairly small. Still, it's undeniable you can get stutters from behavior such as enabling and disabling lights in proximity of the elements.

I added the entire framework for supporting ubershaders but ultimately left it disabled for now for a few reasons even if it does work as intended.

The amount of pipelines that you can pre-compile from just the shader data is about six without taking into account the possibility of meshes and polygons. That's a lot of pipelines to pre-compile when the final amount usually ends up being far less. In one example project, the pre-compiled count was basically 24 while the specializations were merely 3. That's a lot of added loading time for very little benefit.
It's impossible for the checks I added to pre-compile pipelines ahead of time for polygons and meshes. The vertex attribute format needs to be known ahead of time with exact precision of the offsets and strides.
A real solution would involve some sort of scheme where we detect commands that get added and cached, pass those off to the renderer and pre-compile pipelines for the cached commands. However upon experimentation it seems whatever point I found that could be used as the hook was called way too often to be beneficial.

For now I'm leaning towards addressing other issues the PR currently has (such as an extra CPU cost due to a mutex I want to avoid), but if anyone has an example of a project that requires lots of different shaders, pipelines, is entirely 2D and suffers from stutters, that'd help to provide a good example of something I can use as a reference.

Calinou

Tested locally with Vulkan Forward+ and Mobile rendering methods, it works as expected. Shader compilation stutter is completely gone in the TPS demo when shooting or destroying an enemy. Runtime performance is identical to master when no shader compilation occurs.

The profilers that track pipeline compilations also work as expected. Docs look good to me as well.

This comes at the cost of slightly longer startup times, but I'd say it's worth it.

Benchmark

PC specifications

CPU: Intel Core i9-13900K
GPU: NVIDIA GeForce RTX 4090
RAM: 64 GB (2×32 GB DDR5-5800 C30)
SSD: Solidigm P44 Pro 2 TB
OS: Linux (Fedora 39)

Using a Linux x86_64 optimized editor build (with LTO).

Startup + shutdown times when running https://github.com/godotengine/tps-demo's main menu:

Cold driver shader cache

$ hyperfine -iw1 -p "rm -rf ~/.cache/nvidia/GLCache" "bin/godot.linuxbsd.editor.x86_64 --path ~/Documents/Godot/tps-demo --quit" "bin/godot.linuxbsd.editor.x86_64.transfer_and_pipelines --path ~/Documents/Godot/tps-demo --quit"
Benchmark 1: bin/godot.linuxbsd.editor.x86_64 --path ~/Documents/Godot/tps-demo --quit
  Time (mean ± σ):      2.412 s ±  0.029 s    [User: 1.057 s, System: 0.294 s]
  Range (min … max):    2.371 s …  2.463 s    10 runs

Benchmark 2: bin/godot.linuxbsd.editor.x86_64.transfer_and_pipelines --path ~/Documents/Godot/tps-demo --quit
  Time (mean ± σ):      2.555 s ±  0.247 s    [User: 1.418 s, System: 0.318 s]
  Range (min … max):    2.079 s …  2.719 s    10 runs

Warm driver shader cache

$ hyperfine -iw1 "bin/godot.linuxbsd.editor.x86_64 --path ~/Documents/Godot/tps-demo --quit" "bin/godot.linuxbsd.editor.x86_64.transfer_and_pipelines --path ~/Documents/Godot/tps-demo --quit"
Benchmark 1: bin/godot.linuxbsd.editor.x86_64 --path ~/Documents/Godot/tps-demo --quit
  Time (mean ± σ):      2.152 s ±  0.028 s    [User: 0.831 s, System: 0.271 s]
  Range (min … max):    2.126 s …  2.204 s    10 runs

Benchmark 2: bin/godot.linuxbsd.editor.x86_64.transfer_and_pipelines --path ~/Documents/Godot/tps-demo --quit
  Time (mean ± σ):      2.236 s ±  0.039 s    [User: 0.917 s, System: 0.294 s]
  Range (min … max):    2.193 s …  2.320 s    10 runs

Summary
  bin/godot.linuxbsd.editor.x86_64 --path ~/Documents/Godot/tps-demo --quit ran
    1.04 ± 0.02 times faster than bin/godot.linuxbsd.editor.x86_64.transfer_and_pipelines --path ~/Documents/Godot/tps-demo --quit
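The cold-cache run above has no summary line, so here is a short Python sketch that derives the same kind of ratio from the reported means and standard deviations. It assumes standard first-order error propagation for a ratio of two means; with that assumption it reproduces the 1.04 ± 0.02 from the warm-cache summary.

```python
import math


def relative_speed(mean_fast, sd_fast, mean_slow, sd_slow):
    """Return (ratio, uncertainty) of how many times faster the fast run is."""
    ratio = mean_slow / mean_fast
    # First-order error propagation for a quotient of two independent means.
    uncertainty = ratio * math.sqrt(
        (sd_slow / mean_slow) ** 2 + (sd_fast / mean_fast) ** 2
    )
    return ratio, uncertainty


if __name__ == "__main__":
    # Values copied from the hyperfine output above (master vs. this PR).
    warm = relative_speed(2.152, 0.028, 2.236, 0.039)
    cold = relative_speed(2.412, 0.029, 2.555, 0.247)
    print("warm: %.2f ± %.2f" % warm)
    print("cold: %.2f ± %.2f" % cold)
```

By the same formula, master is about 1.06 ± 0.10 times faster in the cold-cache case, so the difference there is within the noise of the PR build's runs.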

Calinou · 2024-04-30T00:26:54Z

PS: I wonder how this will interact with #88199 – does Metal make this approach possible?

DarioSamo · 2024-04-30T00:44:25Z

PS: I wonder how this will interact with #88199 – does Metal make this approach possible?

Should be completely fine as far as I know as the PR's approach is completely driver-agnostic. A lot of the changes on this one are just basically fixing a lot of stuff that wasn't thread safe, so it could expose some other bugs if part of the Metal driver assumed that wasn't gonna happen (which was a common issue in the D3D12 one but easily fixed).

clayjohn

Looks great now! This is the final culmination of a lot of work spread over many months. I am very glad to see it finished.

This is ready to merge, and I suggest we merge it quickly to avoid conflicts.

I have personally tested on many devices including Win10, Linux, MacOS, and Android. I tested the TPS demo on all platforms, but I also tested the Nuku Warriors demo on Windows and multiple misc. demos on Linux. I am confident at this point that this is good enough for merging.

akien-mga · 2024-10-03T13:35:41Z

Amazing work @DarioSamo 🎉
It's absolutely surreal to see demos like the TPS demo finally run stutter free 🤯

DarioSamo · 2024-10-03T13:56:20Z

Anyone wanting an introduction to this merge can have a look at the tutorial introduced by the PR to the docs here: https://docs.godotengine.org/en/latest/tutorials/performance/pipeline_compilations.html

Tutorial for the new functionality added by godotengine/godot#90400

HeadClot · 2024-10-04T00:38:59Z

Super excited to try this with the XR Editor in Dev 4. Bit of a question however - Does this support the compatibility renderer?

DarioSamo · 2024-10-04T01:52:39Z

Does this support the compatibility renderer?

I'm afraid it's pretty much not possible by design. Modern APIs like Vulkan are the only ones that provide direct control over creating pipelines, which is what this entire system is designed around.

Capewearer · 2024-10-04T17:55:19Z

This PR may have fixed #95112; needs further testing.

AThousandShips added enhancement topic:rendering topic:shaders labels Apr 8, 2024

AThousandShips added this to the 4.x milestone Apr 8, 2024

DarioSamo force-pushed the transfer_and_pipelines branch 3 times, most recently from 4615671 to 75603da Compare April 8, 2024 18:35

clayjohn modified the milestones: 4.x, 4.4 Apr 8, 2024

DarioSamo force-pushed the transfer_and_pipelines branch from 75603da to 5e6944a Compare April 8, 2024 19:02

Calinou added the performance label Apr 8, 2024

DarioSamo force-pushed the transfer_and_pipelines branch from 6b01e02 to ed1030b Compare April 11, 2024 17:33

DarioSamo mentioned this pull request Apr 15, 2024

Add transfer queue support to RenderingDevice and enable multithreaded resource loading #87590

Closed

2 tasks

clayjohn mentioned this pull request Apr 16, 2024

Make RID_Owner lock-free for fetching. #86333

Closed

fire mentioned this pull request Apr 16, 2024

Godot 4 3d stutter in exclusive fullscreen mode #90733

Open

DarioSamo force-pushed the transfer_and_pipelines branch from ed1030b to 29e4df1 Compare April 17, 2024 14:06

DarioSamo mentioned this pull request Apr 21, 2024

Add debug utilities for Vulkan #90993

Merged

DarioSamo force-pushed the transfer_and_pipelines branch from 29e4df1 to f767ec8 Compare April 22, 2024 17:34

DarioSamo force-pushed the transfer_and_pipelines branch 3 times, most recently from 50882fa to c21f062 Compare April 23, 2024 17:58

Calinou approved these changes Apr 30, 2024

Jamsers mentioned this pull request May 15, 2024

Add shader recycling to have zero cost runtime shader compilation godotengine/godot-proposals#4754

Closed

This was referenced May 17, 2024

Enable geometry fade/transparency in the Mobile renderer #91672

Open

Add "force_pipeline_compilation_on_load" ProjectSetting godotengine/godot-proposals#7357

Closed

clayjohn approved these changes Oct 2, 2024

akien-mga merged commit 98deb2a into godotengine:master Oct 3, 2024
19 checks passed

DarioSamo mentioned this pull request Oct 3, 2024

Hide menu BackgroundCache node to fully benefit from async. pipeline compilation godotengine/tps-demo#191

Merged

DarioSamo added a commit to DarioSamo/godot-docs that referenced this pull request Oct 3, 2024

Add tutorial for reducing pipeline compilation stutters.

b3d8e3e

Tutorial for the new functionality added by godotengine/godot#90400

DarioSamo added a commit to DarioSamo/godot-docs that referenced this pull request Oct 3, 2024

Add tutorial for reducing pipeline compilation stutters.

275dc11

Tutorial for the new functionality added by godotengine/godot#90400

Lielay9 mentioned this pull request Oct 3, 2024

2D mesh drawing is broken #97793

Closed

Faless mentioned this pull request Oct 4, 2024

[Linux - NVidia] OS freezes every few seconds for 4-5 seconds since #90400 #97823

Closed

Calinou mentioned this pull request Oct 12, 2024

Add option to pre-compile particle materials at game load or scene load godotengine/godot-proposals#10956

Open

DarkKilauea mentioned this pull request Oct 14, 2024

[DX12] Crash on startup due to missing transfer worker command buffer #98158

Closed

matheusmdx mentioned this pull request Oct 18, 2024

Godot renders a black screen when running a project with vulkan + rendering/driver/threads/thread_model = multi-threaded #98284

Closed

This was referenced Oct 21, 2024

Improve synchronization of rendering after changes from transfer queues. #98388

Merged

Add draw indirect to Rendering Device #97247

Merged

DarioSamo mentioned this pull request Nov 4, 2024

Add multiple specialization constants to Forward+ and Mobile. #98825

Merged

matheusmdx mentioned this pull request Nov 10, 2024

[4.4 dev4] compute_list render thread error #99025

Closed

jonathansekela pushed a commit to jonathansekela/godot-docs that referenced this pull request Nov 19, 2024

Add tutorial for reducing pipeline compilation stutters.

6621699

Tutorial for the new functionality added by godotengine/godot#90400

clayjohn mentioned this pull request Dec 2, 2024

FPS almost halved going from 4.4 dev3 to 4.4 dev 4 (regression from #98652) #99420

Closed

Calinou mentioned this pull request Dec 3, 2024

Stutter when killing enemy for the first time godotengine/tps-demo#192

Open

clayjohn mentioned this pull request Dec 3, 2024

Very long shader/pipeline compilation #99964

Closed

DarioSamo mentioned this pull request Dec 6, 2024

Implement RD::buffer_get_data_async() and RD::texture_get_data_async() #100110

Merged

2 tasks

This was referenced Dec 6, 2024

Shader editor with invalid code prints many error messages, causing the editor to lag. #99263

Closed

Avoid error spam when shaders fail to compile by freeing shader_data version when compilation fails #100128

Merged

clayjohn mentioned this pull request Dec 19, 2024

TODO in RenderingDevice.cpp re: thread safety #100621

Closed

r-eckert mentioned this pull request Jan 16, 2025

Vulkan graphics pipelines use excessive amount of memory on Galaxy S23 #101635

Open

This was referenced Jan 21, 2025

[4.4beta1] Enabling SDFGI -> Desired set (1) not used by shader #101826

Closed

Lag caused by adding mesh to MeshInstance3D (possibly due to the use of threads) in Godot 4.4 dev4,5,6,7 and beta1 #101844

Closed
