Reduce mobile pipeline compilations #102217

Merged
merged 1 commit into from
Feb 13, 2025

clayjohn
Member

This is intended to help alleviate #101635 and some of the increase in loading time from the introduction of Ubershaders.

The main changes are:

  1. Define key project settings in the RenderingServer so they are available when the renderer is initialized.
  2. Initialize the global_pipeline_data_required struct from the project settings while initializing the renderer. This ensures that the existing project settings can be used during the mesh compilation phase.
  3. Only compile VRS pipelines if VRS is in use. This moves the VRS compiles to surface compile time when VRS is used, but that's a fair tradeoff to cut compiles in half for everyone else.
  4. Detect the use of subpass post processing and HDR render targets. Again, this moves compiles to the surface, but the overall reduction is so significant that it doesn't matter.
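
To make the effect of pinning features concrete, here is a minimal toy model, not the engine's actual code. It assumes each optional feature (VRS, HDR render targets, subpass post processing) is a boolean axis that doubles the number of variants compiled up front unless a project setting pins it. The names `FeatureAxes` and `upfront_variants` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only: each optional feature is either pinned by a
// project setting (one variant needed) or open (both variants needed).
// These names are hypothetical and are not taken from the Godot source.
struct FeatureAxes {
    bool vrs_pinned = false;
    bool hdr_pinned = false;
    bool subpass_post_pinned = false;
};

// Number of variants compiled up front for `base_variants` pipelines.
// Every open axis doubles the count; a pinned axis leaves it unchanged.
inline std::uint32_t upfront_variants(std::uint32_t base_variants, const FeatureAxes &axes) {
    assert(base_variants > 0 && "expected at least one base pipeline");
    std::uint32_t count = base_variants;
    if (!axes.vrs_pinned) {
        count *= 2;
    }
    if (!axes.hdr_pinned) {
        count *= 2;
    }
    if (!axes.subpass_post_pinned) {
        count *= 2;
    }
    return count;
}
```

Under this model, pinning only the VRS axis halves the up-front count, which matches the "cut compiles in half" tradeoff described in item 3. The real variant counts depend on the shaders and materials in the project.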

For HDR render targets we save a lot because scenes tend to use either all LDR or all HDR render targets. For the post process we also save a lot if similar environments are used in all Viewports.

At mesh compile time we now assume that subpass post processing isn't being used. I think this is a fair assumption to start from, as it means that games that target the lowest end (and thus avoid post effects) won't pay the cost of compiling pipeline variants that are only necessary when targeting higher-end systems.
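
The "assume at mesh compile time, compile on demand at the surface" idea can be sketched as a small cache. This is a simplified illustration under assumptions, not the renderer's implementation: `PipelineKey` and `PipelineCache` are hypothetical names, and the key only carries the three features discussed here.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical pipeline key holding only the features discussed in this PR.
struct PipelineKey {
    bool vrs = false;
    bool hdr = false;
    bool subpass_post = false;

    // Pack the flags into a small integer usable as a set key.
    std::uint32_t packed() const {
        return (vrs ? 1u : 0u) | (hdr ? 2u : 0u) | (subpass_post ? 4u : 0u);
    }
};

// Toy cache that records which variants exist and how many compiles ran.
class PipelineCache {
public:
    // Mesh compile phase: build only the variant the project is assumed to use.
    void precompile(const PipelineKey &assumed) { ensure(assumed); }

    // Surface phase: returns true when the variant had to be compiled on
    // demand because the up-front assumption did not cover it.
    bool request(const PipelineKey &actual) { return ensure(actual); }

    std::uint32_t compile_count() const { return compiles_; }

private:
    bool ensure(const PipelineKey &key) {
        const bool inserted = compiled_.insert(key.packed()).second;
        if (inserted) {
            ++compiles_; // stand-in for an expensive driver-side compile
        }
        // Invariant: exactly one compile per cached variant.
        assert(compiles_ == compiled_.size());
        return inserted;
    }

    std::unordered_set<std::uint32_t> compiled_;
    std::uint32_t compiles_ = 0;
};
```

A project that never enables subpass post processing only ever pays for the precompiled variant. A project that does enable it pays one extra compile the first time a surface requests that variant, and nothing on later requests.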

Results

I tested with the GDQuest TPS demo using the mobile renderer.

Beta 1
Screenshot from 2025-01-30 14-56-48

With this PR
Screenshot from 2025-01-30 15-07-43

Surface compilations are still higher than I would like, but the overall decrease is very significant, so I think this PR is worth going forward with on its own. When using subpass post processing, the surface compilations drop to 22 with this PR, so it gives users an opportunity to save a lot of loading time.

Member

@Calinou Calinou left a comment

Tested locally, it works as expected.

Device: Samsung Galaxy Tab S9 Ultra (16 GB RAM), Snapdragon 8 Gen 2

Before

Commit: ee4acfb (commit just before this PR)

Godot Engine v4.4.beta.custom_build.ee4acfbfb (2025-01-29 23:45:20 UTC) - https://godotengine.org
Vulkan 1.3.128 - Forward Mobile - Using Device #0: Qualcomm - Adreno (TM) 740

=== Driver Memory Report ===
Launch with --extra-gpu-memory-tracking and build with DEBUG_ENABLED for this functionality to work.
Device memory may be unavailable if the API does not support it(e.g. VK_EXT_device_memory_report is unsupported).

Total Driver Memory:9.94528198242188 MB
Total Driver Num Allocations: 6235
Total Device Memory:243.908191680908 MB
Total Device Num Allocations: 238

Memory use by object type (CSV format):

Category; Driver memory in MB; Driver Allocation Count; Device memory in MB; Device Allocation Count
UNKNOWN;0.0;0;0.0;0
INSTANCE;0.8837194442749;277;0.0;0
PHYSICAL_DEVICE;0.0;0;0.0;0
DEVICE;0.0772705078125;302;0.24674606323242;31
QUEUE;0.0;0;9.29688262939453;5
SEMAPHORE;0.00614166259766;23;0.0;0
COMMAND_BUFFER;0.0;0;1.2265625;37
FENCE;0.00080108642578;3;0.0;0
DEVICE_MEMORY;0.0;0;160.0;4
BUFFER;0.0823974609375;200;0.0;0
IMAGE;0.182373046875;442;0.357421875;1
EVENT;0.0;0;0.0;0
QUERY_POOL;0.00163269042969;6;0.00782775878906;2
BUFFER_VIEW;0.00099182128906;1;0.0;0
IMAGE_VIEW;0.20828247070312;210;0.0;0
SHADER_MODULE;4.87748146057129;564;0.0;0
PIPELINE_CACHE;0.72026348114014;147;0.0;0
PIPELINE_LAYOUT;0.34844970703125;338;0.0;0
RENDER_PASS;0.00786209106445;78;0.0;0
PIPELINE;0.45364761352539;690;72.2435760498047;123
DESCRIPTOR_SET_LAYOUT;0.79085922241211;2362;0.0;0
SAMPLER;0.01373291015625;40;0.0;0
DESCRIPTOR_POOL;0.81290054321289;204;0.4765625;24
DESCRIPTOR_SET;0.0;0;0.0;0
FRAMEBUFFER;0.380615234375;195;0.0526123046875;11
COMMAND_POOL;0.09380722045898;151;0.0;0
DESCRIPTOR_UPDATE_TEMPLATE_KHR;0.0;0;0.0;0
SURFACE_KHR;0.00003051757812;1;0.0;0
SWAPCHAIN_KHR;0.00202178955078;1;0.0;0
DEBUG_UTILS_MESSENGER_EXT;0.0;0;0.0;0
DEBUG_REPORT_CALLBACK_EXT;0.0;0;0.0;0
ACCELERATION_STRUCTURE;0.0;0;0.0;0
VMA_BUFFER_OR_IMAGE;0.0;0;0.0;0

After

Godot Engine v4.4.beta.custom_build.772c18d49 (2025-01-30 23:08:26 UTC) - https://godotengine.org
Vulkan 1.3.128 - Forward Mobile - Using Device #0: Qualcomm - Adreno (TM) 740

MSAA level0
=== Driver Memory Report ===
Launch with --extra-gpu-memory-tracking and build with DEBUG_ENABLED for this functionality to work.
Device memory may be unavailable if the API does not support it(e.g. VK_EXT_device_memory_report is unsupported).

Total Driver Memory:9.69659805297852 MB
Total Driver Num Allocations: 6170
Total Device Memory:195.847888946533 MB
Total Device Num Allocations: 232

Memory use by object type (CSV format):

Category; Driver memory in MB; Driver Allocation Count; Device memory in MB; Device Allocation Count
UNKNOWN;0.0;0;0.0;0
INSTANCE;0.66214656829834;262;0.0;0
PHYSICAL_DEVICE;0.0;0;0.0;0
DEVICE;0.07664489746094;299;0.24674606323242;31
QUEUE;0.0;0;9.29688262939453;5
SEMAPHORE;0.00614166259766;23;0.0;0
COMMAND_BUFFER;0.0;0;1.2265625;37
FENCE;0.00080108642578;3;0.0;0
DEVICE_MEMORY;0.0;0;160.0;4
BUFFER;0.0823974609375;200;0.0;0
IMAGE;0.182373046875;442;0.357421875;1
EVENT;0.0;0;0.0;0
QUERY_POOL;0.00163269042969;6;0.00782775878906;2
BUFFER_VIEW;0.00099182128906;1;0.0;0
IMAGE_VIEW;0.20828247070312;210;0.0;0
SHADER_MODULE;4.87748146057129;564;0.0;0
PIPELINE_CACHE;0.72026348114014;147;0.0;0
PIPELINE_LAYOUT;0.34844970703125;338;0.0;0
RENDER_PASS;0.00609588623047;61;0.0;0
PIPELINE;0.42892837524414;660;24.1832733154297;117
DESCRIPTOR_SET_LAYOUT;0.79085922241211;2362;0.0;0
SAMPLER;0.01373291015625;40;0.0;0
DESCRIPTOR_POOL;0.81290054321289;204;0.4765625;24
DESCRIPTOR_SET;0.0;0;0.0;0
FRAMEBUFFER;0.380615234375;195;0.0526123046875;11
COMMAND_POOL;0.09380722045898;151;0.0;0
DESCRIPTOR_UPDATE_TEMPLATE_KHR;0.0;0;0.0;0
SURFACE_KHR;0.00003051757812;1;0.0;0
SWAPCHAIN_KHR;0.00202178955078;1;0.0;0
DEBUG_UTILS_MESSENGER_EXT;0.0;0;0.0;0
DEBUG_REPORT_CALLBACK_EXT;0.0;0;0.0;0
ACCELERATION_STRUCTURE;0.0;0;0.0;0
VMA_BUFFER_OR_IMAGE;0.0;0;0.0;0
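
As a reading aid for the two reports above, the following sketch works out the difference in device memory. The constants are copied from the "Total Device Memory" lines and the PIPELINE rows; the helper names `saved_mb` and `saved_percent` are made up for this example.

```cpp
#include <cassert>

// Values copied from the two reports above (device memory, in MB).
constexpr double kTotalDeviceBefore = 243.908191680908;
constexpr double kTotalDeviceAfter = 195.847888946533;
constexpr double kPipelineDeviceBefore = 72.2435760498047;
constexpr double kPipelineDeviceAfter = 24.1832733154297;

// Absolute saving in MB between two measurements.
inline double saved_mb(double before, double after) {
    return before - after;
}

// Relative saving, as a percentage of the "before" measurement.
inline double saved_percent(double before, double after) {
    assert(before > 0.0 && "percentage is undefined for a zero baseline");
    return 100.0 * (before - after) / before;
}
```

With these inputs, total device memory drops by about 48.06 MB (roughly 19.7%), and the PIPELINE row alone accounts for that entire difference, shrinking by about 66.5%. The only rows that differ between the two reports are INSTANCE, DEVICE, RENDER_PASS and PIPELINE.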

Contributor

@DarioSamo DarioSamo left a comment

This one already looks good to me, as it follows the same design philosophy as Forward+ of enabling features partially. It's also a good idea to merge because it fixes the hash of the pipeline key.

Is there any reason why it's currently marked as a draft that I might not have spotted yet?

@clayjohn clayjohn marked this pull request as ready for review February 13, 2025 17:31
@clayjohn clayjohn requested review from a team as code owners February 13, 2025 17:31
@clayjohn
Member Author

It was marked as draft since I wanted Calinou to test on the problematic device and I wanted to try out the other ideas I had (https://github.com/clayjohn/godot/tree/mobile-pipelines-all-settings).

Let's go ahead with this for now, since we know it helps a lot and is very safe. Then, in 4.5, we can experiment with https://github.com/clayjohn/godot/tree/mobile-pipelines-all-settings to reduce memory use and compile time even further.

@akien-mga akien-mga merged commit 54006f6 into godotengine:master Feb 13, 2025
20 checks passed
@akien-mga
Member

Thanks!

@clayjohn clayjohn deleted the mobile-pipelines branch February 15, 2025 00:05