Reduce mobile pipeline compilations #102217
Tested locally; it works as expected.
Device: Samsung Galaxy Tab S9 Ultra (16 GB RAM), Snapdragon 8 Gen 2
Before
Commit: ee4acfb (commit just before this PR)
Godot Engine v4.4.beta.custom_build.ee4acfbfb (2025-01-29 23:45:20 UTC) - https://godotengine.org
Vulkan 1.3.128 - Forward Mobile - Using Device #0: Qualcomm - Adreno (TM) 740
=== Driver Memory Report ===
Launch with --extra-gpu-memory-tracking and build with DEBUG_ENABLED for this functionality to work.
Device memory may be unavailable if the API does not support it(e.g. VK_EXT_device_memory_report is unsupported).
Total Driver Memory:9.94528198242188 MB
Total Driver Num Allocations: 6235
Total Device Memory:243.908191680908 MB
Total Device Num Allocations: 238
Memory use by object type (CSV format):
Category; Driver memory in MB; Driver Allocation Count; Device memory in MB; Device Allocation Count
UNKNOWN;0.0;0;0.0;0
INSTANCE;0.8837194442749;277;0.0;0
PHYSICAL_DEVICE;0.0;0;0.0;0
DEVICE;0.0772705078125;302;0.24674606323242;31
QUEUE;0.0;0;9.29688262939453;5
SEMAPHORE;0.00614166259766;23;0.0;0
COMMAND_BUFFER;0.0;0;1.2265625;37
FENCE;0.00080108642578;3;0.0;0
DEVICE_MEMORY;0.0;0;160.0;4
BUFFER;0.0823974609375;200;0.0;0
IMAGE;0.182373046875;442;0.357421875;1
EVENT;0.0;0;0.0;0
QUERY_POOL;0.00163269042969;6;0.00782775878906;2
BUFFER_VIEW;0.00099182128906;1;0.0;0
IMAGE_VIEW;0.20828247070312;210;0.0;0
SHADER_MODULE;4.87748146057129;564;0.0;0
PIPELINE_CACHE;0.72026348114014;147;0.0;0
PIPELINE_LAYOUT;0.34844970703125;338;0.0;0
RENDER_PASS;0.00786209106445;78;0.0;0
PIPELINE;0.45364761352539;690;72.2435760498047;123
DESCRIPTOR_SET_LAYOUT;0.79085922241211;2362;0.0;0
SAMPLER;0.01373291015625;40;0.0;0
DESCRIPTOR_POOL;0.81290054321289;204;0.4765625;24
DESCRIPTOR_SET;0.0;0;0.0;0
FRAMEBUFFER;0.380615234375;195;0.0526123046875;11
COMMAND_POOL;0.09380722045898;151;0.0;0
DESCRIPTOR_UPDATE_TEMPLATE_KHR;0.0;0;0.0;0
SURFACE_KHR;0.00003051757812;1;0.0;0
SWAPCHAIN_KHR;0.00202178955078;1;0.0;0
DEBUG_UTILS_MESSENGER_EXT;0.0;0;0.0;0
DEBUG_REPORT_CALLBACK_EXT;0.0;0;0.0;0
ACCELERATION_STRUCTURE;0.0;0;0.0;0
VMA_BUFFER_OR_IMAGE;0.0;0;0.0;0
After
Godot Engine v4.4.beta.custom_build.772c18d49 (2025-01-30 23:08:26 UTC) - https://godotengine.org
Vulkan 1.3.128 - Forward Mobile - Using Device #0: Qualcomm - Adreno (TM) 740
MSAA level0
=== Driver Memory Report ===
Launch with --extra-gpu-memory-tracking and build with DEBUG_ENABLED for this functionality to work.
Device memory may be unavailable if the API does not support it(e.g. VK_EXT_device_memory_report is unsupported).
Total Driver Memory:9.69659805297852 MB
Total Driver Num Allocations: 6170
Total Device Memory:195.847888946533 MB
Total Device Num Allocations: 232
Memory use by object type (CSV format):
Category; Driver memory in MB; Driver Allocation Count; Device memory in MB; Device Allocation Count
UNKNOWN;0.0;0;0.0;0
INSTANCE;0.66214656829834;262;0.0;0
PHYSICAL_DEVICE;0.0;0;0.0;0
DEVICE;0.07664489746094;299;0.24674606323242;31
QUEUE;0.0;0;9.29688262939453;5
SEMAPHORE;0.00614166259766;23;0.0;0
COMMAND_BUFFER;0.0;0;1.2265625;37
FENCE;0.00080108642578;3;0.0;0
DEVICE_MEMORY;0.0;0;160.0;4
BUFFER;0.0823974609375;200;0.0;0
IMAGE;0.182373046875;442;0.357421875;1
EVENT;0.0;0;0.0;0
QUERY_POOL;0.00163269042969;6;0.00782775878906;2
BUFFER_VIEW;0.00099182128906;1;0.0;0
IMAGE_VIEW;0.20828247070312;210;0.0;0
SHADER_MODULE;4.87748146057129;564;0.0;0
PIPELINE_CACHE;0.72026348114014;147;0.0;0
PIPELINE_LAYOUT;0.34844970703125;338;0.0;0
RENDER_PASS;0.00609588623047;61;0.0;0
PIPELINE;0.42892837524414;660;24.1832733154297;117
DESCRIPTOR_SET_LAYOUT;0.79085922241211;2362;0.0;0
SAMPLER;0.01373291015625;40;0.0;0
DESCRIPTOR_POOL;0.81290054321289;204;0.4765625;24
DESCRIPTOR_SET;0.0;0;0.0;0
FRAMEBUFFER;0.380615234375;195;0.0526123046875;11
COMMAND_POOL;0.09380722045898;151;0.0;0
DESCRIPTOR_UPDATE_TEMPLATE_KHR;0.0;0;0.0;0
SURFACE_KHR;0.00003051757812;1;0.0;0
SWAPCHAIN_KHR;0.00202178955078;1;0.0;0
DEBUG_UTILS_MESSENGER_EXT;0.0;0;0.0;0
DEBUG_REPORT_CALLBACK_EXT;0.0;0;0.0;0
ACCELERATION_STRUCTURE;0.0;0;0.0;0
VMA_BUFFER_OR_IMAGE;0.0;0;0.0;0
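The two reports are easiest to compare row by row. Below is a minimal C++ sketch that assumes each row follows the header's five semicolon-separated fields (Category; Driver memory in MB; Driver Allocation Count; Device memory in MB; Device Allocation Count). MemoryRow, parse_row and diff are hypothetical helpers written for this comparison and are not part of Godot.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One row of the "Memory use by object type (CSV format)" section.
struct MemoryRow {
    std::string category;
    double driver_mb = 0.0;   // Driver memory in MB
    long driver_allocs = 0;   // Driver Allocation Count
    double device_mb = 0.0;   // Device memory in MB
    long device_allocs = 0;   // Device Allocation Count
};

// Splits a semicolon-separated report line into a MemoryRow.
// Returns false if the line does not contain exactly five fields.
bool parse_row(const std::string &line, MemoryRow &out) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ';')) {
        fields.push_back(field);
    }
    if (fields.size() != 5) {
        return false;
    }
    out.category = fields[0];
    out.driver_mb = std::stod(fields[1]);
    out.driver_allocs = std::stol(fields[2]);
    out.device_mb = std::stod(fields[3]);
    out.device_allocs = std::stol(fields[4]);
    return true;
}

// Per-field change for one category (after minus before).
MemoryRow diff(const MemoryRow &before, const MemoryRow &after) {
    assert(before.category == after.category);
    MemoryRow d;
    d.category = before.category;
    d.driver_mb = after.driver_mb - before.driver_mb;
    d.driver_allocs = after.driver_allocs - before.driver_allocs;
    d.device_mb = after.device_mb - before.device_mb;
    d.device_allocs = after.device_allocs - before.device_allocs;
    return d;
}
```

For the two PIPELINE rows, device memory drops by about 48.06 MB (72.24 MB to 24.18 MB) and device allocations go from 123 to 117. Total Device Memory falls by the same 48.06 MB (243.91 MB to 195.85 MB), and every other row's device memory is unchanged. So in these reports, the entire device-memory saving comes from pipelines.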
This one already looks good to me, as it follows a design philosophy similar to Forward+, which enables features partially. It's also a good idea to merge because it fixes the hash of the pipeline key.
Is there any reason why it's currently marked as a draft that I might not have spotted yet?
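The comment above says this PR also fixes the hash of the pipeline key but does not say which field was affected. The sketch below is a generic, hypothetical illustration, not Godot's code: the PipelineKey fields, hash_key, PipelineKeyHasher and count_distinct_pipelines are invented for the example. It shows that a pipeline cache keyed by a struct should hash every field its equality operator compares. It uses 64-bit FNV-1a, fed one field at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical pipeline key; these fields are illustrative only.
struct PipelineKey {
    uint32_t framebuffer_format_id = 0;
    uint32_t vertex_format_id = 0;
    uint32_t shader_version = 0;
    uint32_t cull_mode = 0;
    uint32_t wireframe = 0;

    bool operator==(const PipelineKey &o) const {
        return framebuffer_format_id == o.framebuffer_format_id &&
               vertex_format_id == o.vertex_format_id &&
               shader_version == o.shader_version &&
               cull_mode == o.cull_mode &&
               wireframe == o.wireframe;
    }
};

// Mixes one 32-bit field into a 64-bit FNV-1a state byte by byte,
// so struct padding never influences the result.
static uint64_t fnv1a_mix_u32(uint64_t h, uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        h ^= (v >> (8 * i)) & 0xFFu;
        h *= 1099511628211ull; // 64-bit FNV prime
    }
    return h;
}

// Hashes every field that operator== compares, in a fixed order.
uint64_t hash_key(const PipelineKey &k) {
    uint64_t h = 14695981039346656037ull; // 64-bit FNV offset basis
    h = fnv1a_mix_u32(h, k.framebuffer_format_id);
    h = fnv1a_mix_u32(h, k.vertex_format_id);
    h = fnv1a_mix_u32(h, k.shader_version);
    h = fnv1a_mix_u32(h, k.cull_mode);
    h = fnv1a_mix_u32(h, k.wireframe);
    return h;
}

struct PipelineKeyHasher {
    std::size_t operator()(const PipelineKey &k) const {
        return static_cast<std::size_t>(hash_key(k));
    }
};

// Counts how many distinct pipelines a list of requested keys would need.
std::size_t count_distinct_pipelines(const std::vector<PipelineKey> &keys) {
    std::unordered_set<PipelineKey, PipelineKeyHasher> unique(keys.begin(), keys.end());
    return unique.size();
}
```

Every FNV-1a step is reversible for a fixed input byte, and every field here is serialized at the same length. As a result, two keys that differ in any field always hash differently. Suppose a field were left out of hash_key. std::unordered_set would still give correct results, because operator== resolves collisions. However, keys that differ only in that field would always collide, and any cache that compared hashes alone would return the wrong pipeline.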
It was marked as a draft because I wanted Calinou to test it on the problematic device, and I wanted to try out some other ideas I had (https://github.com/clayjohn/godot/tree/mobile-pipelines-all-settings). Let's go ahead with this for now, since we know it helps a lot and is very safe. Then in 4.5 we can experiment with https://github.com/clayjohn/godot/tree/mobile-pipelines-all-settings to reduce memory use and compile time even further.
Thanks!
This is intended to help alleviate #101635 and some of the increase in loading time from the introduction of Ubershaders.
The main changes are:
The global_pipeline_data_required struct is filled with the project settings while initializing the renderer. This ensures that we can use our existing project settings during the mesh compilation phase.
For HDR render targets we save a lot because scenes tend to be either all LDR or all HDR. For the post process we also save a lot if similar environments are used in all Viewports.
At mesh compile time we now assume that subpass post processing isn't being used. I think this is a fair assumption to start from, as it means that games targeting the lowest-end devices (and thus avoiding post effects) won't pay the cost of compiling pipeline variants that are only necessary when targeting higher-end systems.
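The changes above track which variants are actually needed and compile only those at mesh compile time. Below is a minimal C++ sketch of that idea under stated assumptions. It models only two dimensions, LDR/HDR and subpass post processing on/off, to mirror the examples in the description. GlobalPipelineData, PipelineVariant, init_from_settings, note_usage and variants_to_compile are hypothetical names and do not reflect the actual contents of global_pipeline_data_required.

```cpp
#include <cassert>
#include <vector>

// Hypothetical variant: which combination a render target needs.
struct PipelineVariant {
    bool hdr;
    bool subpass_post;
};

// Hypothetical global record of the variant values seen so far.
struct GlobalPipelineData {
    bool hdr_seen[2] = {false, false};          // index 0: LDR, 1: HDR
    bool subpass_post_seen[2] = {false, false}; // index 0: off, 1: on
};

// Seeds the record from project settings at renderer init.
// Subpass post processing is assumed off until a viewport actually uses it.
GlobalPipelineData init_from_settings(bool default_viewport_hdr) {
    GlobalPipelineData d;
    d.hdr_seen[default_viewport_hdr ? 1 : 0] = true;
    d.subpass_post_seen[0] = true;
    return d;
}

// Records that a render target needs variant v. Returns true if this adds a
// new value in either dimension, i.e. the set of variants to compile grows.
bool note_usage(GlobalPipelineData &d, PipelineVariant v) {
    bool grows = !d.hdr_seen[v.hdr] || !d.subpass_post_seen[v.subpass_post];
    d.hdr_seen[v.hdr] = true;
    d.subpass_post_seen[v.subpass_post] = true;
    return grows;
}

// Variants to compile per surface: the cross product of the values seen so far.
std::vector<PipelineVariant> variants_to_compile(const GlobalPipelineData &d) {
    std::vector<PipelineVariant> out;
    for (int h = 0; h < 2; ++h) {
        for (int p = 0; p < 2; ++p) {
            if (d.hdr_seen[h] && d.subpass_post_seen[p]) {
                out.push_back({h == 1, p == 1});
            }
        }
    }
    return out;
}
```

In this toy model, starting from the project settings with subpass post processing assumed off, a project whose viewports are all LDR compiles one variant per surface instead of four. The set grows only when a viewport actually requests HDR or subpass post processing.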
Results
I tested with the GDQuest TPS demo using the mobile renderer.
Beta 1

With this PR

Surface compilations are still higher than I would like, but the overall decrease is very significant, so I think this PR is worth going forward with on its own. When using subpass post processing, surface compilations drop to 22 with this PR, which gives users an opportunity to save a lot of loading time.