Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[4.4 dev4] compute_list render thread error #99025

Closed
TrueJole opened this issue Nov 10, 2024 · 8 comments · Fixed by #99299
Closed

[4.4 dev4] compute_list render thread error #99025

TrueJole opened this issue Nov 10, 2024 · 8 comments · Fixed by #99299

Comments

@TrueJole
Copy link

Tested versions

  • Reproducible in 4.4 dev 4
  • Not in 4.3

System information

Godot v4.4.dev4 - Linux Mint 22 (Wilma) on X11 - X11 display driver, Single-window, 2 monitors - Vulkan (Forward+) - dedicated AMD Radeon RX 6600 (amdgpu; 6.8.5) - AMD Ryzen 5 3600 6-Core Processor (12 threads)

Issue description

A scene (3D) with enough geometry and GPU Particles (Emitting = true; transform_align = Z-Billboard) I get these errors hundreds of times per second, even in the editor:

  This function (compute_list_begin) can only be called from the render thread. 
  This function (compute_list_bind_compute_pipeline) can only be called from the render thread. 
  This function (compute_list_bind_uniform_set) can only be called from the render thread. 
  This function (compute_list_bind_uniform_set) can only be called from the render thread. 
  This function (compute_list_set_push_constant) can only be called from the render thread. 
  This function (compute_list_dispatch_threads) can only be called from the render thread. 
  This function (compute_list_end) can only be called from the render thread. 

It only happens with enough objects and only if emitting=true AND transform_align=Z-Billboard. This is as far as I could isolate the issue as I don't know much about the engine code.

Steps to reproduce

Open the Test.tscn from the MRP and go to the 3D editor.

Minimal reproduction project (MRP)

MRP.zip

@matheusmdx
Copy link
Contributor

Bisected to #90400, @DarioSamo

@clayjohn
Copy link
Member

This comes from a change to how we handle threading in the RD. The change was introduced in #90400.

The solution is that our functions to update particles based on camera properties (i.e. billboarding) needs to be called from the render thread.

I think it is getting called during culling which can be multi threaded if there are more than 1000 instances in the scene

@DarioSamo
Copy link
Contributor

DarioSamo commented Nov 11, 2024

Bisected to #90400, @DarioSamo

Like Clay said this is not exactly a bug introduced by the PR but rather it is exposing unsafe behavior we had before.

The actionable here would be to:

A) Upgrade RD to support recording compute/draw lists from secondary threads (a pretty major but not impossible refactor).

or

B) Just disable this functionality. The benefits of parallel recording are much further reduced now that the graph is reponsible for the API calls. Most of the API cost is elsewhere right now after reordering.

Although as I understand, B) isn't merely disabling but will have to be some new system to defer the calls outside of the culling thread. That said, it makes some sense as we want to guarantee the proper order of these calls in the first place.

@DarioSamo
Copy link
Contributor

DarioSamo commented Nov 11, 2024

Here's my current understanding of the problem.

The piece of code that is causing trouble is indeed RendererSceneCull when it turns on the multithreaded processing. In particular:

if (RSG::particles_storage->particles_is_inactive(idata.base_rid)) {
				//but if nothing is going on, don't do it.
				keep = false;
} else {
				cull_data.cull->lock.lock();
				RSG::particles_storage->particles_request_process(idata.base_rid);
				cull_data.cull->lock.unlock();
				RSG::particles_storage->particles_set_view_axis(idata.base_rid, -cull_data.cam_transform.basis.get_column(2).normalized(), cull_data.cam_transform.basis.get_column(1).normalized());
				//particles visible? request redraw
				RenderingServerDefault::redraw_request();
}

The else branch here calls particles_set_view_axis, which as far as I can see is the only class that calls this. Of particular interest there's two things here:

  • It already uses the request process mechanism. This is already an ideal solution to delegate the processing to the right step.
  • particles_set_view_axis is normalizing the same camera data for every particle system, which seems excessive as it's always the same transform.
RSG::particles_storage->particles_set_view_axis(idata.base_rid, -cull_data.cam_transform.basis.get_column(2).normalized(), cull_data.cam_transform.basis.get_column(1).normalized());

I think my proposal for solving this would be to expand particles_request_process to include an optional view/axis or to change the method to something similar. It always gets called regardless of the particle system being a billboard or not as well.

@clayjohn
Copy link
Member

@DarioSamo I think we can just use RS::call_on_render_thread()

@Bonkahe
Copy link
Contributor

Bonkahe commented Nov 14, 2024

I have been interacting with the problem without realizing that there is an issue here for it.
My current work around on my build so I can continue working with my project is to just comment out lines 3337 to 3353 and replace them with this:
_scene_cull(cull_data, scene_cull_result, cull_from, cull_to);, this removes all multi-threaded functionality and executes in function in the render thread (the _render_scene function that this is housed in is already in the render thread).

Would this work as a fix for 4.4 to move forward?
Or is pulling the offending code out of the _scene_cull function ( RSG::particles_storage->particles_set_view_axis(idata.base_rid, -cull_data.cam_transform.basis.get_column(2).normalized(), cull_data.cam_transform.basis.get_column(1).normalized()); ) and putting it just below the function even viable/worth the performance gain by having multi-threading functionality in this matter?

I'm aware this is a little ham-handed but I chose this for it's drop in functionality not for performance/elegance, if anyone else has a better solution I'm definently open to suggestions.

@clayjohn
Copy link
Member

I have been interacting with the problem without realizing that there is an issue here for it. My current work around on my build so I can continue working with my project is to just comment out lines 3337 to 3353 and replace them with this: _scene_cull(cull_data, scene_cull_result, cull_from, cull_to);, this removes all multi-threaded functionality and executes in function in the render thread (the _render_scene function that this is housed in is already in the render thread).

Would this work as a fix for 4.4 to move forward? Or is pulling the offending code out of the _scene_cull function ( RSG::particles_storage->particles_set_view_axis(idata.base_rid, -cull_data.cam_transform.basis.get_column(2).normalized(), cull_data.cam_transform.basis.get_column(1).normalized()); ) and putting it just below the function even viable/worth the performance gain by having multi-threading functionality in this matter?

I'm aware this is a little ham-handed but I chose this for it's drop in functionality not for performance/elegance, if anyone else has a better solution I'm definently open to suggestions.

@Bonkahe The solution is way simpler. We just need to wrap the call to particles_set_view_axis() in call_on_render_thread() Its a one line fix and nothing significant needs to change

@Bonkahe
Copy link
Contributor

Bonkahe commented Nov 15, 2024

I have been interacting with the problem without realizing that there is an issue here for it. My current work around on my build so I can continue working with my project is to just comment out lines 3337 to 3353 and replace them with this: _scene_cull(cull_data, scene_cull_result, cull_from, cull_to);, this removes all multi-threaded functionality and executes in function in the render thread (the _render_scene function that this is housed in is already in the render thread).
Would this work as a fix for 4.4 to move forward? Or is pulling the offending code out of the _scene_cull function ( RSG::particles_storage->particles_set_view_axis(idata.base_rid, -cull_data.cam_transform.basis.get_column(2).normalized(), cull_data.cam_transform.basis.get_column(1).normalized()); ) and putting it just below the function even viable/worth the performance gain by having multi-threading functionality in this matter?
I'm aware this is a little ham-handed but I chose this for it's drop in functionality not for performance/elegance, if anyone else has a better solution I'm definently open to suggestions.

@Bonkahe The solution is way simpler. We just need to wrap the call to particles_set_view_axis() in call_on_render_thread() Its a one line fix and nothing significant needs to change

I ran into some slight issues on doing this, and had to take a slightly round-about method to get it resolved, but it's up and running, I threw together a PR for it, over at #99299 , let me know if there's a simpler method to implement this, had some issues calling call_on_render_thread() directly due to particles_storage being a singleton and not having an instance_id

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment