I saw a 10 FPS drop in my game when I switched from using the default framebuffer plus screen capture to rendering directly to a render target.
After a lot of poking, I got a coding agent to reduce the problem to a minimal(ish) reproduction, which led us to conclude that it comes from a bad interaction between draw batch breaks and the MSAA support: the end of every render pass (which happens whenever a draw batch breaks, such as when drawing a different texture) causes an MSAA blit even for non-MSAA render targets, which is quite slow (at least with an integrated graphics card and screen-sized render targets, as in my case).
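To make the scaling concrete, here is a rough toy model of the interaction described above. It is only a sketch: `TextureId`, `cycling_textures`, `count_batches`, `count_msaa_blits` and the `always_blit` flag are hypothetical names made up for illustration, not miniquad or macroquad APIs. It assumes that a batch breaks whenever consecutive sprites use different textures and that every batch ends one render pass.

```rust
/// Hypothetical texture handle, for illustration only (not a miniquad type).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct TextureId(pub u32);

/// Builds a sprite list that cycles through `textures` distinct textures.
pub fn cycling_textures(sprites: usize, textures: u32) -> Vec<TextureId> {
    assert!(textures > 0, "need at least one texture");
    (0..sprites).map(|i| TextureId(i as u32 % textures)).collect()
}

/// Counts draw batches, assuming a batch breaks whenever the texture
/// differs from the one used by the previous sprite.
pub fn count_batches(sprites: &[TextureId]) -> usize {
    if sprites.is_empty() {
        return 0;
    }
    1 + sprites.windows(2).filter(|pair| pair[0] != pair[1]).count()
}

/// Counts MSAA blits per frame, assuming one render pass per batch.
/// `always_blit = true` models the behaviour described above (a blit at the
/// end of every pass, whatever the sample count); `false` models a guard
/// that only blits for targets that really are multisampled.
pub fn count_msaa_blits(batches: usize, sample_count: u32, always_blit: bool) -> usize {
    if always_blit || sample_count > 1 {
        batches
    } else {
        0
    }
}
```

Under these assumptions, 2000 sprites cycling through 16 textures give 2000 batches, so a non-MSAA target (`sample_count = 1`) pays for 2000 screen-sized blits per frame when `always_blit` is true and none with the guard, while a single texture gives one batch either way. The guard here is only the simplest way to express the idea in a model and is not what the actual patches do.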
The fix is complicated because it gets into how macroquad has different opinions about how render targets and their textures should be cleaned up on the OpenGL backend vs the Metal backend. There are more details at https://github.com/caspark/macroquad-fbo-test , which also explains and links to the patches I had my coding agent make against miniquad and macroquad to fix the issue.
For reference, the problem is quite severe (at least on my integrated GPU on Linux):
| Sprites [1] | Textures [1] | Default FB | FBO (upstream) | FBO (patched) | Speedup |
|---|---|---|---|---|---|
| 2000 | 1 | 60 FPS | 60 FPS | 60 FPS | 1x |
| 2000 | 16 | 60 FPS | 3 FPS | 60 FPS | 20x |
| 25000 | 1 | 60 FPS | 60 FPS | 60 FPS | 1x |
| 25000 | 16 | 38 FPS | 0.2 FPS | 29 FPS [2] | ~145x |
Anyway, given the number of outstanding PRs, I don't see much point in polishing these fixes into a full PR, but I figured I'd at least raise the issue so interested parties can grab my fixes. This may wholly or partly explain #876, but I'm not sure: that's a rather long conversation, so I opted to raise this as a separate issue to avoid muddying that one.
Footnotes
1. Sprites = number of instances drawn; textures = number of distinct textures in use. So "2000 sprites, 16 textures" means 2000 sprites that cycle between 16 textures (each change of texture starts a new render batch, so that case is 2000 render passes). Obviously you would expect that many unbatched draw calls to be inefficient, but you wouldn't expect it to be significantly slower when drawing to a render target vs the default framebuffer.
2. The remaining slowdown in my patched version seems to be due to rebinding the default framebuffer when ending a render pass, even if the next draws are in fact to the framebuffer that's already bound. Fixing that is more involved, as the screen capture feature seems to rely on the default framebuffer being active at the start.