Drawing to render targets is much slower than drawing to default framebuffer due to unnecessary MSAA blits on OpenGL backend #1020

@caspark

Description

I saw a 10 FPS drop in my game when I switched from using the default framebuffer plus screen capture to rendering directly to a render target.

After a lot of poking I got a coding agent to reduce the problem to a minimal(ish) reproduction, which led us to conclude that it comes from a bad interaction between breaking draw batches and the MSAA support. The end of every render pass (which happens whenever a draw batch breaks, such as by drawing a different texture) causes an MSAA blit even for non-MSAA render targets, which is quite slow (at least when using an integrated graphics card and screen-sized render targets like I am).
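To make the cost concrete, here is a minimal sketch of the behaviour described above. It is a toy model, not miniquad's actual code: `PassTarget`, `sample_count` and `resolve_blits` are hypothetical names invented for illustration. It only counts how many resolve blits would be issued if every render pass end triggers one, compared with guarding the blit on the target actually being multisampled.

```rust
/// Hypothetical, simplified description of a render pass target.
/// This is NOT a miniquad type; it exists only for this illustration.
#[derive(Clone, Copy)]
struct PassTarget {
    /// 1 means no multisampling; anything above 1 means an MSAA target.
    sample_count: u32,
}

/// Counts the resolve blits issued when `passes` render passes end.
///
/// `always_blit = true` models the behaviour described in this issue
/// (a blit at the end of every pass, whatever the target looks like).
/// `always_blit = false` models a guarded version that only blits
/// when the target really is multisampled.
fn resolve_blits(target: PassTarget, passes: u32, always_blit: bool) -> u32 {
    let needs_resolve = target.sample_count > 1;
    if always_blit || needs_resolve {
        passes
    } else {
        0
    }
}
```

With a non-MSAA target (`sample_count: 1`) and 2000 batch breaks, the unguarded model issues 2000 blits per frame and the guarded one issues none. For a real MSAA target both models issue the same number of blits, so the guard only removes work that was never needed.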

The fix is complicated because it gets into how macroquad has differing opinions on how render targets and their textures should be cleaned up on the OpenGL vs the Metal backend. There are more details at https://github.com/caspark/macroquad-fbo-test , which also explains and links to the patches I had my coding agent make against miniquad and macroquad to fix the issue.

For reference, the problem is quite severe (at least on my integrated GPU on Linux):

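| Sprites [1] | Textures [1] | Default FB | FBO (upstream) | FBO (patched) | Speedup |
|---|---|---|---|---|---|
| 2000 | 1 | 60 FPS | 60 FPS | 60 FPS | 1x |
| 2000 | 16 | 60 FPS | 3 FPS | 60 FPS | 20x |
| 25000 | 1 | 60 FPS | 60 FPS | 60 FPS | 1x |
| 25000 | 16 | 38 FPS | 0.2 FPS | 29 FPS [2] | ~145x |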

Anyway, given the number of outstanding PRs I don't see much point in polishing these fixes into a full PR, but I figured I'd at least raise the issue so interested parties can grab my fixes. This may wholly or partly explain #876, but I'm not sure. That's a rather long conversation, so I opted to raise this as a separate issue to avoid conflating the two.

Footnotes

  1. sprites = number of instances drawn, textures = number of distinct textures in use. So "2000 sprites, 16 textures" means 2000 sprites that cycle between 16 textures (each change of texture starts a new render batch, so that case is 2000 render passes). Obviously you would expect that many unbatched draw calls to be inefficient, but you wouldn't expect it to be significantly slower when drawing to a render target vs the default framebuffer.

  2. The remaining slowdown in my patched version seems to be due to rebinding the default framebuffer when ending a render pass, even if the next draws are in fact to the framebuffer that's already bound. Fixing that is more involved, as the screen capture feature seems to rely on the default framebuffer being active at the start.
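The redundant rebinding mentioned in footnote 2 is the sort of thing a small state cache usually avoids. The sketch below is an assumption about what such a cache could look like, not code from miniquad or from the linked patches: `FramebufferBinder` and its fields are hypothetical, and it does not address the screen capture dependency on the default framebuffer noted above.

```rust
/// Hypothetical tracker for the currently bound framebuffer.
/// Illustration only; not part of miniquad or macroquad.
struct FramebufferBinder {
    /// Framebuffer id believed to be bound, or None if unknown.
    bound: Option<u32>,
    /// How many bind calls would actually have reached the driver.
    binds_issued: u32,
}

impl FramebufferBinder {
    fn new() -> Self {
        Self { bound: None, binds_issued: 0 }
    }

    /// Binds `id` only if it differs from the framebuffer already bound.
    /// Returns true when a bind was issued, false when it was skipped.
    fn bind(&mut self, id: u32) -> bool {
        if self.bound == Some(id) {
            return false;
        }
        // A real OpenGL backend would call glBindFramebuffer here.
        self.bound = Some(id);
        self.binds_issued += 1;
        true
    }
}
```

If consecutive render passes target the same framebuffer, only the first `bind` call does any work. The cache is only correct if every bind goes through it, which is part of why a real fix is more involved than this sketch suggests.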
