Drawing to render targets is much slower than drawing to the default framebuffer due to unnecessary MSAA blits on the OpenGL backend #1020

I saw a 10 FPS drop in [my game](https://slowrush.dev) when I switched from drawing to the default framebuffer (plus screen capture) to rendering directly to a render target.

After a lot of poking, I got a coding agent to reduce the problem to a minimal(ish) reproduction. That led us to conclude that it is caused by a bad interaction between draw-batch breaks and MSAA support. The end of every render pass, which happens whenever a draw batch breaks (for example, when drawing with a different texture), triggers an MSAA blit even for non-MSAA render targets. That blit is quite slow, at least on an integrated graphics card with screen-sized render targets like mine.
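To make the cost model above concrete, here is a small self-contained Rust model of the mechanism. It is a hypothetical sketch, not macroquad's or miniquad's code. `count_batches` assumes sprite `i` uses texture `i % textures` and that a texture change is the only thing that breaks a batch; it ignores any other reasons a batch might break. `count_blits` assumes the upstream behaviour blits once per render pass regardless of sample count, while the patched behaviour blits only for multisampled targets.

```rust
// Hypothetical model, NOT macroquad/miniquad code: how texture changes break
// draw batches, and how many MSAA resolve blits follow if every render pass
// end triggers one.

/// Number of batches (render passes) when `sprites` sprites cycle through
/// `textures` distinct textures in draw order. Assumes a texture change is
/// the only thing that breaks a batch.
fn count_batches(sprites: usize, textures: usize) -> usize {
    let mut batches = 0;
    let mut current: Option<usize> = None;
    for i in 0..sprites {
        let tex = i % textures; // sprite i uses texture (i mod textures)
        if current != Some(tex) {
            // Texture changed: flush the current batch and start a new one.
            batches += 1;
            current = Some(tex);
        }
    }
    batches
}

/// Blits per frame. `blit_only_if_msaa == false` models the reported
/// upstream behaviour (blit at every pass end); `true` models the patch
/// (skip the blit when the target is not multisampled).
fn count_blits(batches: usize, target_samples: u32, blit_only_if_msaa: bool) -> usize {
    if blit_only_if_msaa && target_samples <= 1 {
        0
    } else {
        batches
    }
}

fn main() {
    for &(sprites, textures) in &[(2000, 1), (2000, 16), (25000, 1), (25000, 16)] {
        let b = count_batches(sprites, textures);
        println!(
            "{sprites} sprites / {textures} textures: {b} passes, upstream blits = {}, patched blits = {}",
            count_blits(b, 1, false), // non-MSAA target, upstream behaviour
            count_blits(b, 1, true),  // non-MSAA target, patched behaviour
        );
    }
}
```

Under these assumptions, the 16-texture rows produce one pass (and, upstream, one blit) per sprite. This lines up with where the table below shows the render-target slowdown.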

The fix is complicated because it touches on how macroquad has somewhat different opinions about how render targets and their textures should be cleaned up on the OpenGL backend versus the Metal backend. There are more details at https://github.com/caspark/macroquad-fbo-test, which also explains and links to the patches I had my coding agent make against miniquad and macroquad to fix the issue.

For reference, the problem is quite severe (at least on my integrated GPU on Linux):

| Sprites[^textures] | Textures[^textures] | Default FB | FBO (upstream) | FBO (patched)    | Speedup (patched vs upstream FBO) |
|--------------------|---------------------|------------|----------------|------------------|-----------------------------------|
| 2000               | 1                   | 60 FPS     | 60 FPS         | 60 FPS           | 1x                                |
| 2000               | 16                  | 60 FPS     | **3 FPS**      | 60 FPS           | **20x**                           |
| 25000              | 1                   | 60 FPS     | 60 FPS         | 60 FPS           | 1x                                |
| 25000              | 16                  | 38 FPS     | **0.2 FPS**    | 29 FPS[^remains] | **~145x**                         |

[^textures]: Sprites = number of instances drawn; textures = number of distinct textures in use. So "2000 sprites, 16 textures" means 2000 sprites that cycle through 16 textures. Each texture change starts a new draw batch, so that case produces 2000 render passes. You would obviously expect that many unbatched draw calls to be inefficient, but you _wouldn't_ expect them to be significantly slower when drawing to a render target than when drawing to the default framebuffer.

[^remains]: The remaining slowdown in my patched version seems to be caused by rebinding the default framebuffer at the end of every render pass, even when the next draws go to the framebuffer that is already bound. Fixing that is more involved, because the screen capture feature seems to rely on the default framebuffer being active at the start.
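The redundant rebinding described in this footnote is the kind of overhead a framebuffer-binding cache avoids. Below is a hypothetical Rust sketch. The `BindCache` type, the `Framebuffer` enum and `simulate` are invented for illustration and are not miniquad's API. It counts how many binds would reach the driver when 2000 consecutive passes all target the same render target. It deliberately ignores the screen capture constraint mentioned above, which is what makes the real fix more involved.

```rust
// Hypothetical sketch, NOT miniquad's real API: a tiny framebuffer-binding
// cache that counts how many binds would actually reach the GL driver.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Framebuffer {
    Default,
    Offscreen(u32), // hypothetical handle of a render target's FBO
}

struct BindCache {
    bound: Option<Framebuffer>,
    bind_calls: usize,           // binds that would call glBindFramebuffer
    rebind_default_on_end: bool, // true models the upstream behaviour in the footnote
}

impl BindCache {
    fn new(rebind_default_on_end: bool) -> Self {
        Self { bound: None, bind_calls: 0, rebind_default_on_end }
    }

    /// Issues a real bind only when `fb` differs from what is already bound.
    fn bind(&mut self, fb: Framebuffer) {
        if self.bound != Some(fb) {
            self.bind_calls += 1;
            self.bound = Some(fb);
        }
    }

    /// Upstream-style: ending a pass switches back to the default framebuffer.
    /// Cached-style: leave the binding alone and let the next pass decide.
    fn end_pass(&mut self) {
        if self.rebind_default_on_end {
            self.bind(Framebuffer::Default);
        }
    }
}

/// Runs `passes` render passes into one offscreen target, then one pass to
/// the screen, and returns the number of real framebuffer binds.
fn simulate(passes: usize, rebind_default_on_end: bool) -> usize {
    let mut cache = BindCache::new(rebind_default_on_end);
    for _ in 0..passes {
        cache.bind(Framebuffer::Offscreen(1));
        cache.end_pass();
    }
    cache.bind(Framebuffer::Default);
    cache.bind_calls
}

fn main() {
    println!("upstream-style binds: {}", simulate(2000, true)); // 4000
    println!("cached binds:         {}", simulate(2000, false)); // 2
}
```

With the rebind-on-end behaviour every pass costs two real binds (render target, then default). With the cache, the render target is bound once and the default framebuffer once at the end.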

Anyway, given the number of outstanding PRs, I don't see much point in polishing these fixes into a full PR, but I figured I'd at least raise the issue so interested parties can grab my fixes. This may wholly or partly explain #876, though I'm not sure; that's a rather long conversation, so I opted to raise this as a separate issue to avoid conflating the two.
