
@waywardmonkeys

Connections
Fixes #3084, #8047.

Description

  • Track outstanding Metal command buffers per queue and gate begin_encoding against the hard MAX_COMMAND_BUFFERS limit, returning device-lost with an actionable warning instead of letting new_command_buffer hang when encoders are leaked.
  • Share the counter between the queue and its encoders, and decrement it on submit or discard only after clearing the raw command buffer, so the buffer is dropped before the bookkeeping update.
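The two bullets above can be sketched in Rust roughly as follows. This is a minimal sketch, not the PR's code: `Queue`, `CommandEncoder`, `RawCommandBuffer`, `DeviceError`, `submit`, `discard_encoding`, and `outstanding_count` are hypothetical stand-ins for the wgpu-hal Metal types, and the limit of 4096 is assumed from the warning in the log output in this thread.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Assumed value of the hard limit (the warning in this thread reports 4096).
const MAX_COMMAND_BUFFERS: u64 = 4096;

#[derive(Debug, PartialEq)]
enum DeviceError {
    Lost,
}

// Hypothetical placeholder for the raw Metal command buffer handle.
struct RawCommandBuffer;

// Hypothetical queue; the counter is shared with every encoder it creates.
struct Queue {
    outstanding: Arc<AtomicU64>,
}

struct CommandEncoder {
    outstanding: Arc<AtomicU64>,
    raw: Option<RawCommandBuffer>,
}

impl Queue {
    fn new() -> Self {
        Queue { outstanding: Arc::new(AtomicU64::new(0)) }
    }

    fn create_encoder(&self) -> CommandEncoder {
        CommandEncoder { outstanding: Arc::clone(&self.outstanding), raw: None }
    }

    // Submitting takes the raw buffer: drop it first, then decrement.
    fn submit(&self, encoder: &mut CommandEncoder) {
        if let Some(raw) = encoder.raw.take() {
            drop(raw);
            self.outstanding.fetch_sub(1, Ordering::AcqRel);
        }
    }

    fn outstanding_count(&self) -> u64 {
        self.outstanding.load(Ordering::Acquire)
    }
}

impl CommandEncoder {
    fn begin_encoding(&mut self) -> Result<(), DeviceError> {
        // Reserve a slot; fetch_add returns the value before the increment.
        let count = self.outstanding.fetch_add(1, Ordering::AcqRel) + 1;
        if count > MAX_COMMAND_BUFFERS {
            // Undo the reservation and report device loss instead of asking
            // Metal for another buffer, which would block.
            self.outstanding.fetch_sub(1, Ordering::AcqRel);
            eprintln!("refusing to create new command buffer; {count} outstanding exceeds {MAX_COMMAND_BUFFERS}");
            return Err(DeviceError::Lost);
        }
        self.raw = Some(RawCommandBuffer);
        Ok(())
    }

    // Discarding also clears the raw buffer before updating the counter.
    fn discard_encoding(&mut self) {
        if let Some(raw) = self.raw.take() {
            drop(raw);
            self.outstanding.fetch_sub(1, Ordering::AcqRel);
        }
    }
}
```

Reserving with `fetch_add` and rolling back on failure keeps the check and the increment in a single atomic step, so two threads racing near the limit cannot both get a buffer past it.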

Testing
Tests pass. I've run the test from #8047, and the new check triggers.

Squash or Rebase?
Squash

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

@waywardmonkeys

Using the test from #8047, with minimal changes to make it compile against trunk, this code catches the issue at the count where it used to hang.

The reproducer for the problem now prints this instead of deadlocking:

[2025-11-26T14:41:43Z WARN  wgpu_hal::metal::command] metal: refusing to create new command buffer; 4097 outstanding command buffers exceeds the limit of 4096. Treating this as device lost. Ensure command encoders are submitted or dropped rather than kept alive to avoid exhausting Metal's command buffer budget.

thread 'main' (58364203) panicked at /Users/bruce/Development/custodian/wgpu/wgpu/src/backend/wgpu_core.rs:2187:18:
Error in Buffer::get_mapped_range: Validation Error

Caused by:
  Buffer with '' label has been destroyed

stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/ed61e7d7e242494fb7057f2657300d9e77bb4fcb/library/std/src/panicking.rs:698:5
   1: core::panicking::panic_fmt
             at /rustc/ed61e7d7e242494fb7057f2657300d9e77bb4fcb/library/core/src/panicking.rs:75:14
   2: wgpu::backend::wgpu_core::ContextWgpuCore::handle_error_fatal
             at /Users/bruce/Development/custodian/wgpu/wgpu/src/backend/wgpu_core.rs:348:9
   3: <wgpu::backend::wgpu_core::CoreBuffer as wgpu::dispatch::BufferInterface>::get_mapped_range
             at /Users/bruce/Development/custodian/wgpu/wgpu/src/backend/wgpu_core.rs:2187:18
   4: wgpu::api::buffer::BufferSlice::get_mapped_range
             at /Users/bruce/Development/custodian/wgpu/wgpu/src/api/buffer.rs:606:39
   5: TestEncoderTooManyCpasses::main
             at ./src/main.rs:258:29

So you don't get a clear indication of what happened unless logging is enabled.

@andyleiserson self-assigned this Nov 26, 2025

@andyleiserson left a comment


Thanks for this; Firefox has the same concern.

Another test case is cargo xtask cts 'webgpu:api,validation,resource_usages,texture,in_render_common:*'.

// Tracks command buffers created via `CommandEncoder::begin_encoding` that
// have not yet been submitted or discarded. Used to proactively fail
// before hitting Metal's `maxCommandBufferCount`.
command_buffer_created_not_submitted: Arc<atomic::AtomicU64>,

An observation, but not important: it might be possible to use Arc<()> and rely on Arc::strong_count to indicate the number of outstanding command buffers. One complication is the gap between when the command encoder is created and when begin_encoding is called.
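The `Arc<()>` idea could look something like this. It is an illustration under assumptions, not code from the PR: the `Queue` and `CommandEncoder` types, `outstanding`, `finish`, and `is_encoding` are hypothetical, and holding a `Weak<()>` until `begin_encoding` is only one possible way around the creation-versus-`begin_encoding` caveat, not something proposed in the review.

```rust
use std::sync::{Arc, Weak};

// Hypothetical queue: owns the original Arc<()>. Every other strong
// reference represents one outstanding command buffer.
struct Queue {
    tracker: Arc<()>,
}

// Hypothetical encoder: holds only a Weak reference until encoding begins,
// so encoders that were created but never began are not counted.
struct CommandEncoder {
    tracker: Weak<()>,
    token: Option<Arc<()>>,
}

impl Queue {
    fn new() -> Self {
        Queue { tracker: Arc::new(()) }
    }

    fn create_encoder(&self) -> CommandEncoder {
        CommandEncoder { tracker: Arc::downgrade(&self.tracker), token: None }
    }

    // Outstanding buffers = strong references minus the queue's own.
    fn outstanding(&self) -> usize {
        Arc::strong_count(&self.tracker) - 1
    }
}

impl CommandEncoder {
    fn begin_encoding(&mut self) {
        // upgrade() returns None only if the queue has already been dropped.
        self.token = self.tracker.upgrade();
    }

    // Submitting or discarding just drops the token; no manual decrement.
    fn finish(&mut self) {
        self.token = None;
    }

    fn is_encoding(&self) -> bool {
        self.token.is_some()
    }
}
```

Dropping the token performs the decrement automatically, including when an encoder is dropped mid-encoding. The trade-off is that `Arc::strong_count` is only a snapshot, so checking it against a limit before upgrading is check-then-act rather than the single atomic reservation that `fetch_add` provides.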



Development

Successfully merging this pull request may close these issues.

Hitting the MAX_COMMAND_BUFFERS limit for Metal

2 participants