@benjaminp benjaminp commented Aug 27, 2025

Prioritize::send_data has a check to prevent assigning capacity to streams that are not yet open. Assigning flow control window to pending streams could starve already open streams.

This change moves the stream.is_pending_open check into Prioritize::try_assign_capacity. This prevents capacity from ever being assigned to pending streams. In particular, neither a client's reserve_capacity call nor a remote's initial window size adjustment will assign capacity to pending streams.
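
As a rough illustration of the idea, here is a toy model of the guard. It is not h2's actual code: the `Stream` and `Connection` types and their fields are hypothetical, and only the names `try_assign_capacity` and `is_pending_open` are borrowed from the description above.

```rust
/// Toy stand-in for a stream. The fields are hypothetical, not h2's real layout.
#[derive(Debug)]
struct Stream {
    /// True while the stream is queued and not yet open.
    is_pending_open: bool,
    /// Capacity the caller has asked for.
    requested: u32,
    /// Capacity handed to the stream so far.
    assigned: u32,
}

/// Toy stand-in for connection-level flow control.
struct Connection {
    /// Connection window not yet assigned to any stream.
    available: u32,
}

impl Connection {
    /// Assign as much of the connection window as the stream still wants,
    /// unless the stream is waiting to be opened. Returns the amount granted.
    fn try_assign_capacity(&mut self, stream: &mut Stream) -> u32 {
        if stream.is_pending_open {
            // Pending streams cannot send yet, so they receive nothing and
            // the window stays available to streams that are already open.
            return 0;
        }
        let wanted = stream.requested.saturating_sub(stream.assigned);
        let granted = wanted.min(self.available);
        stream.assigned += granted;
        self.available -= granted;
        granted
    }
}
```

In this model, with `available` set to 10, a pending stream that requests 10 is granted 0, and an open stream that asks afterwards still receives the full 10.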

Tests capacity_not_assigned_to_unopened_streams and new_initial_window_size_capacity_not_assigned_to_unopened_streams in flow_control.rs demonstrate the fix.

A number of other tests must be changed because they were assuming that pending streams immediately received connection capacity.

This may be related to #853.

bvinc commented Sep 4, 2025

I am hitting this problem in production, and I've spent weeks tracking it down. I am running a tonic gRPC client that is creating thousands of streams to a server that limits streams to 400, and I keep hitting a deadlock. I caught logs where h2 gives capacity to streams that are pending_open, and it triggers a deadlock where no more progress is made. I ended up with a patch to fix it that looks almost exactly like this, which led to me finding this PR. I support this PR.

Here's my question though: Is it always a bug to assign capacity to a stream that is pending_open? If so, do you think there should be a check inside of try_assign_capacity that returns early if the stream is pending_open? If not, maybe there should at least be a debug_assert?
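
To make the two options concrete, a minimal sketch might look like the following. The `Stream` type and both function names are made up for illustration and are not h2's API. Only `debug_assert!` is the real standard-library macro.

```rust
/// Hypothetical, simplified stream state. Not h2's real `Stream` type.
struct Stream {
    is_pending_open: bool,
    assigned: u32,
}

/// Option 1: return early for pending streams, so no caller needs its own
/// guard. Returns whether any capacity was assigned.
fn assign_with_early_return(stream: &mut Stream, amount: u32) -> bool {
    if stream.is_pending_open {
        return false;
    }
    stream.assigned += amount;
    true
}

/// Option 2: treat assignment to a pending stream as a caller bug.
/// `debug_assert!` panics when debug assertions are enabled and is
/// compiled out otherwise (by default, in release builds).
fn assign_with_debug_assert(stream: &mut Stream, amount: u32) {
    debug_assert!(
        !stream.is_pending_open,
        "capacity assigned to a pending stream"
    );
    stream.assigned += amount;
}
```

The early return changes behavior in every build, while the assertion only surfaces the problem in builds with debug assertions enabled and leaves the guarding to each caller.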

@seanmonstar
Member

Just wanted to chime in here, I think this fix sounds good, but part of me worries that it has deeper repercussions. (Like, does stream.requested_send_capacity being changed without the other parts cause problems?) Much of this code was written years ago, and I don't remember all the nuance.

I hope to find time to thoroughly understand how this affects any other internal invariants... But I appreciate any reports that testing this out with live traffic improved things.

@benjaminp benjaminp force-pushed the no-capacity-for-unopened-streams branch from 98d1c12 to 716dc1e Compare September 4, 2025 21:34
`Prioritize::send_data` has a check to prevent assigning capacity to streams that are not yet open. Assigning flow control window to pending streams could starve already open streams.

This change adds a similar check to `Prioritize::reserve_capacity`.

Test `capacity_not_assigned_to_unopened_streams` in `flow_control.rs` demonstrates the fix.

A number of other tests must be changed because they were assuming that pending streams immediately received connection capacity.

This may be related to hyperium#853.
@benjaminp benjaminp force-pushed the no-capacity-for-unopened-streams branch from 716dc1e to 0980a62 Compare September 4, 2025 21:34