sled-agent: Move IPCC calls off of tokio worker threads? #9721

In #9720, we encountered an OS bug that caused IPCC `ioctl`s to hang indefinitely. In sled-agent, we issue these calls directly from tokio worker threads, so several worker threads got stuck, and we eventually hit #9619, where the entire runtime blocked even though some worker threads were still parked (idle). #9619 proposes the general workaround of periodically spawning a new task into the runtime. That can unstick a runtime that is stuck because the one thread responsible for I/O is blocked polling the future that caused it to wake up. However, it seems unlikely to have helped much in the #9720 case: we _probably_ would only have delayed the sled-agent hang, because eventually we would have issued enough IPCC calls to hang every worker thread.
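To make that last point concrete, here is a toy model in plain `std` Rust. It is neither tokio nor sled-agent code. A fixed pool of OS threads stands in for the runtime's worker threads, and `hung_ioctl` is a hypothetical stand-in for an IPCC call that never returns. The model deliberately ignores the I/O-driver problem from #9619. It only shows that once every worker is pinned by a hung call, no thread is free to run anything else, so nudging the runtime with new tasks cannot help. The `release` channel is a demo-only assumption that lets the example finish.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Toy stand-in for a runtime's worker threads: `workers` OS threads pulling
/// jobs from one shared FIFO queue. Models only "a hung call pins its thread".
struct FixedPool {
    tx: Sender<Job>,
}

impl FixedPool {
    fn new(workers: usize) -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        let rx = Arc::new(Mutex::new(rx));
        for _ in 0..workers {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so a running job never holds the queue lock.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok(job) => job(),
                    Err(_) => return, // pool dropped: exit the thread
                }
            });
        }
        FixedPool { tx }
    }

    fn submit(&self, job: impl FnOnce() + Send + 'static) {
        self.tx.send(Box::new(job)).unwrap();
    }
}

/// Hypothetical hung ioctl: blocks until every sender for `release` is dropped.
/// (The real #9720 calls never returned at all.)
fn hung_ioctl(release: Receiver<()>) {
    let _ = release.recv();
}

/// Pins every worker with a hung call, then checks whether a trivial job can
/// still run within `wait`. Returns (ran_while_hung, ran_after_release).
fn demo(workers: usize, wait: Duration) -> (bool, bool) {
    let pool = FixedPool::new(workers);
    let mut releases = Vec::new();
    for _ in 0..workers {
        let (release_tx, release_rx) = mpsc::channel::<()>();
        releases.push(release_tx);
        pool.submit(move || hung_ioctl(release_rx));
    }

    // A trivial job, analogous to any unrelated task in the process.
    let (done_tx, done_rx) = mpsc::channel::<()>();
    pool.submit(move || {
        let _ = done_tx.send(());
    });
    let ran_while_hung = done_rx.recv_timeout(wait).is_ok();

    // Unstick the workers (not possible in the real bug) and wait again.
    drop(releases);
    let ran_after_release = done_rx.recv_timeout(wait).is_ok();
    (ran_while_hung, ran_after_release)
}

fn main() {
    let (while_hung, after) = demo(4, Duration::from_millis(200));
    println!("trivial job ran while all workers hung: {while_hung}");
    println!("trivial job ran after release: {after}");
}
```

Because the queue is FIFO and each hung job pins the worker that dequeued it, the trivial job can only be dequeued once a worker is free. In #9720 there was no equivalent of `drop(releases)`, so the pinned workers stayed pinned.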

I'm inclined to say we should treat IPCC calls as "blocking I/O" and run them via `spawn_blocking`. That seems accurate, since we're doing I/O over a UART to the SP (and/or RoT, depending on the IPCC command). But I'm not sure what that would do in a case like #9720: if every IPCC call hangs, would we eventually exhaust the `spawn_blocking` pool? Presumably sled-agent would remain generally responsive (except in paths that depend on those IPCC calls), but what would happen in the limit?
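Some documented tokio behavior bears on that question. `spawn_blocking` grows its pool on demand up to `Builder::max_blocking_threads` (512 by default), and once that limit is reached further closures are queued rather than rejected. A closure that has started running cannot be cancelled, so wrapping its `JoinHandle` in `tokio::time::timeout` only stops the caller from waiting. If every IPCC call hung, each call would therefore pin one more blocking thread up to the limit. After that, every `spawn_blocking` user, including `tokio::fs` (which is built on the blocking pool), would queue behind the hung calls.

One possible way to bound this is sketched below in plain `std` Rust for self-containment. Every IPCC call goes through a single dedicated thread with a bounded queue. Callers fail fast with `Busy` when the queue is full and give up after a deadline. `IpccClient`, `Request`, `IpccError`, `HANG`, and the `do_ioctl` closure are all hypothetical. Serializing IPCC calls on one thread is also an assumption that may not match how sled-agent issues them today. In async code, the analogous pieces would be a bounded `tokio::sync::mpsc` channel with `try_send`, a `tokio::sync::oneshot` reply, and `tokio::time::timeout`.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender, SyncSender, TrySendError};
use std::thread;
use std::time::Duration;

/// Hypothetical IPCC request: a command byte plus a reply channel.
struct Request {
    cmd: u8,
    reply: Sender<u32>,
}

#[derive(Debug, PartialEq)]
enum IpccError {
    /// The bounded queue in front of the IPCC thread is full.
    Busy,
    /// No reply arrived before the caller's deadline; the ioctl itself keeps
    /// running (or hanging) on the IPCC thread.
    TimedOut,
    /// The IPCC thread has exited.
    Gone,
}

/// Hypothetical client that funnels every IPCC call through one thread.
struct IpccClient {
    tx: SyncSender<Request>,
}

impl IpccClient {
    /// `do_ioctl` stands in for the real blocking ioctl.
    fn spawn<F>(queue_depth: usize, do_ioctl: F) -> Self
    where
        F: Fn(u8) -> u32 + Send + 'static,
    {
        let (tx, rx) = mpsc::sync_channel::<Request>(queue_depth);
        thread::spawn(move || {
            // Serve requests one at a time until every client is dropped.
            for req in rx {
                let out = do_ioctl(req.cmd);
                // The caller may already have timed out; ignore send errors.
                let _ = req.reply.send(out);
            }
        });
        IpccClient { tx }
    }

    fn call(&self, cmd: u8, deadline: Duration) -> Result<u32, IpccError> {
        let (reply_tx, reply_rx) = mpsc::channel();
        // Fail fast instead of blocking the caller when the queue is full.
        match self.tx.try_send(Request { cmd, reply: reply_tx }) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => return Err(IpccError::Busy),
            Err(TrySendError::Disconnected(_)) => return Err(IpccError::Gone),
        }
        match reply_rx.recv_timeout(deadline) {
            Ok(v) => Ok(v),
            Err(RecvTimeoutError::Timeout) => Err(IpccError::TimedOut),
            Err(RecvTimeoutError::Disconnected) => Err(IpccError::Gone),
        }
    }
}

/// Hypothetical command that never returns, simulating the #9720 hang.
const HANG: u8 = 0xFF;

fn main() {
    let client = IpccClient::spawn(1, |cmd| {
        if cmd == HANG {
            loop {
                thread::park(); // never unparked: this thread is lost
            }
        }
        u32::from(cmd) + 1
    });
    let d = Duration::from_millis(200);
    println!("{:?}", client.call(1, d)); // served normally
    println!("{:?}", client.call(HANG, d)); // pins the IPCC thread
    println!("{:?}", client.call(2, d)); // queued behind the hang
    println!("{:?}", client.call(3, d)); // queue full: fails fast
}
```

In the limit, this design loses exactly one thread to a hung call, and every later caller gets `TimedOut` or `Busy` in bounded time. The rest of the process, including other blocking-pool users, is unaffected. The cost is that a single hang blocks all IPCC traffic until sled-agent restarts.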
