Shut down the multiprocess supervisor when a worker dies before startup by Kludex · Pull Request #2987 · Kludex/uvicorn

Kludex · 2026-06-10T11:41:44Z

Summary

First step of the worker lifecycle cleanup discussed in #2980. The supervisor previously could not distinguish a worker that failed to boot from one that died mid-run, so its only policy was "restart on death" - which turns any startup failure into an infinite restart loop. This PR gives the supervisor that distinction without adding any new API or channel:

The worker now builds its own Server in the child process, so Multiprocess(config, sockets) no longer takes a target. The parent no longer pickles a server.run bound method into its children. That is the seam the follow-up PR #2988 ("Skip importing the app in the parent process when using subprocesses") uses to stop importing the app in the parent entirely, which is the fix for #2980 ("Severe memory regression in the supervisor process when workers > 1 (0.47.0+)").
The existing healthcheck pong now carries Server.started instead of a fixed b"pong". The parent caches it, so the supervisor policy is one rule: a worker that dies before it was ever seen started is a startup failure - terminate everything and exit with STARTUP_FAILURE (moved to uvicorn.server, re-exported from uvicorn.main); a worker that dies after is restarted as before. No second pipe, no callback, no Server API change - the supervisor consumes the started flag the Server already exposes.
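The rule in the second point can be sketched as a small decision function. This is a minimal illustration, not the PR's code: `Action`, `WorkerState`, `record_pong`, and `on_worker_death` are hypothetical names, and only the sticky ready flag and the before/after-started rule come from the description above.

```python
from enum import Enum


class Action(Enum):
    RESTART = "restart"
    SHUTDOWN = "shutdown"


class WorkerState:
    """Parent-side view of one worker (hypothetical stand-in, not uvicorn's class)."""

    def __init__(self) -> None:
        # Sticky flag: becomes True once any pong has reported started.
        self.ready = False

    def record_pong(self, started: bool) -> None:
        # Cache the flag so a later death can still be classified.
        self.ready = self.ready or started


def on_worker_death(worker: WorkerState) -> Action:
    # Never seen started: startup failure, so stop the whole supervisor.
    # Seen started at least once: runtime crash, so restart as before.
    return Action.RESTART if worker.ready else Action.SHUTDOWN
```

Because the flag is only ever OR-ed in, a later pong carrying False cannot downgrade a worker that was already seen started.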

This fixes a restart loop that exists today: with --workers, a lifespan startup failure makes the worker exit cleanly, so the supervisor respawns it forever. With this PR, the output is:

ERROR:    Application startup failed. Exiting.
ERROR:    Child process [84413] died before startup completed, shutting down.
INFO:     Stopping parent process [84411]
(exit code 3)
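For reference, a lifespan startup failure of the kind described above can be produced by a very small ASGI app. The sketch below is an assumption for illustration and is not taken from the PR or its tests; `app` and `drive_startup` are hypothetical names, and `drive_startup` only plays the server's side of the lifespan handshake so the app can be exercised without a server.

```python
import asyncio


async def app(scope, receive, send):
    """ASGI app whose lifespan startup always fails (illustrative only)."""
    if scope["type"] == "lifespan":
        message = await receive()
        if message["type"] == "lifespan.startup":
            # Report failure instead of "lifespan.startup.complete".
            await send({"type": "lifespan.startup.failed", "message": "boom"})
        return
    raise RuntimeError("not expected to serve requests: startup fails first")


async def drive_startup():
    """Play the server's role for the lifespan handshake and collect replies."""
    inbox = asyncio.Queue()
    await inbox.put({"type": "lifespan.startup"})
    sent = []

    async def send(message):
        sent.append(message)

    await app({"type": "lifespan"}, inbox.get, send)
    return sent
```

Saved under a hypothetical module name such as failing.py and started with `uvicorn failing:app --workers 2`, an app like this should exercise the path shown in the log above; the exact log lines depend on the uvicorn version.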

Parent-side eager app loading is intentionally untouched here, so #2440 fail-fast behavior is unchanged. Removing it (the actual #2980 memory fix) is #2988, stacked on this.

How objects are linked today vs where this is going

Today (main): the parent constructs the Server and imports the app, then ships a pickled server.run bound method into each worker, which imports the app again inside its running event loop. The only worker→supervisor signal is liveness, so every death looks the same.

flowchart TB
    subgraph P["parent process"]
        run["uvicorn.run()"]
        cfg["Config"]
        srv["Server"]
        mp["Multiprocess"]
        run -->|"load_app() - full app import,<br/>this copy is never served (#2980)"| cfg
        run --> srv
        run -->|"target=server.run"| mp
    end
    subgraph W["worker process"]
        tgt["Process.target()"]
        wsrv["server.run()"]
        app["app import - again,<br/>inside the running event loop (#941)"]
        tgt --> wsrv
        wsrv --> app
    end
    srv -.->|"pickled bound method"| tgt
    mp -->|spawn| tgt
    W -.->|"ping/pong: alive or not,<br/>nothing else"| mp
    W -->|"any death → restart<br/>(startup failure = infinite loop)"| mp

Target: the parent holds only Config and sockets; the worker owns its Server and the single app import. The healthcheck reply reports lifecycle (started), not just liveness, so the supervisor can apply a real policy.

flowchart TB
    subgraph P2["parent process"]
        run2["uvicorn.run()"]
        cfg2["Config - just data"]
        mp2["Multiprocess(config, sockets)"]
        run2 --> cfg2
        run2 --> mp2
    end
    subgraph W2["worker process"]
        tgt2["Process.target()"]
        srv2["Server(config) - built in the worker"]
        app2["app import - once, in the worker"]
        tgt2 --> srv2
        srv2 --> app2
    end
    mp2 -->|"spawn: config + sockets"| tgt2
    W2 -.->|"ping → pong carries server.started"| mp2
    mp2 -->|"died before started → shut down, exit 3<br/>died after started → restart"| W2

This PR implements the target diagram except for one edge: the parent's eager load_app() stays for now, so #2440 fail-fast behavior is unchanged while this lands. Removing it (the actual #2980 memory fix) is #2988.

Notes

One edge case of the pull-based signal: a worker that completes startup but dies within one health tick (0.5s), before the parent has ever observed started, is classified as a startup failure and shuts the server down. That is arguably the right call for a crash-on-start, and far better than the silent infinite restart loop it replaces.
Multiprocess and Process constructor signatures changed (dropped target). They are not documented public API. Server is untouched.

AI Disclaimer

This PR was developed with the assistance of either Claude or Codex. I've reviewed and verified the changes.

github-actions · 2026-06-10T11:42:39Z

📖 Docs preview: https://542b1075-uvicorn.marcelotryle.workers.dev

codspeed-hq · 2026-06-10T11:42:55Z

Merging this PR will not alter performance

✅ 24 untouched benchmarks

Comparing worker-readiness (324e5c6) with main (e8a31bc)

cubic-dev-ai

No issues found across 5 files


cubic-dev-ai

1 issue found across 4 files (changes from recent commits).


cubic-dev-ai · 2026-06-12T08:59:30Z

        if self.parent_conn.poll(timeout):
-            self.parent_conn.recv()
+            started: bool = self.parent_conn.recv()
+            self.ready = self.ready or started


P1 (uvicorn/supervisors/multiprocess.py, line 43): Readiness is only learned from a successful healthcheck round-trip, so a worker that finishes startup and then dies before the first ping is still treated as "not ready". The supervisor then shuts down entirely instead of restarting that worker.
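The window this comment describes can be reproduced with a plain `multiprocessing.Pipe`. This is a minimal sketch, assuming a thread stands in for the worker process; `WorkerHandle`, `answer_pings`, and the `state` dict are hypothetical, and only the body of `ping` mirrors the diff above.

```python
import multiprocessing
import threading


class WorkerHandle:
    """Parent-side handle (hypothetical stand-in for the supervisor's wrapper)."""

    def __init__(self, parent_conn):
        self.parent_conn = parent_conn
        self.ready = False  # sticky: set once a pong reports started

    def ping(self, timeout: float = 5.0) -> bool:
        self.parent_conn.send(b"ping")
        if self.parent_conn.poll(timeout):
            started = self.parent_conn.recv()
            self.ready = self.ready or started
            return True
        return False


def answer_pings(child_conn, state, count):
    """Worker side: reply to `count` pings with the current started flag."""
    for _ in range(count):
        child_conn.recv()
        child_conn.send(state["started"])


def demo():
    parent_conn, child_conn = multiprocessing.Pipe()
    handle = WorkerHandle(parent_conn)
    state = {"started": False}
    worker = threading.Thread(target=answer_pings, args=(child_conn, state, 1))
    worker.start()
    handle.ping()            # pong carries False: startup not finished yet
    worker.join()
    state["started"] = True  # worker finishes startup...
    # ...and dies here, before another ping: the parent never learns of it.
    return handle.ready
```

In this model `demo()` leaves `handle.ready` at False even though the worker did start, which is the case the supervisor classifies as a startup failure.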


Shut down the multiprocess supervisor when a worker dies before startup (b7d4597)

Kludex deployed to cloudflare June 10, 2026 11:41

cubic-dev-ai Bot reviewed Jun 10, 2026

This was referenced Jun 11, 2026:

support gevent-using apps when workers>2 after #2183 #2853 (Open)

Skip importing the app in the parent process when using subprocesses #2988 (Closed)

Pass the started callback to the Server constructor (c788f76)

Kludex deployed to cloudflare June 11, 2026 13:25

Report worker readiness through the existing healthcheck channel (324e5c6)

Kludex deployed to cloudflare June 12, 2026 08:51

cubic-dev-ai Bot reviewed Jun 12, 2026

Kludex closed this Jun 12, 2026

Kludex wants to merge 3 commits into main from worker-readiness
