fix: graceful shutdown instead of SIGKILL on worker processes by sansyrox · Pull Request #1349 · sparckles/Robyn

sansyrox · 2026-03-27T00:03:31Z

Summary

Replace process.kill() (SIGKILL) with process.terminate() (SIGTERM) + process.join(timeout=30) in the parent signal handler, with a force-kill fallback for workers that don't exit in time.
In child processes, register a SIGTERM handler via loop.add_signal_handler(signal.SIGTERM, loop.stop) so the event loop exits cleanly, allowing the Rust shutdown handler to fire and in-flight requests to complete.
Move loop.close() into a finally block so it always runs regardless of how the loop exits.
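The parent-side sequence described above (terminate, bounded join, force-kill fallback) can be sketched as follows. This is a minimal illustration, not the code in robyn/processpool.py: the helper name `shutdown_workers` and its `grace` and `kill_wait` parameters are hypothetical, and the 30s and 5s defaults are taken from the PR description and the review walkthrough.

```python
import logging
import multiprocessing as mp
import time

logger = logging.getLogger(__name__)


def shutdown_workers(workers, grace=30.0, kill_wait=5.0):
    """Hypothetical helper: SIGTERM each worker, wait, then SIGKILL stragglers."""
    # Ask every worker to stop first so they can shut down in parallel.
    for worker in workers:
        worker.terminate()  # sends SIGTERM on POSIX
    # Give each worker a bounded window to finish in-flight work.
    for worker in workers:
        worker.join(timeout=grace)
    # Escalate only for workers that are still alive after the grace period.
    for worker in workers:
        if worker.is_alive():
            logger.warning("Worker %s did not exit in time, forcing kill.", worker.pid)
            worker.kill()  # sends SIGKILL on POSIX
            worker.join(timeout=kill_wait)
    return [worker.exitcode for worker in workers]


if __name__ == "__main__":
    # Stand-in workers that only sleep; real workers would run an event loop.
    procs = [mp.Process(target=time.sleep, args=(60,)) for _ in range(2)]
    for proc in procs:
        proc.start()
    print(shutdown_workers(procs, grace=5.0))
```

Calling `terminate()` on all workers before joining any of them keeps the total wait close to one grace period rather than one per worker, although the per-worker `join(timeout=...)` still allows a longer worst case.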

Test plan

Existing integration tests pass
Manual test: start multi-process server, send SIGTERM, verify processes exit with code 0 or -15 (not -9)
Manual test: verify in-flight requests complete before the process exits
Manual test: verify force-kill fallback works if a process hangs beyond 30s
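For the exit-code checks in the test plan, a small helper along these lines may be handy. It is only a sketch: `classify_exit` is a hypothetical name, and it relies on the documented `multiprocessing.Process.exitcode` convention that a negative value -N means the process was ended by signal N (POSIX only, since `signal.SIGKILL` is not defined on Windows).

```python
import signal


def classify_exit(exitcode):
    """Hypothetical helper: map a Process.exitcode to a shutdown outcome."""
    if exitcode is None:
        return "running"  # the process has not exited yet
    if exitcode == 0:
        return "clean"  # the worker returned normally
    if exitcode == -signal.SIGTERM:
        return "terminated"  # ended by SIGTERM (-15)
    if exitcode == -signal.SIGKILL:
        return "killed"  # ended by SIGKILL (-9), i.e. the fallback path
    return "other"
```

Under this convention, the first manual test expects every worker to report "clean" or "terminated", and the third expects "killed" for the worker that hangs.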

Made with Cursor

Summary by CodeRabbit

Bug Fixes
- Improved shutdown flow: workers now receive a graceful termination window (up to 30s) before forcible stop, reducing abrupt terminations.
- Safer restart behavior: stopped subprocesses are given a bounded grace period (≈5s) before escalation to forceful kill.
- More robust signal handling: termination signals are handled more predictably on supported platforms and interruption paths always ensure event-loop cleanup.

Replace process.kill() (SIGKILL) with process.terminate() (SIGTERM) followed by a 30s join timeout. Workers that don't exit in time are force-killed as a fallback. In child processes, register a SIGTERM handler on the asyncio event loop so that loop.run_forever() exits cleanly, allowing the Rust shutdown handler to fire and in-flight requests to complete. Made-with: Cursor

vercel · 2026-03-27T00:03:36Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
robyn	Ready	Preview, Comment	Mar 28, 2026 8:05pm

coderabbitai · 2026-03-27T00:05:02Z

📝 Walkthrough

Walkthrough

Refined shutdown flows: the parent and the reloader now use graceful termination (terminate + bounded wait) and escalate to kill only for lingering processes; spawn_process integrates SIGTERM with the asyncio loop on non-Windows platforms and ensures the loop is closed on all paths; the parent exits via sys.exit(0) after shutdown.

Changes

Cohort / File(s)	Summary
Process Shutdown & asyncio integration `robyn/processpool.py`	Replaced immediate `kill()` with `terminate()` + join(timeout=30) then `kill()` for still-alive workers; signal handler now calls `sys.exit(0)` after shutdown; `spawn_process()` registers `loop.add_signal_handler(signal.SIGTERM, loop.stop)` on non-Windows and always closes the event loop; `KeyboardInterrupt` is swallowed and handled in `finally`.
Reloader subprocess shutdown `robyn/reloader.py`	Changed `EventHandler.stop_server()` to use `process.terminate()` with `wait(timeout=5)`, warn then `kill()` if needed; removed explicit pre-restart `kill()` in `reload()` to rely on updated stop logic before spawning new subprocess.

Sequence Diagram(s)

sequenceDiagram
    participant Signal as Signal Handler
    participant Parent as Parent Process
    participant Workers as Worker Processes
    participant Cleanup as Cleanup Phase

    Signal->>Parent: SIGTERM received
    Parent->>Workers: terminate() [graceful]
    Note over Parent,Workers: Wait up to 30 seconds (join(timeout=30))
    
    alt All workers stopped
        Cleanup->>Parent: All processes stopped
    else Some workers still alive
        Cleanup->>Parent: Log warning about stragglers
        Cleanup->>Workers: kill() [force]
        Note over Cleanup,Workers: Wait additional 5 seconds (join(timeout=5))
    end
    
    Parent->>Parent: sys.exit(0)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Poem

🐰 I nibble logs and watch signals hum,

terminate softly before the drum.
Thirty seconds to tidy the lair,
A stern little hop if stragglers stare,
Goodbye, cleanly — with carrot flair! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: replacing SIGKILL shutdown with graceful SIGTERM-based shutdown for worker processes.
Description check	✅ Passed	The description provides a clear summary of the changes and test plan, though it lacks the GitHub issue reference and some checklist items are incomplete.


codspeed-hq · 2026-03-27T00:11:23Z

Merging this PR will not alter performance

✅ 189 untouched benchmarks

Comparing fix/graceful-shutdown (834cf7f) with main (3e04c65)

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@robyn/processpool.py`:
- Around line 56-63: The reloader currently bypasses the graceful shutdown logic
by calling prev_process.kill() directly; refactor so the reloader uses the same
terminate→join→kill sequence in robyn/processpool.py instead of force-killing.
Add or expose a helper (e.g., graceful_shutdown or shutdown_process_pool) that
takes a process or process_pool and runs process.terminate(), then joins each
process with timeout=30, logs and calls process.kill() only if still alive and
then joins again with timeout=5; update robyn/reloader.py to call that helper
(replacing prev_process.kill()) so reloads follow the same graceful path.
- Around line 233-240: The worker event loop only registers SIGTERM via
loop.add_signal_handler(signal.SIGTERM, loop.stop), so Ctrl+C (SIGINT) raises
KeyboardInterrupt in children and bypasses coordinated shutdown; update the
worker event loop to also register SIGINT the same way (e.g., add
loop.add_signal_handler(signal.SIGINT, loop.stop) alongside the SIGTERM
registration on non-Windows platforms) or explicitly ignore SIGINT in the worker
before the loop starts; ensure you keep the existing except
KeyboardInterrupt/finally structure but prefer registering SIGINT so
KeyboardInterrupt is not raised in workers.


📥 Commits

Reviewing files that changed from the base of the PR and between a54ff96 and d53d8b0.

📒 Files selected for processing (1)

robyn/processpool.py

coderabbitai · 2026-03-27T00:15:28Z

robyn/processpool.py

+        if not sys.platform.startswith("win32"):
+            loop.add_signal_handler(signal.SIGTERM, loop.stop)
+
        loop.run_forever()
    except KeyboardInterrupt:
+        pass
+    finally:
        loop.close()


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Inspect whether workers have explicit SIGINT handling or process-group isolation.
rg -n -C2 'signal\.SIGINT|signal\.signal\(signal\.SIGINT|add_signal_handler\(signal\.SIGINT|start_new_session|setsid|setpgrp' .

Repository: sparckles/Robyn

Length of output: 1664

🏁 Script executed:

# First, let's see the overall structure of processpool.py
head -100 robyn/processpool.py | cat -n

Repository: sparckles/Robyn

Length of output: 3620

🏁 Script executed:

# Continue reading to see spawn_process and worker implementation
sed -n '100,250p' robyn/processpool.py | cat -n

Repository: sparckles/Robyn

Length of output: 5761

Workers can still bypass graceful shutdown on Ctrl+C.

Only SIGTERM is trapped in the worker event loop (line 135). The parent installs its SIGINT handler after spawning workers (lines 67–68 come after lines 36–51), so the children inherit the default SIGINT behavior. When you press Ctrl+C, the signal reaches the workers directly and raises KeyboardInterrupt, which is caught at lines 138–139 and does nothing except close the loop. This skips any custom termination logic. The parent's terminating_signal_handler may still execute, but the worker-side flow is not coordinated. Please either ignore SIGINT in the worker event loop or handle it the same way as SIGTERM (calling loop.stop()).
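One way to read this suggestion is the sketch below, which registers both signals on the loop so that neither raises KeyboardInterrupt in the worker. It is an illustration only: `run_worker_loop` and its `on_ready` hook are hypothetical names, and the real spawn_process() does more work before it starts the loop.

```python
import asyncio
import os
import signal
import sys


def run_worker_loop(on_ready=None):
    """Hypothetical worker entry: run until SIGTERM or SIGINT, always close the loop."""
    loop = asyncio.new_event_loop()
    try:
        # add_signal_handler is only available on Unix event loops
        # and must be called from the main thread.
        if not sys.platform.startswith("win32"):
            for sig in (signal.SIGTERM, signal.SIGINT):
                loop.add_signal_handler(sig, loop.stop)
        if on_ready is not None:
            loop.call_soon(on_ready)  # runs once the loop has started
        loop.run_forever()
    except KeyboardInterrupt:
        # Still reachable on Windows, where no handlers are registered.
        pass
    finally:
        loop.close()
    return loop


if __name__ == "__main__" and not sys.platform.startswith("win32"):
    # Simulate the parent's SIGTERM by signalling this process from inside the loop.
    closed = run_worker_loop(on_ready=lambda: os.kill(os.getpid(), signal.SIGTERM))
    print("loop closed:", closed.is_closed())
```

Because `loop.stop` only ends `run_forever()`, any code placed after it (or in the `finally` block) still runs, which is what lets shutdown hooks fire before the process exits.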


The reloader was sending SIGTERM then unconditionally calling kill(), bypassing the graceful shutdown the PR adds. Now stop_server() uses terminate(), waits up to 5 seconds, and only force-kills if the process is still alive. The redundant prev_process.kill() in reload() is removed since stop_server() already handles full cleanup. Made-with: Cursor

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@robyn/reloader.py`:
- Around line 126-132: The post-kill wait in the reloader's shutdown path blocks
indefinitely because it calls self.process.wait() with no timeout after
self.process.kill(); change this to use a bounded wait like
self.process.wait(timeout=5) (mirroring the pattern in robyn/processpool.py) and
handle subprocess.TimeoutExpired the same way so the reloader cannot hang if the
child fails to reap; update the block around self.process.terminate(),
self.process.wait(...), subprocess.TimeoutExpired, self.process.kill(), and the
final self.process.wait(...) call to include the timeout and appropriate
exception handling.


📥 Commits

Reviewing files that changed from the base of the PR and between d53d8b0 and 834cf7f.

📒 Files selected for processing (1)

robyn/reloader.py

coderabbitai · 2026-03-28T20:07:09Z

robyn/reloader.py

+            self.process.terminate()
+            try:
+                self.process.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                logger.warning("Process %s did not shut down in time, forcing kill.", self.process.pid)
+                self.process.kill()
+                self.process.wait()


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only verification: inspect kill/wait sequencing in shutdown paths.
rg -nP --type=py -C4 '\.terminate\(\)|\.kill\(\)|\.wait\(' robyn/reloader.py robyn/processpool.py

Repository: sparckles/Robyn

Length of output: 2282

Add a timeout to the post-kill wait to prevent indefinite blocking during reload.

Line 132 calls self.process.wait() without a timeout after process.kill(). If the process fails to reap, this blocks the reloader indefinitely, stalling both reload and shutdown operations. This should match the pattern used in robyn/processpool.py:63, which bounds the fallback wait with a timeout.

Proposed fix

             except subprocess.TimeoutExpired:
                 logger.warning("Process %s did not shut down in time, forcing kill.", self.process.pid)
                 self.process.kill()
-                self.process.wait()
+                try:
+                    self.process.wait(timeout=5)
+                except subprocess.TimeoutExpired:
+                    logger.error("Process %s did not exit after force kill.", self.process.pid)
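As a self-contained illustration of the bounded pattern proposed above, the sketch below applies it to a plain `subprocess.Popen`. The function name `stop_subprocess` and its `grace` and `kill_wait` parameters are hypothetical; in the PR the logic lives in `EventHandler.stop_server()` and operates on `self.process`.

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def stop_subprocess(process, grace=5.0, kill_wait=5.0):
    """Hypothetical helper: terminate, wait, then kill with a bounded final wait."""
    process.terminate()  # SIGTERM on POSIX
    try:
        process.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        logger.warning("Process %s did not shut down in time, forcing kill.", process.pid)
        process.kill()  # SIGKILL on POSIX
        try:
            # Bounded so the caller cannot hang if the child is never reaped.
            process.wait(timeout=kill_wait)
        except subprocess.TimeoutExpired:
            logger.error("Process %s did not exit after force kill.", process.pid)
    return process.returncode


if __name__ == "__main__":
    # Stand-in child that only sleeps; the reloader would start the server here.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_subprocess(child))
```

The function returns `process.returncode`, which stays `None` only in the unlikely case that the child has still not been reaped after both waits.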


vercel bot deployed to Preview March 27, 2026 00:03

coderabbitai bot reviewed Mar 27, 2026

vercel bot deployed to Preview March 28, 2026 20:05

coderabbitai bot reviewed Mar 28, 2026

sansyrox wants to merge 2 commits into main from fix/graceful-shutdown