fix: graceful shutdown instead of SIGKILL on worker processes#1349
fix: graceful shutdown instead of SIGKILL on worker processes#1349
Conversation
Replace process.kill() (SIGKILL) with process.terminate() (SIGTERM) followed by a 30s join timeout. Workers that don't exit in time are force-killed as a fallback. In child processes, register a SIGTERM handler on the asyncio event loop so that loop.run_forever() exits cleanly, allowing the Rust shutdown handler to fire and in-flight requests to complete. Made-with: Cursor
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughRefined shutdown flows: parent and reloader now use graceful termination (terminate + bounded wait) with escalation to kill for lingering processes; spawn_process integrates SIGTERM with asyncio loop on non-Windows and ensures the loop is closed on all paths; parent exit via sys.exit(0) after shutdown. Changes
Sequence Diagram(s)sequenceDiagram
participant Signal as Signal Handler
participant Parent as Parent Process
participant Workers as Worker Processes
participant Cleanup as Cleanup Phase
Signal->>Parent: SIGTERM received
Parent->>Workers: terminate() [graceful]
Note over Parent,Workers: Wait up to 30 seconds (join(timeout=30))
alt All workers stopped
Cleanup->>Parent: All processes stopped
else Some workers still alive
Cleanup->>Parent: Log warning about stragglers
Cleanup->>Workers: kill() [force]
Note over Cleanup,Workers: Wait additional 5 seconds (join(timeout=5))
end
Parent->>Parent: sys.exit(0)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@robyn/processpool.py`:
- Around line 56-63: The reloader currently bypasses the graceful shutdown logic
by calling prev_process.kill() directly; refactor so the reloader uses the same
terminate→join→kill sequence in robyn/processpool.py instead of force-killing.
Add or expose a helper (e.g., graceful_shutdown or shutdown_process_pool) that
takes a process or process_pool and runs process.terminate(), then joins each
process with timeout=30, logs and calls process.kill() only if still alive and
then joins again with timeout=5; update robyn/reloader.py to call that helper
(replacing prev_process.kill()) so reloads follow the same graceful path.
- Around line 233-240: The worker event loop only registers SIGTERM via
loop.add_signal_handler(signal.SIGTERM, loop.stop), so Ctrl+C (SIGINT) raises
KeyboardInterrupt in children and bypasses coordinated shutdown; update the
worker event loop to also register SIGINT the same way (e.g., add
loop.add_signal_handler(signal.SIGINT, loop.stop) alongside the SIGTERM
registration on non-Windows platforms) or explicitly ignore SIGINT in the worker
before the loop starts; ensure you keep the existing except
KeyboardInterrupt/finally structure but prefer registering SIGINT so
KeyboardInterrupt is not raised in workers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| if not sys.platform.startswith("win32"): | ||
| loop.add_signal_handler(signal.SIGTERM, loop.stop) | ||
|
|
||
| loop.run_forever() | ||
| except KeyboardInterrupt: | ||
| pass | ||
| finally: | ||
| loop.close() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect whether workers have explicit SIGINT handling or process-group isolation.
rg -n -C2 'signal\.SIGINT|signal\.signal\(signal\.SIGINT|add_signal_handler\(signal\.SIGINT|start_new_session|setsid|setpgrp' .Repository: sparckles/Robyn
Length of output: 1664
🏁 Script executed:
# First, let's see the overall structure of processpool.py
head -100 robyn/processpool.py | cat -nRepository: sparckles/Robyn
Length of output: 3620
🏁 Script executed:
# Continue reading to see spawn_process and worker implementation
sed -n '100,250p' robyn/processpool.py | cat -nRepository: sparckles/Robyn
Length of output: 5761
Workers can still bypass graceful shutdown on Ctrl+C.
Only SIGTERM is trapped in the worker event loop (line 135). The parent installs its SIGINT handler after spawning workers (lines 67–68 occur after line 36–51), so the children inherit default SIGINT behavior. When you press Ctrl+C, the signal reaches workers directly, raising KeyboardInterrupt, which is caught at line 138–139 and does nothing except close the loop. This skips any custom termination logic. The parent's terminating_signal_handler may still execute, but the worker-side flow is not coordinated. Please either ignore SIGINT in the worker event loop or handle it the same way as SIGTERM (calling loop.stop()).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@robyn/processpool.py` around lines 233 - 240, The worker event loop only
registers SIGTERM via loop.add_signal_handler(signal.SIGTERM, loop.stop), so
Ctrl+C (SIGINT) raises KeyboardInterrupt in children and bypasses coordinated
shutdown; update the worker event loop to also register SIGINT the same way
(e.g., add loop.add_signal_handler(signal.SIGINT, loop.stop) alongside the
SIGTERM registration on non-Windows platforms) or explicitly ignore SIGINT in
the worker before the loop starts; ensure you keep the existing except
KeyboardInterrupt/finally structure but prefer registering SIGINT so
KeyboardInterrupt is not raised in workers.
The reloader was sending SIGTERM then unconditionally calling kill(), bypassing the graceful shutdown the PR adds. Now stop_server() uses terminate(), waits up to 5 seconds, and only force-kills if the process is still alive. The redundant prev_process.kill() in reload() is removed since stop_server() already handles full cleanup. Made-with: Cursor
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@robyn/reloader.py`:
- Around line 126-132: The post-kill wait in the reloader's shutdown path blocks
indefinitely because it calls self.process.wait() with no timeout after
self.process.kill(); change this to use a bounded wait like
self.process.wait(timeout=5) (mirroring the pattern in robyn/processpool.py) and
handle subprocess.TimeoutExpired the same way so the reloader cannot hang if the
child fails to reap; update the block around self.process.terminate(),
self.process.wait(...), subprocess.TimeoutExpired, self.process.kill(), and the
final self.process.wait(...) call to include the timeout and appropriate
exception handling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| self.process.terminate() | ||
| try: | ||
| self.process.wait(timeout=5) | ||
| except subprocess.TimeoutExpired: | ||
| logger.warning("Process %s did not shut down in time, forcing kill.", self.process.pid) | ||
| self.process.kill() | ||
| self.process.wait() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Read-only verification: inspect kill/wait sequencing in shutdown paths.
rg -nP --type=py -C4 '\.terminate\(\)|\.kill\(\)|\.wait\(' robyn/reloader.py robyn/processpool.pyRepository: sparckles/Robyn
Length of output: 2282
Add a timeout to the post-kill wait to prevent indefinite blocking during reload.
Line 132 calls self.process.wait() without a timeout after process.kill(). If the process fails to reap, this blocks the reloader indefinitely, stalling both reload and shutdown operations. This should match the pattern used in robyn/processpool.py:63, which bounds the fallback wait with a timeout.
Proposed fix
except subprocess.TimeoutExpired:
logger.warning("Process %s did not shut down in time, forcing kill.", self.process.pid)
self.process.kill()
- self.process.wait()
+ try:
+ self.process.wait(timeout=5)
+ except subprocess.TimeoutExpired:
+ logger.error("Process %s did not exit after force kill.", self.process.pid)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@robyn/reloader.py` around lines 126 - 132, The post-kill wait in the
reloader's shutdown path blocks indefinitely because it calls
self.process.wait() with no timeout after self.process.kill(); change this to
use a bounded wait like self.process.wait(timeout=5) (mirroring the pattern in
robyn/processpool.py) and handle subprocess.TimeoutExpired the same way so the
reloader cannot hang if the child fails to reap; update the block around
self.process.terminate(), self.process.wait(...), subprocess.TimeoutExpired,
self.process.kill(), and the final self.process.wait(...) call to include the
timeout and appropriate exception handling.
Summary
process.kill()(SIGKILL) withprocess.terminate()(SIGTERM) +process.join(timeout=30)in the parent signal handler, with a force-kill fallback for workers that don't exit in time.loop.add_signal_handler(signal.SIGTERM, loop.stop)so the event loop exits cleanly, allowing the Rust shutdown handler to fire and in-flight requests to complete.loop.close()into afinallyblock so it always runs regardless of how the loop exits.Test plan
Made with Cursor
Summary by CodeRabbit