Open
Labels
- area:proxy: Core proxy (sentinel-proxy)
- effort:large: 3+ days, architectural impact
- manifesto:bounded: Has clear resource limits
- type:feature: New functionality request
Description
Summary
Implement zero-downtime binary upgrades by passing listen socket file descriptors from the old process to the new one using SCM_RIGHTS, so that no connections are dropped during a Zentinel version upgrade.
Motivation
Currently, upgrading Zentinel requires stopping the old binary and starting the new one. Even with a short window, this means dropped connections and potential downtime. For production deployments where Zentinel handles critical traffic, a seamless binary upgrade path is essential.
Prior Art
sozu-proxy has a production-proven implementation of this pattern, used at Clever Cloud:
- Serialize all routing state and file descriptors to a temp file
- `fork()` the main process
- Child calls `exec()` with the new binary, passing FD references via command-line args
- Listen socket FDs are kept open across exec by disabling `CLOEXEC`
- `SCM_RIGHTS` via Unix domain sockets passes TCP listener FDs between processes
- New process confirms readiness via a channel, old process exits
- Existing connections continue on the old workers until they complete naturally
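The `fork()`/`exec()` steps with `CLOEXEC` disabled can be sketched as follows. This is a minimal, POSIX-only illustration in Python, not sozu's or Zentinel's code: `upgrade_via_exec` and `CHILD_SOURCE` are hypothetical names, and an inline Python script stands in for the new binary.

```python
import os
import socket
import subprocess
import sys

# Hypothetical stand-in for the "new binary": it rebuilds the listener from
# the FD number given on its command line and serves one connection.
CHILD_SOURCE = """
import socket, sys
listener = socket.socket(fileno=int(sys.argv[1]))
conn, _ = listener.accept()
conn.sendall(b"served by new process")
conn.close()
"""


def upgrade_via_exec() -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    # Python marks new FDs close-on-exec by default; clearing that flag is
    # the "disable CLOEXEC" step that lets the FD survive exec().
    os.set_inheritable(listener.fileno(), True)

    # Popen spawns the child process; close_fds=False keeps inheritable FDs
    # open in it, and the FD number travels as a command-line argument.
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE, str(listener.fileno())],
        close_fds=False,
    )

    # The old process drops its copy; the socket stays open in the child.
    listener.close()

    with socket.create_connection(("127.0.0.1", port), timeout=10) as client:
        reply = client.recv(64)
    child.wait(timeout=10)
    return reply
```

Calling `upgrade_via_exec()` should return the bytes written by the child, even though the process that created the listener has already closed its own copy.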
The key insight is that listen sockets are never closed, so the kernel never stops accepting connections on those ports.
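A rough way to see this in practice is to connect a client before the handover and let the receiving process accept it afterwards. The sketch below assumes Python 3.9+ on a POSIX system and uses `socket.send_fds` / `socket.recv_fds`, which wrap `SCM_RIGHTS`. `handover_listener` is a hypothetical name, and a forked child stands in for the new binary.

```python
import os
import socket


def handover_listener() -> bytes:
    # Control channel between the "old" and "new" process (Unix domain socket).
    old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    pid = os.fork()
    if pid == 0:
        # New process: it holds no copy of the listener at this point.
        status = 1
        try:
            old_end.close()
            _, fds, _, _ = socket.recv_fds(new_end, 16, 1)
            listener = socket.socket(fileno=fds[0])
            new_end.sendall(b"ready")  # readiness confirmation
            conn, _ = listener.accept()
            conn.sendall(b"hello from new process")
            conn.close()
            status = 0
        finally:
            os._exit(status)

    # Old process: the listener is created after fork(), so the child can
    # only obtain it through SCM_RIGHTS.
    new_end.close()
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(16)
    port = listener.getsockname()[1]

    # A client connects before the handover; the kernel completes the
    # handshake and parks the connection in the listen backlog.
    client = socket.create_connection(("127.0.0.1", port), timeout=10)

    socket.send_fds(old_end, [b"listener"], [listener.fileno()])
    assert old_end.recv(16) == b"ready"
    listener.close()  # the old process lets go only after readiness

    reply = client.recv(64)  # served by the new process
    client.close()
    os.waitpid(pid, 0)
    old_end.close()
    return reply
```

The client never sees a refused or reset connection: the socket stays in the listening state the whole time, and only the process calling `accept()` changes.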
Considerations
- Pingora may already have upgrade mechanisms worth evaluating before building this from scratch
- State serialization format needs careful versioning (old state must be readable by new binary)
- Linux-specific (`SCM_RIGHTS`); macOS supports it too, but behavior may differ
- Graceful drain of old worker connections needs a configurable timeout
- Integration with systemd socket activation could be an alternative on Linux
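For the state versioning point above, one possible shape is a version tag on the serialized state plus explicit readers for older layouts. The sketch below is only an illustration in Python: the JSON format, the field names, `STATE_VERSION`, `dump_state`, `load_state`, and both layouts are assumptions, not an existing Zentinel or sozu format.

```python
import json

STATE_VERSION = 2  # hypothetical current schema version


def dump_state(listeners: dict[str, int]) -> str:
    """Serialize a listener-address -> FD map with an explicit version tag."""
    return json.dumps({"version": STATE_VERSION, "listeners": listeners})


def load_state(raw: str) -> dict[str, int]:
    """Read state written by this version or by an older one."""
    doc = json.loads(raw)
    version = doc.get("version")
    if version == 1:
        # Hypothetical v1 layout: a list of [address, fd] pairs.
        return {addr: fd for addr, fd in doc["listeners"]}
    if version == 2:
        # Hypothetical v2 layout: an address -> fd mapping.
        return dict(doc["listeners"])
    # Refuse unknown versions instead of guessing at the layout.
    raise ValueError(f"unsupported state version: {version!r}")
```

Rejecting unknown versions lets the new process fail before it confirms readiness, so the old process can keep serving if the handover cannot proceed.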