feat(proxy): zero-downtime binary upgrades via FD passing #126

@raffaelschneider

Summary

Implement zero-downtime binary upgrades by passing listen socket file descriptors from the old process to the new one using SCM_RIGHTS, so that no connections are dropped during a Zentinel version upgrade.

Motivation

Currently, upgrading Zentinel requires stopping the old binary and starting the new one. Even when that window is short, connections are dropped and clients may see downtime. For production deployments where Zentinel handles critical traffic, a seamless binary upgrade path is essential.

Prior Art

sozu-proxy has a production-proven implementation of this pattern, used at Clever Cloud:

  1. Serialize all routing state and file descriptors to a temp file
  2. fork() the main process
  3. Child calls exec() with the new binary, passing FD references via command-line args
  4. Listen socket FDs are kept open across exec() by clearing the FD_CLOEXEC flag
  5. SCM_RIGHTS via Unix domain sockets passes TCP listener FDs between processes
  6. New process confirms readiness via a channel, old process exits
  7. Existing connections continue on the old workers until they complete naturally
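
Steps 2 to 4 above can be sketched roughly as follows. This is a minimal illustration in Python (chosen only for brevity), not sozu's or Zentinel's code: `upgrade_via_exec` and `CHILD_CODE` are hypothetical names, and the "new binary" is simulated by a short child script that adopts the listener from an FD number passed on its command line.

```python
import os
import socket
import subprocess
import sys

# Hypothetical stand-in for the "new binary": it adopts the listener whose
# FD number arrives as a command-line argument and prints the bound port.
CHILD_CODE = """
import socket, sys
listener = socket.socket(fileno=int(sys.argv[1]))
print(listener.getsockname()[1])
"""


def upgrade_via_exec(listener: socket.socket) -> int:
    """Spawn a child that inherits the listener; return the port it reports."""
    fd = listener.fileno()
    os.set_inheritable(fd, True)  # clears FD_CLOEXEC so the FD survives exec()
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(fd)],
        pass_fds=[fd],  # keep this FD open, under the same number, in the child
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    old = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    old.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    old.listen()
    # True when the child sees the very same listening socket.
    print(upgrade_via_exec(old) == old.getsockname()[1])
```

In a real upgrade the child would keep running and signal readiness (step 6) instead of exiting right away.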

The key insight is that listen sockets are never closed, so the kernel never stops accepting connections on those ports.
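
A small sketch of that property, assuming Python 3.9+ on a Unix system: a client connects before the handover, the listener FD travels over a Unix socket pair via SCM_RIGHTS, the old handle is closed, and the queued connection is still accepted through the received FD. `hand_over` is a hypothetical helper, and both ends live in one process here for brevity; in an actual upgrade they would be the old and new processes.

```python
import socket


def hand_over(listener: socket.socket) -> socket.socket:
    """Pass a listener's FD over a Unix socket pair, then close the old handle."""
    old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with old_end, new_end:
        # send_fds wraps sendmsg() with an SCM_RIGHTS control message.
        socket.send_fds(old_end, [b"listener"], [listener.fileno()])
        _msg, fds, _flags, _addr = socket.recv_fds(new_end, 1024, 1)
    # The old handle goes away; the received FD keeps the kernel socket open.
    listener.close()
    return socket.socket(fileno=fds[0])


if __name__ == "__main__":
    old = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    old.bind(("127.0.0.1", 0))
    old.listen()
    addr = old.getsockname()

    # A client connects before the handover; the kernel queues it in the backlog.
    client = socket.create_connection(addr)
    new = hand_over(old)
    conn, _ = new.accept()  # the queued connection is served via the new handle
    client.sendall(b"ping")
    print(conn.recv(4))
```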

Considerations

  • Pingora may already have upgrade mechanisms worth evaluating before building this from scratch
  • State serialization format needs careful versioning (old state must be readable by new binary)
  • Unix-only (SCM_RIGHTS): Linux is the primary target; macOS supports it too, but behavior may differ
  • Graceful drain of old worker connections needs a configurable timeout
  • Integration with systemd socket activation could be an alternative on Linux
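
To illustrate the state-versioning point, here is a minimal sketch of a versioned handover document. Everything in it is an assumption made for illustration: the JSON layout, the `STATE_VERSION` constant, the field names, and the idea that a hypothetical version 1 stored listeners as a list of pairs. The issue does not specify a format.

```python
import json

STATE_VERSION = 2  # hypothetical: the version this binary writes


def dump_state(listeners: dict[str, int]) -> str:
    """Serialize a listener-address -> FD map with an explicit version tag."""
    return json.dumps({"version": STATE_VERSION, "listeners": listeners})


def load_state(raw: str) -> dict[str, int]:
    """Read state written by this binary or by an older one."""
    doc = json.loads(raw)
    version = doc.get("version", 1)  # assumption: v1 carried no version field
    if version > STATE_VERSION:
        raise ValueError(
            f"state version {version} is newer than supported {STATE_VERSION}"
        )
    if version == 1:
        # Hypothetical v1 layout: a list of [address, fd] pairs.
        return {addr: fd for addr, fd in doc["listeners"]}
    return dict(doc["listeners"])


if __name__ == "__main__":
    print(load_state(dump_state({"0.0.0.0:443": 7})))
```

Rejecting a newer version outright covers the rollback case, where an older binary is started from state written by a newer one.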

Metadata

Labels

  • area:proxy - Core proxy (sentinel-proxy)
  • effort:large - 3+ days, architectural impact
  • manifesto:bounded - Has clear resource limits
  • type:feature - New functionality request
