Merged
6 changes: 4 additions & 2 deletions .github/ISSUE_TEMPLATE/feature_request.yml
@@ -5,9 +5,9 @@ body:
- type: markdown
attributes:
value: |
Before proposing a feature, please read the [Manifesto](https://github.com/zentinelproxy/zentinel/blob/main/MANIFESTO.md).
Before proposing a feature, please read the [Manifesto](https://github.com/zentinelproxy/zentinel/blob/main/MANIFESTO.md) and the [design rationale documents](https://github.com/zentinelproxy/zentinel/tree/main/doc/design).

Zentinel values **predictability over flexibility** and **calm operation over feature breadth**.
Zentinel values **predictability over flexibility** and **calm operation over feature breadth**. The design documents explain why key architectural decisions were made and when they might be revisited.

- type: textarea
id: problem
@@ -72,3 +72,5 @@ body:
required: true
- label: I have read the [Manifesto](https://github.com/zentinelproxy/zentinel/blob/main/MANIFESTO.md)
required: true
- label: I have read the [design rationale documents](https://github.com/zentinelproxy/zentinel/tree/main/doc/design) and my proposal does not conflict with existing architectural decisions
required: true
1 change: 1 addition & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -23,6 +23,7 @@
## Checklist

- [ ] I have read [CONTRIBUTING.md](CONTRIBUTING.md)
- [ ] I have read the [design rationale documents](doc/design/) and my changes align with existing architectural decisions
- [ ] My code follows the project's coding standards
- [ ] I have added tests that prove my fix/feature works
- [ ] All new and existing tests pass locally
61 changes: 61 additions & 0 deletions doc/design/why-bounded-resources.md
@@ -0,0 +1,61 @@
# Why Bounded Resources

## The Decision

Every resource in Zentinel has an explicit upper bound: connections, request body size, header count, header size, agent concurrency, cache size, decompression ratios, connection pool depth. Nothing grows without limit. Nothing is "unlimited by default."

## Alternatives Considered

**Unbounded by default, limit when needed.** Most proxies start with no limits and let operators add them when problems arise. This is reactive: you discover the limit you needed after the outage. A single client opening 100,000 connections, a request with a 2 GB body, or a zip bomb expanding to fill all available memory—these are not edge cases; they are Tuesday.

**Dynamic auto-scaling.** Automatically grow buffers, pools, and queues based on demand. This works until it doesn't: auto-scaling under a DDoS attack means the proxy consumes all available memory trying to accommodate malicious traffic. The system that was supposed to protect your backend becomes the mechanism of its destruction.

**OS-level limits only.** Rely on `ulimit`, cgroups, and OOM killer for resource boundaries. These are blunt instruments: the OOM killer does not distinguish between a proxy handling legitimate traffic and one being abused. When the OS enforces the limit, recovery is a process restart, not a graceful rejection.

## Why Bounded

**Predictable memory usage.** An operator can look at the configuration and calculate the worst-case memory footprint:

| Resource | Default Limit | Purpose |
|----------|--------------|---------|
| Max body size | 1 MB | Prevents memory exhaustion from large uploads |
| Max header size | 8,192 bytes | Prevents header-based DoS |
| Max header count | 100 | Prevents header inflation attacks |
| Max connections per client | 100 | Prevents single-client monopolization |
| Agent concurrency | 100 per agent | Prevents agent overload |
| Cache size | 100 MB | Bounded memory for cached responses |
| Upstream connection pool | 100 per upstream | Prevents upstream connection exhaustion |
| Decompression ratio | 100x | Zip bomb protection |
| Decompression output | 10 MB | Absolute decompression ceiling |

These are not hidden safety nets. They are explicit configuration values, logged at startup, observable in metrics.
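
As a sketch of what "explicit, visible bounds" looks like in practice, the defaults above might be stated in configuration roughly as follows. The node and property names here are hypothetical illustrations, not Zentinel's actual schema; consult the configuration reference for the real names:

```kdl
// Hypothetical sketch: node and property names are illustrative only.
// Every bound is stated, so the worst case is calculable from the file.
limits {
    max-body-size "1MB"
    max-header-size 8192
    max-header-count 100
    max-connections-per-client 100
    decompression-ratio 100
    decompression-output "10MB"
}
```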

**Graceful degradation.** When a bound is reached, Zentinel rejects the specific request that would exceed it—with an appropriate HTTP status code and a log entry—rather than degrading the entire system. The 101st connection from a single client gets rejected; the other 100 continue normally. The request with a 2 MB body gets a 413; all other requests are unaffected.

**Noisy neighbor prevention.** Per-agent concurrency semaphores ensure that a slow agent cannot starve other agents. If the WAF agent is processing slowly, it uses its own semaphore budget. The authentication agent continues at full speed with its own independent semaphore. One misbehaving component cannot cascade into system-wide degradation.
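
The per-agent budget can be sketched as a counting semaphore that rejects, rather than queues, when the budget is spent. This is a simplified illustration, not Zentinel's actual implementation:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// A per-agent concurrency budget: try_acquire either takes a permit
/// or fails immediately, so a slow agent saturates only its own budget.
struct AgentBudget {
    in_flight: AtomicUsize,
    max: usize,
}

impl AgentBudget {
    fn new(max: usize) -> Self {
        Self { in_flight: AtomicUsize::new(0), max }
    }

    /// Returns true if a permit was taken; false means "reject this call".
    fn try_acquire(&self) -> bool {
        let mut current = self.in_flight.load(Ordering::Acquire);
        loop {
            if current >= self.max {
                return false; // budget spent: reject instead of queueing
            }
            match self.in_flight.compare_exchange_weak(
                current, current + 1, Ordering::AcqRel, Ordering::Acquire,
            ) {
                Ok(_) => return true,
                Err(actual) => current = actual,
            }
        }
    }

    fn release(&self) {
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let budget = AgentBudget::new(2);
    assert!(budget.try_acquire());
    assert!(budget.try_acquire());
    assert!(!budget.try_acquire()); // third concurrent call is rejected
    budget.release();
    assert!(budget.try_acquire()); // a freed permit is reusable
}
```

Rejecting instead of queueing is the point: a queue in front of a slow agent is just an unbounded buffer with extra steps.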

**Zip bomb defense.** Decompression is double-bounded: by ratio (output/input must stay below the configured maximum, default 100x) and by absolute size (output must stay below the configured ceiling, default 10 MB). A 1 KB payload that decompresses to 1 GB is caught by the ratio check. A legitimate but large compressed payload is caught by the absolute limit. Both are configurable per deployment.
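
The double bound reduces to two comparisons as output accumulates; a minimal sketch, with function and parameter names of our own invention:

```rust
/// Check decompression output against both bounds: a relative ratio
/// and an absolute output ceiling. Either violation rejects the payload.
fn decompression_allowed(
    input_bytes: u64,
    output_bytes: u64,
    max_ratio: u64,  // e.g. 100 (the default 100x)
    max_output: u64, // e.g. 10 * 1024 * 1024 (the default 10 MB)
) -> bool {
    // Ratio bound: the classic zip bomb shape (tiny input, huge output).
    if input_bytes > 0 && output_bytes / input_bytes > max_ratio {
        return false;
    }
    // Absolute ceiling: large output even from an honest compression ratio.
    if output_bytes > max_output {
        return false;
    }
    true
}

fn main() {
    let mb: u64 = 1024 * 1024;
    // 1 KB expanding to 1 GB: caught by the ratio check.
    assert!(!decompression_allowed(1024, 1024 * mb, 100, 10 * mb));
    // 9 MB from 1 MB of input: within both bounds.
    assert!(decompression_allowed(mb, 9 * mb, 100, 10 * mb));
    // 11 MB output at a modest 11x ratio: caught by the absolute ceiling.
    assert!(!decompression_allowed(mb, 11 * mb, 100, 10 * mb));
}
```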

**Circuit breakers.** Each agent has a three-state circuit breaker (closed → open → half-open) with configurable thresholds. When an agent fails repeatedly, the circuit opens and requests are handled according to the configured failure mode (block or pass-through) without waiting for the agent to time out on every request. Recovery is automatic: after the timeout period, a probe request tests the agent, and on success, the circuit closes.
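
The three-state machine can be sketched as a handful of transitions. Thresholds and names here are illustrative, not Zentinel's actual types:

```rust
/// Three-state circuit breaker sketch: closed -> open -> half-open.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Circuit {
    Closed { failures: u32 },
    Open,     // calls short-circuit until the recovery timeout elapses
    HalfOpen, // a single probe request tests the agent
}

fn on_failure(state: Circuit, threshold: u32) -> Circuit {
    match state {
        // Repeated failures trip the breaker at the configured threshold.
        Circuit::Closed { failures } if failures + 1 >= threshold => Circuit::Open,
        Circuit::Closed { failures } => Circuit::Closed { failures: failures + 1 },
        Circuit::HalfOpen => Circuit::Open, // probe failed: reopen
        Circuit::Open => Circuit::Open,
    }
}

fn on_success(_state: Circuit) -> Circuit {
    // Any success resets the count; a successful probe closes the circuit.
    Circuit::Closed { failures: 0 }
}

fn on_timeout_elapsed(state: Circuit) -> Circuit {
    match state {
        Circuit::Open => Circuit::HalfOpen, // let exactly one probe through
        other => other,
    }
}

fn main() {
    let mut s = Circuit::Closed { failures: 0 };
    for _ in 0..3 {
        s = on_failure(s, 3);
    }
    assert_eq!(s, Circuit::Open);
    s = on_timeout_elapsed(s);
    assert_eq!(s, Circuit::HalfOpen);
    s = on_success(s);
    assert_eq!(s, Circuit::Closed { failures: 0 });
}
```

The key property is the `Open` state: while it holds, requests get the configured failure mode immediately instead of each paying the agent's timeout.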

## Trade-offs

**Operators must size limits.** There is no "unlimited" escape hatch. An operator deploying Zentinel must decide: how large can a request body be? How many connections per client? How much memory for the cache? This requires understanding the workload. We provide documented defaults that work for common cases, but operators should review them.

**Legitimate traffic can be rejected.** A bound that is too tight will reject valid requests. A 1 MB body limit will reject a 2 MB file upload. This is by design: the operator must explicitly raise the limit for endpoints that need it, rather than having no limit and hoping for the best.

**Configuration surface.** Every bound is a configuration knob. More knobs means more to understand, more to review, more to get wrong. We mitigate this with sensible defaults and validation that warns about unusual values, but the complexity is real.

## When to Revisit

- If adaptive limiting (learning from traffic patterns to suggest bounds) proves reliable enough to supplement—not replace—explicit limits
- If a deployment pattern emerges where the defaults are consistently wrong, we should change the defaults rather than expecting every operator to override them
- If per-route or per-endpoint limits become necessary (currently most limits are global or per-agent), the configuration model may need to evolve

## Manifesto Alignment

> *"Infrastructure should be calm. [...] It should have clear limits, predictable timeouts, and failure modes you can explain to another human."* — Manifesto, principle 1

> *"A feature that cannot be bounded, observed, tested, and rolled back does not belong in the core."* — Manifesto, principle 6

Bounded resources are how Zentinel ensures that the proxy behaves predictably under any load condition. The operator sets the bounds. The proxy enforces them. The metrics show when they are reached. There are no surprises.
64 changes: 64 additions & 0 deletions doc/design/why-explicit-config.md
@@ -0,0 +1,64 @@
# Why Explicit Configuration

## The Decision

Zentinel requires all operational parameters—limits, timeouts, failure modes, TLS settings—to be explicitly stated in configuration. There are no hidden defaults that silently shape behavior. Every default value is documented, logged on startup, and observable in metrics.

The proxy's failure mode defaults to `closed`: if something is ambiguous or misconfigured, Zentinel rejects rather than guesses.

## Alternatives Considered

**Convention over configuration.** Many frameworks minimize configuration by assuming sensible defaults. Ruby on Rails popularized this: if you follow the convention, things "just work." For a web framework, this reduces boilerplate. For a reverse proxy handling production traffic, invisible conventions become invisible failure modes. An operator debugging a 3 AM outage should not have to know that the default timeout was 30 seconds because the documentation said so three versions ago.

**Auto-detection / smart defaults.** Automatically detect the number of CPU cores, available memory, and network interfaces, then configure accordingly. This sounds helpful but creates non-reproducible behavior: the same configuration file produces different behavior on different machines. When you move from a 4-core dev box to a 64-core production server, the proxy silently changes its concurrency model.

**Fail-open by default.** Many proxies default to permissive behavior: if a WAF agent is unreachable, pass the request through. This prioritizes availability over security. It means that the moment your security infrastructure fails, you have no security—precisely when you need it most.

## Why Explicit

**Debuggability.** When every parameter is stated in configuration, an operator can look at the config file and know exactly what the proxy will do. No need to check documentation for default values, no need to wonder whether a parameter was auto-detected or explicitly set. The configuration file is the source of truth.

**Reproducibility.** The same configuration file produces the same behavior on any machine. If `worker-threads=4` is in the config, there are 4 worker threads—on a laptop and on a 128-core server. The only exception is `worker-threads=0`, which explicitly means "auto-detect," and this choice is logged on startup.

**Fail-closed security.** Zentinel defaults to rejecting ambiguous or broken states:

| Scenario | Default Behavior |
|----------|-----------------|
| Agent unreachable | Block request (fail closed) |
| TLS cert missing | Refuse to start |
| Unknown config key | Validation error |
| Cross-reference to nonexistent upstream | Validation error |

An operator can override any of these to fail-open, but they must do so explicitly. The configuration records that decision for auditing.

**Startup validation.** Zentinel validates configuration at startup with four phases:

1. **Parse-time**: Syntax correctness (valid KDL)
2. **Schema**: Required fields present, types correct
3. **Semantic**: Cross-references valid (routes reference existing upstreams, filters reference existing agents)
4. **Runtime**: External resources exist (TLS cert files, agent socket paths)

A misconfigured proxy fails loudly at startup, not silently at 3 AM when a particular code path is first exercised.
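
Phase 3, for instance, reduces to set-membership checks over the parsed configuration. A simplified sketch, with hypothetical field names:

```rust
use std::collections::HashSet;

/// Minimal semantic-validation sketch: every route must reference an
/// upstream that was actually declared. Field names are illustrative.
fn validate_routes(
    upstreams: &HashSet<String>,
    routes: &[(String, String)], // (route name, upstream it references)
) -> Result<(), String> {
    for (route, upstream) in routes {
        if !upstreams.contains(upstream) {
            // Fail loudly at startup, not on first use of the code path.
            return Err(format!(
                "route {:?} references unknown upstream {:?}",
                route, upstream
            ));
        }
    }
    Ok(())
}

fn main() {
    let upstreams: HashSet<String> = ["api".to_string()].into_iter().collect();
    let good = vec![("v1".to_string(), "api".to_string())];
    let bad = vec![("v2".to_string(), "billing".to_string())];
    assert!(validate_routes(&upstreams, &good).is_ok());
    assert!(validate_routes(&upstreams, &bad).is_err());
}
```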

**Audit trail.** Explicit configuration means you can diff two config versions and see exactly what changed. No implicit state to track, no auto-detected values that shifted between deployments. Code review of config changes is meaningful because the config contains the full picture.

## Trade-offs

**More configuration to write.** Operators must specify values that other proxies would assume. This is intentional friction: it forces the operator to make conscious decisions about timeouts, limits, and failure modes. But it does increase the initial setup effort.

**Steeper onboarding.** A new user cannot start with an empty config file and have everything work. They must understand what the proxy needs: at minimum, a listener, a route, and an upstream. We mitigate this with example configurations and clear validation error messages that tell you what's missing.

**Verbose for simple cases.** A proxy that serves a single backend on port 80 requires more configuration in Zentinel than in proxies that assume defaults. This is an acceptable cost: simple cases should still be explicit, because simple deployments eventually become complex deployments, and the configuration should grow predictably rather than revealing hidden assumptions.

## When to Revisit

- If the configuration burden becomes a significant barrier to adoption, we could offer a `zentinel init` command that generates an explicit config with documented defaults—but never hide those defaults from the running config
- If a particular default proves universally correct (never needs changing across deployments), it could be promoted to a documented, logged implicit default—but this bar should be very high

## Manifesto Alignment

> *"Security must be explicit. [...] There is no 'magic'. There is no implied policy. If Zentinel is protecting something, you should be able to point to where and why."* — Manifesto, principle 2

> *"Infrastructure should be calm. [...] It should have clear limits, predictable timeouts, and failure modes you can explain to another human."* — Manifesto, principle 1

Explicit configuration is how Zentinel delivers on both promises: every limit is visible, every failure mode is a conscious choice, and the configuration file tells the full story.
61 changes: 61 additions & 0 deletions doc/design/why-external-agents.md
@@ -0,0 +1,61 @@
# Why External Agents

## The Decision

Zentinel processes complex request logic—WAF inspection, authentication, custom business rules—in external agent processes that communicate with the proxy over Unix domain sockets or gRPC. Agents are separate OS processes, not embedded plugins or in-process modules.

## Alternatives Considered

**Embedded plugins (shared libraries / dynamic loading).** Load `.so`/`.dylib` files at runtime. Fast (no IPC), but a bug in any plugin can corrupt proxy memory or crash the entire process. No language flexibility—plugins must be written in Rust or C. Upgrading a plugin requires restarting the proxy.

**WASM filters.** Sandboxed execution within the proxy process. Better isolation than shared libraries, but WASM has limited access to system resources (networking, filesystem), restricted language support (not all languages compile well to WASM), and the sandbox adds overhead for every call. Debugging WASM in production is painful.

**Lua scripting (NGINX/OpenResty model).** Flexible and fast for simple transformations. But Lua's type system is weak, error handling is ad hoc, and complex logic (WAF rule evaluation, ML model inference) does not belong in an embedded scripting language. Lua scripts share the proxy's address space—a runaway script blocks the event loop.

**HTTP callouts (ext_proc / ext_authz).** External services over HTTP. Good isolation, but HTTP adds serialization overhead, connection management complexity, and latency. Every request becomes at least one additional HTTP round-trip. The protocol is generic rather than purpose-built for proxy integration.

## Why External Processes

**Crash isolation.** If a WAF agent segfaults or panics, the proxy keeps serving traffic. The circuit breaker trips, the agent restarts, and recovery is automatic. A bug in request inspection must never take down the proxy.

**Language flexibility.** Agents can be written in any language: Rust, Go, Python, Java. The protocol is documented and SDK libraries are provided. Teams can extend Zentinel without learning Rust or understanding proxy internals.

**Independent deployment.** Agents have their own release cycle. You can upgrade a WAF agent without restarting the proxy. You can roll back an agent without touching the proxy binary. This matters in production where the proxy handles all traffic.

**Resource isolation.** Each agent has its own memory space, CPU allocation, and concurrency limits. A slow authentication agent cannot starve a fast header-transformation agent. Per-agent semaphores enforce concurrency bounds. Circuit breakers prevent cascading failures.

**Noisy neighbor prevention.** Per-agent concurrency semaphores ensure that one slow agent cannot consume all available processing capacity. If Agent A is slow, Agent B continues processing at full speed with its own independent semaphore.

## The Protocol

Agents communicate over a binary protocol with length-prefixed JSON messages:

- **Transport**: Unix domain sockets (primary), gRPC (remote agents), reverse connections (NAT traversal)
- **Message frame**: 4-byte big-endian length + 1-byte type prefix + JSON payload
- **Lifecycle events**: `RequestHeaders`, `RequestBody`, `ResponseHeaders`, `ResponseBody`, `RequestComplete`, `WebSocketFrame`, `GuardrailInspect`
- **Decisions**: `ALLOW` (continue), `BLOCK` (reject with status), `MODIFY` (transform headers/body)
- **Connection pooling**: Persistent connections with 4 load-balancing strategies (round-robin, least-connections, health-based, random)

The protocol is purpose-built for proxy integration. It exposes exactly the request lifecycle phases that matter, with no unnecessary abstraction.
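
The framing above is simple enough to state precisely. A sketch of the encoder side; the message-type value is hypothetical, and whether the length prefix covers the type byte is an assumption here, not taken from the protocol spec:

```rust
/// Encode one agent-protocol frame: 4-byte big-endian payload length,
/// then a 1-byte message-type tag, then the JSON payload itself.
/// Assumption: the length covers only the JSON payload, not the type byte.
fn encode_frame(msg_type: u8, json_payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(5 + json_payload.len());
    frame.extend_from_slice(&(json_payload.len() as u32).to_be_bytes());
    frame.push(msg_type);
    frame.extend_from_slice(json_payload);
    frame
}

fn main() {
    let payload = br#"{"decision":"ALLOW"}"#;
    let frame = encode_frame(0x01, payload);
    assert_eq!(&frame[0..4], &(payload.len() as u32).to_be_bytes());
    assert_eq!(frame[4], 0x01);
    assert_eq!(&frame[5..], &payload[..]);
}
```

Length-prefixed framing is what makes persistent, pooled connections cheap: the reader always knows exactly how many bytes to consume before the next message begins.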

## Trade-offs

**IPC overhead.** Every agent call crosses a process boundary. For the hot path (every request), this adds latency—typically sub-millisecond over UDS, but nonzero. We mitigate this with connection pooling, persistent connections, and batched communication where possible.

**Operational complexity.** External agents are additional processes to deploy, monitor, and manage. Each agent needs health checking, log collection, and lifecycle management. This is more complex than a single-binary approach.

**Protocol versioning.** The agent protocol is a contract. Breaking changes require coordinated updates across proxy and agents. We version the protocol and maintain backward compatibility where feasible.

## When to Revisit

- If WASM matures to support full system access, rich debugging, and broad language support, some lightweight agents could move in-process
- If the IPC overhead becomes measurable in latency-critical paths (sub-100μs budgets), a hybrid model with in-process fast-path and external slow-path could be considered
- If the operational burden of managing agent processes proves too high for small deployments, an embedded mode could be offered as an option

## Manifesto Alignment

> *"Complexity must be isolated. [...] The agent architecture is not a workaround or a plugin system bolted on as an afterthought. It is a fundamental design choice."* — Manifesto, principle 4

> *"A broken extension must never take the whole system down with it. Agents can crash, restart, be upgraded, or be disabled—independently of the proxy."* — Manifesto, principle 4

The external agent model is how Zentinel keeps the core small and the blast radius of complexity contained.