99 changes: 99 additions & 0 deletions README.md
@@ -56,6 +56,105 @@ Get 10% OFF GLM CODING PLAN:https://z.ai/subscribe?ic=8JVLJQFSKB
- OpenAI-compatible upstream providers via config (e.g., OpenRouter)
- Reusable Go SDK for embedding the proxy (see `docs/sdk-usage.md`)

## Operational Enhancements

This fork includes additional "proxy ops" features beyond the mainline release to improve third-party provider integrations:

### Core Features
- Environment-based secret loading via `os.environ/NAME`
- Strict YAML parsing via `strict-config` / `CLIPROXY_STRICT_CONFIG`
- Optional encryption-at-rest for `auth-dir` credentials + atomic/locked writes
- Prometheus metrics endpoint (configurable `/metrics`) + optional auth gate (`metrics.require-auth`)
- In-memory response cache (LRU+TTL) for non-streaming JSON endpoints
- Rate limiting (global / per-key parallelism + per-key RPM + per-key TPM)
- Request/response size limits (`limits.max-*-size-mb`)
- Request body guardrail (reject `api_base` / `base_url` by default)
- Virtual keys (managed client keys) + budgets + pricing-based spend tracking
- Fallback chains (`fallback-chains`) + exponential backoff retries (`retry-policy`)
- Pass-through endpoints (`pass-through.endpoints[]`) for forwarding extra routes upstream
- Health endpoints (`/health/liveness`, `/health/readiness`) + optional background probes
- Sensitive-data masking (request logs + redacted management config view)
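
As an illustration of the `os.environ/NAME` convention in the list above, the following sketch resolves such references inside a parsed config mapping. The helper name `resolve_secrets`, the `env` parameter, and the choice to raise on a missing variable are assumptions made for this example; they are not taken from the fork's code.

```python
import os

ENV_PREFIX = "os.environ/"  # reference syntax named in the feature list


def resolve_secrets(value, env=None):
    """Recursively replace 'os.environ/NAME' strings with the value of NAME.

    Hypothetical helper: raising KeyError on a missing variable is an
    assumption, not documented behaviour.
    """
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: resolve_secrets(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_secrets(item, env) for item in value]
    if isinstance(value, str) and value.startswith(ENV_PREFIX):
        name = value[len(ENV_PREFIX):]
        if name not in env:
            raise KeyError(f"environment variable {name!r} is not set")
        return env[name]
    # Numbers, booleans, and ordinary strings pass through unchanged.
    return value
```

Passing `env` explicitly keeps the helper easy to test; in normal use it falls back to the process environment.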

### Health-Based Routing & Smart Load Balancing

CLIProxyAPIPlus tracks the health of each credential and can use that data, together with in-flight request counts and latency, to decide which credential serves a request:

#### Features

**Health Tracking System**
- Automatic monitoring of credential health based on failure rates and response latency
- Four health status levels: HEALTHY, DEGRADED, COOLDOWN, ERROR
- Rolling-window metrics (window length is configurable; the default is 60 seconds)
- Per-credential and per-model statistics tracking
- P95/P99 latency percentile calculations
- Automatic cooldown integration
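
A rolling-window tracker of the kind described above could be sketched as follows. This is a minimal illustration, not the fork's implementation: the class name `RollingStats`, the injectable `clock`, and the nearest-rank percentile method are all assumptions.

```python
import math
import time
from collections import deque


class RollingStats:
    """Request outcomes for one credential within a sliding time window.

    Hypothetical sketch; names and percentile method are assumptions.
    """

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self._clock = clock
        self._samples = deque()  # (timestamp, ok, latency_ms), oldest first

    def record(self, ok, latency_ms):
        self._samples.append((self._clock(), ok, latency_ms))

    def _prune(self):
        # Drop samples that have aged out of the window.
        cutoff = self._clock() - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def failure_rate(self):
        self._prune()
        if not self._samples:
            return 0.0
        failures = sum(1 for _, ok, _ in self._samples if not ok)
        return failures / len(self._samples)

    def percentile(self, p):
        """Nearest-rank percentile of in-window latencies, or None if empty."""
        self._prune()
        latencies = sorted(sample[2] for sample in self._samples)
        if not latencies:
            return None
        rank = max(1, math.ceil(p * len(latencies) / 100))
        return latencies[rank - 1]
```

Calling `stats.percentile(95)` or `stats.percentile(99)` gives the P95 and P99 values over whatever samples are still inside the window.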

**Advanced Routing Strategies**
- **`fill-first`**: Use one credential until it hits its rate limit or cooldown, then move to the next, so that the credentials' rolling windows are staggered
- **`round-robin`**: Sequential credential rotation (default)
- **`random`**: Random credential selection
- **`least-busy`**: Select credential with fewest active requests (load balancing)
- **`lowest-latency`**: Select credential with best P95 latency (performance optimization)
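
To make the differences concrete, here is a small sketch of four of these strategies over an in-memory credential list. The `Credential` and `Selector` types are hypothetical, as is the rule that credentials without latency data sort last; `fill-first` is left out because it depends on rate-limit state that this sketch does not model.

```python
import random
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Credential:
    # Hypothetical shape; field names are assumptions for this example.
    id: str
    active_requests: int = 0
    p95_latency_ms: Optional[float] = None  # None until latency data exists


class Selector:
    def __init__(self, strategy="round-robin", rng=None):
        self.strategy = strategy
        self._counter = count()
        self._rng = rng or random.Random()

    def pick(self, creds):
        if not creds:
            raise ValueError("no credentials available")
        if self.strategy == "round-robin":
            return creds[next(self._counter) % len(creds)]
        if self.strategy == "random":
            return self._rng.choice(creds)
        if self.strategy == "least-busy":
            return min(creds, key=lambda c: c.active_requests)
        if self.strategy == "lowest-latency":
            # Assumption: credentials with no latency data are tried last.
            return min(
                creds,
                key=lambda c: (c.p95_latency_ms is None, c.p95_latency_ms or 0.0),
            )
        raise ValueError(f"unknown strategy: {self.strategy}")
```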

**Health-Aware Routing**
- Automatically filter out COOLDOWN and ERROR credentials
- Prefer HEALTHY credentials over DEGRADED when `prefer-healthy: true`
- Graceful fallback to all credentials when no healthy ones are available
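
The three rules above can be read as a filter applied before the routing strategy runs. The sketch below is one possible interpretation, not the fork's code: the function name `eligible` is hypothetical, and treating "no healthy ones" as "nothing left after removing COOLDOWN and ERROR" is an assumption.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "HEALTHY"
    DEGRADED = "DEGRADED"
    COOLDOWN = "COOLDOWN"
    ERROR = "ERROR"


def eligible(creds, health_aware=True, prefer_healthy=True):
    """Return candidate credential ids from a list of (id, Health) pairs."""
    if not health_aware:
        return [cid for cid, _ in creds]
    usable = [
        (cid, health)
        for cid, health in creds
        if health not in (Health.COOLDOWN, Health.ERROR)
    ]
    if not usable:
        # Graceful fallback (assumed meaning): nothing is usable,
        # so consider every credential rather than failing outright.
        return [cid for cid, _ in creds]
    if prefer_healthy:
        healthy = [cid for cid, health in usable if health is Health.HEALTHY]
        if healthy:
            return healthy
    return [cid for cid, _ in usable]
```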

#### Configuration Example

```yaml
# Health tracking configuration
health-tracking:
  enable: true
  window-seconds: 60        # Rolling window for failure rate calculation
  failure-threshold: 0.5    # 50% failure rate triggers ERROR status
  degraded-threshold: 0.1   # 10% failure rate triggers DEGRADED status
  min-requests: 5           # Minimum requests before tracking
  cleanup-interval: 300     # Cleanup old data every 5 minutes

# Enhanced routing configuration
routing:
  strategy: "least-busy"    # fill-first, round-robin, random, least-busy, lowest-latency
  health-aware: true        # Filter unhealthy credentials (COOLDOWN, ERROR)
  prefer-healthy: true      # Prioritize HEALTHY over DEGRADED credentials
```

#### Routing Strategy Comparison

| Strategy | Best For | How It Works |
|----------|----------|--------------|
| `fill-first` | Staggering rolling caps | Uses the first available credential (by ID) until it hits rate limit/cooldown, then moves to the next |
| `round-robin` | Even distribution, predictable | Cycles through credentials sequentially |
| `random` | Simple load balancing | Randomly selects from available credentials |
| `least-busy` | Optimal load distribution | Selects credential with fewest active requests |
| `lowest-latency` | Performance-critical apps | Selects credential with best P95 latency |
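
The `fill-first` row is the least intuitive of the five, so a short simulation may help. It assumes that "by ID" means lexicographic order and models a rate limit as a simple per-credential request cap; both are simplifications for illustration, and `fill_first` and `simulate` are hypothetical names.

```python
def fill_first(credential_ids, is_available):
    """Return the lowest-ID credential that is currently available, else None."""
    for cid in sorted(credential_ids):
        if is_available(cid):
            return cid
    return None


def simulate(requests, caps):
    """Route `requests` requests; a credential becomes unavailable once it
    has served caps[id] requests (a stand-in for hitting a rate limit)."""
    used = {cid: 0 for cid in caps}
    order = []
    for _ in range(requests):
        cid = fill_first(caps, lambda c: used[c] < caps[c])
        if cid is None:
            break  # every credential is exhausted
        used[cid] += 1
        order.append(cid)
    return order
```

Because each credential only starts receiving traffic after the previous one is exhausted, their limit windows begin at different times instead of all at once.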

#### Health Status Levels

- **HEALTHY**: Normal operation, low failure rates
- **DEGRADED**: Elevated failure rates (above `degraded-threshold` but below `failure-threshold`)
- **COOLDOWN**: Temporarily unavailable due to errors or rate limits
- **ERROR**: High failure rates (above `failure-threshold`) or persistent errors
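
One way to read these levels as a decision rule is sketched below, using the thresholds from the configuration example as defaults. The function name `classify` is hypothetical, and two details are assumptions: a rate exactly equal to a threshold counts as crossing it, and a credential with fewer than `min-requests` samples is reported as HEALTHY.

```python
def classify(failures, total, in_cooldown=False,
             failure_threshold=0.5, degraded_threshold=0.1, min_requests=5):
    """Map windowed request counts to a health status string."""
    if in_cooldown:
        return "COOLDOWN"
    if total < min_requests:
        # Assumption: too few samples to judge, so do not penalize.
        return "HEALTHY"
    rate = failures / total
    if rate >= failure_threshold:    # boundary handling is an assumption
        return "ERROR"
    if rate >= degraded_threshold:
        return "DEGRADED"
    return "HEALTHY"
```

With the default thresholds, 2 failures out of 10 requests gives DEGRADED and 6 out of 10 gives ERROR, while 3 failures out of 3 stays HEALTHY because the sample is below `min-requests`.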

#### Benefits

- **Improved reliability** by avoiding unhealthy credentials when `health-aware` routing is enabled
- **Better tail latency** when `lowest-latency` is enabled and health tracking has enough data
- **Smarter load balancing** with `least-busy` using in-flight request counts
- **Automatic recovery** from cooldown windows as health improves

See:
- `docs/operations.md`

### Future work

These are high-value ideas that remain on the roadmap:
- OpenTelemetry tracing + external integrations (Langfuse/Sentry/webhooks)
- Redis-backed distributed cache/rate limits for multi-instance deployments
- DB-backed virtual key store + async spend log writer
- Broader endpoint coverage via native translators (beyond pass-through)

## Getting Started

CLIProxyAPI Guides: [https://help.router-for.me/](https://help.router-for.me/)