diff --git a/README.md b/README.md
index 9584fba43..dbd73678d 100644
--- a/README.md
+++ b/README.md
@@ -41,6 +41,105 @@ Get 10% OFF GLM CODING PLAN：https://z.ai/subscribe?ic=8JVLJQFSKB
 - OpenAI-compatible upstream providers via config (e.g., OpenRouter)
 - Reusable Go SDK for embedding the proxy (see `docs/sdk-usage.md`)
 
+## Operational Enhancements
+
+This fork includes additional "proxy ops" features beyond the mainline release to improve third-party provider integrations:
+
+### Core Features
+- Environment-based secret loading via `os.environ/NAME`
+- Strict YAML parsing via `strict-config` / `CLIPROXY_STRICT_CONFIG`
+- Optional encryption-at-rest for `auth-dir` credentials + atomic/locked writes
+- Prometheus metrics endpoint (configurable `/metrics`) + optional auth gate (`metrics.require-auth`)
+- In-memory response cache (LRU+TTL) for non-streaming JSON endpoints
+- Rate limiting (global / per-key parallelism + per-key RPM + per-key TPM)
+- Request/response size limits (`limits.max-*-size-mb`)
+- Request body guardrail (reject `api_base` / `base_url` by default)
+- Virtual keys (managed client keys) + budgets + pricing-based spend tracking
+- Fallback chains (`fallback-chains`) + exponential backoff retries (`retry-policy`)
+- Pass-through endpoints (`pass-through.endpoints[]`) for forwarding extra routes upstream
+- Health endpoints (`/health/liveness`, `/health/readiness`) + optional background probes
+- Sensitive-data masking (request logs + redacted management config view)
+
+### Health-Based Routing & Smart Load Balancing
+
+CLIProxyAPIPlus tracks per-credential health and uses it, together with a configurable selection strategy, to route requests across credentials:
+
+#### Features
+
+**Health Tracking System**
+- Automatic monitoring of credential health based on failure rates and response latency
+- Four health status levels: HEALTHY, DEGRADED, COOLDOWN, ERROR
+- Rolling-window metrics (60-second window by default, configurable)
+- Per-credential and per-model statistics tracking
+- P95/P99 latency percentile calculations
+- Automatic cooldown integration
+
+**Advanced Routing Strategies**
+- **`fill-first`**: Drain one credential before moving to the next (default)
+- **`round-robin`**: Sequential credential rotation
+- **`random`**: Random credential selection
+- **`least-busy`**: Select credential with fewest active requests (load balancing)
+- **`lowest-latency`**: Select credential with best P95 latency (performance optimization)
+
+**Health-Aware Routing**
+- Automatically filter out COOLDOWN and ERROR credentials
+- Prefer HEALTHY credentials over DEGRADED when `prefer-healthy: true`
+- Graceful fallback to all credentials when no healthy ones are available
+
+#### Configuration Example
+
+```yaml
+# Health tracking configuration
+health-tracking:
+  enable: true
+  window-seconds: 60              # Rolling window for failure rate calculation
+  failure-threshold: 0.5          # 50% failure rate triggers ERROR status
+  degraded-threshold: 0.1         # 10% failure rate triggers DEGRADED status
+  min-requests: 5                 # Minimum requests before tracking
+  cleanup-interval: 300           # Clean up old data every 5 minutes (value in seconds)
+
+# Enhanced routing configuration
+routing:
+  strategy: "least-busy"          # fill-first, round-robin, random, least-busy, lowest-latency
+  health-aware: true              # Filter unhealthy credentials (COOLDOWN, ERROR)
+  prefer-healthy: true            # Prioritize HEALTHY over DEGRADED credentials
+```
+
+#### Routing Strategy Comparison
+
+| Strategy | Best For | How It Works |
+|----------|----------|--------------|
+| `fill-first` | Staggering rolling-window caps | Uses the first available credential (by ID) until it cools down |
+| `round-robin` | Even distribution, predictable | Cycles through credentials sequentially |
+| `random` | Simple load balancing | Randomly selects from available credentials |
+| `least-busy` | Optimal load distribution | Selects credential with fewest active requests |
+| `lowest-latency` | Performance-critical apps | Selects credential with best P95 latency |
+
+#### Health Status Levels
+
+- **HEALTHY**: Normal operation, low failure rates
+- **DEGRADED**: Elevated failure rates (above `degraded-threshold` but below `failure-threshold`)
+- **COOLDOWN**: Temporarily unavailable due to errors or rate limits
+- **ERROR**: High failure rates (above failure-threshold) or persistent errors
+
+#### Benefits
+
+- **Improved reliability** by avoiding unhealthy credentials when `health-aware` routing is enabled
+- **Better tail latency** when `lowest-latency` is enabled and health tracking has enough data
+- **Smarter load balancing** with `least-busy` using in-flight request counts
+- **Automatic recovery** from cooldown windows as health improves
+
+See:
+- `docs/operations.md`
+
+### Future work
+
+These are high-value ideas that remain on the roadmap:
+- OpenTelemetry tracing + external integrations (Langfuse/Sentry/webhooks)
+- Redis-backed distributed cache/rate limits for multi-instance deployments
+- DB-backed virtual key store + async spend log writer
+- Broader endpoint coverage via native translators (beyond pass-through)
+
 ## Getting Started
 
 CLIProxyAPI Guides: [https://help.router-for.me/](https://help.router-for.me/)
diff --git a/config.example.yaml b/config.example.yaml
index 61f51d475..b0e1f2db5 100644
--- a/config.example.yaml
+++ b/config.example.yaml
@@ -1,3 +1,15 @@
+# Server host/interface. Use "127.0.0.1" or "localhost" to restrict access to local machine only.
+host: ""
+
+# Any string value can be sourced from an environment variable by using:
+#   os.environ/ENV_VAR_NAME
+# Example:
+#   remote-management:
+#     secret-key: os.environ/MANAGEMENT_PASSWORD
+
+# Strict YAML parsing (reject unknown fields). Useful to catch typos.
+# strict-config: true
+
 # Server port
 port: 8317
 
@@ -21,9 +33,25 @@ remote-management:
   # Disable the bundled management control panel asset download and HTTP route when true.
   disable-control-panel: false
 
+  # Allow downloading auth JSON files via management endpoints from non-localhost clients.
+  # Disabled by default to reduce the risk of credential exfiltration.
+  allow-auth-file-download: false
+
+  # GitHub repository for the management control panel. Accepts a repository URL or releases API URL.
+  panel-github-repository: "https://github.com/router-for-me/Cli-Proxy-API-Management-Center"
+
 # Authentication directory (supports ~ for home directory)
 auth-dir: "~/.cli-proxy-api"
 
+# Auth file storage settings (credentials saved under auth-dir as *.json)
+auth-storage:
+  # Encrypt auth JSON at rest. If omitted, encryption is auto-enabled when an encryption key is present.
+  # encrypt: true
+  # Encryption key secret. Prefer setting via env (CLIPROXY_AUTH_ENCRYPTION_KEY) and referencing it:
+  # encryption-key: os.environ/CLIPROXY_AUTH_ENCRYPTION_KEY
+  # Allow reading legacy plaintext auth JSON when encryption is enabled (best-effort migrates to encrypted).
+  # allow-plaintext-fallback: true
+
 # API keys for authentication
 api-keys:
   - "your-api-key-1"
@@ -41,12 +69,24 @@ usage-statistics-enabled: false
 # Proxy URL. Supports socks5/http/https protocols. Example: socks5://user:pass@192.168.1.1:1080/
 proxy-url: ""
 
+# Security guardrails. When allow-client-side-credentials is false (default), requests containing api_base/base_url fields are rejected.
+# security:
+#   allow-client-side-credentials: false
+
+# Request/response size limits, in megabytes.
+# limits:
+#   max-request-size-mb: 10
+#   max-response-size-mb: 50
+
 # Number of times to retry a request. Retries will occur if the HTTP response code is 403, 408, 500, 502, 503, or 504.
 request-retry: 3
 
 # Maximum wait time in seconds for a cooled-down credential before triggering a retry.
 max-retry-interval: 30
 
+# When true, disable quota backoff cooldown scheduling for 429 errors (not recommended).
+disable-cooling: false
+
 # Quota exceeded behavior
 quota-exceeded:
   switch-project: true # Whether to automatically switch to another project when a quota is exceeded
@@ -55,6 +95,116 @@ quota-exceeded:
 # When true, enable authentication for the WebSocket API (/v1/ws).
 ws-auth: false
 
+# Response caching configuration
+# cache:
+#   enable: true           # Enable response caching
+#   max-size: 1000         # Maximum number of cached responses
+#   ttl: 300               # Cache TTL in seconds (default: 5 minutes)
+
+# Rate limiting configuration
+# rate-limits:
+#   enable: true                  # Enable rate limiting
+#   max-parallel-requests: 100    # Maximum concurrent requests globally
+#   max-per-key: 10               # Maximum concurrent requests per API key
+#   max-rpm: 60                   # Maximum requests per minute per API key
+#   max-tpm: 120000               # Maximum tokens per minute per API key
+
+# Prometheus metrics configuration
+# metrics:
+#   enable: true           # Enable metrics endpoint
+#   endpoint: "/metrics"   # HTTP path for metrics
+#   require-auth: false    # When true, /metrics requires normal API auth
+
+# Credential cooldown configuration
+# cooldown:
+#   enable: true           # Enable automatic cooldown on errors
+#   duration: 60           # Cooldown duration in seconds
+#   trigger-on:            # HTTP status codes that trigger cooldown
+#     - 429
+#     - 500
+#     - 502
+#     - 503
+#     - 504
+
+# Routing / selection strategy when multiple credentials match.
+# routing:
+#   strategy: "fill-first"    # fill-first (default), round-robin, random, least-busy, lowest-latency
+#   health-aware: true        # Filter unhealthy credentials (COOLDOWN, ERROR)
+#   prefer-healthy: true      # Prefer HEALTHY over DEGRADED when health-aware
+#   fill-first-max-inflight-per-auth: 4  # Default when unset: 4. 0 = unlimited
+#   fill-first-spillover: "next-auth"   # next-auth (default), least-busy
+
+# Health tracking (feeds health-aware routing + readiness checks).
+# health-tracking:
+#   enable: true
+#   window-seconds: 60
+#   failure-threshold: 0.5
+#   degraded-threshold: 0.1
+#   min-requests: 5
+#   cleanup-interval: 300
+
+# Fallback chains (model/provider failover).
+# Fallbacks are attempted on transient failures (network, 408, 429, 5xx).
+# fallback-chains:
+#   enable: true
+#   chains:
+#     - primary-model: "gpt-4o"
+#       primary-provider: "openai"   # optional
+#       fallbacks:
+#         - model: "claude-3-5-sonnet-20241022"
+#           provider: "claude"
+#         - model: "gemini-2.0-flash-exp"
+#           provider: "gemini"
+
+# Retry policy (exponential backoff).
+# Applies to transient failures (network, 408, 5xx). 429 relies on cooldown/Retry-After instead.
+# retry-policy:
+#   enable: true
+#   max-retries: 3
+#   initial-delay-ms: 1000
+#   max-delay-ms: 30000
+#   multiplier: 2.0
+#   jitter: 0.1
+
+# Streaming behavior (SSE keep-alives + safe stream bootstrap retries).
+# streaming:
+#   keepalive-seconds: 15     # Default when unset: 15. <= 0 disables keep-alives
+#   bootstrap-retries: 2      # Default when unset: 2. 0 disables bootstrap retries
+
+# Virtual keys (managed client keys).
+# virtual-keys:
+#   enable: true
+#   store-file: ""          # default: <auth-dir>/virtual_keys.json
+#   flush-interval: 5       # seconds
+
+# Pricing table (for spend/budget enforcement on virtual keys).
+# pricing:
+#   enable: true
+#   default:
+#     input-per-1k: 0.0
+#     output-per-1k: 0.0
+#   models:
+#     - match: "gpt-4o*"
+#       input-per-1k: 5.0
+#       output-per-1k: 15.0
+
+# Pass-through endpoints (forward unimplemented routes upstream).
+# pass-through:
+#   enable: true
+#   endpoints:
+#     - path: "/v1/rerank"
+#       method: "POST"
+#       base-url: "https://api.openai.com" # note: do not include /v1 to avoid double /v1/v1
+#       timeout: 60
+#       headers:
+#         Authorization: "Bearer os.environ/OPENAI_API_KEY"
+
+# Health endpoints + optional background probes (lightweight TCP dials).
+# health:
+#   background-checks:
+#     enable: true
+#     interval: 300         # seconds
+
 # Gemini API keys
 # gemini-api-key:
 #   - api-key: "AIzaSy...01"
diff --git a/docs/operations.md b/docs/operations.md
new file mode 100644
index 000000000..981820568
--- /dev/null
+++ b/docs/operations.md
@@ -0,0 +1,377 @@
+# Operations (Security + Observability)
+
+This proxy borrows operational patterns from production-grade systems: environment-based secret loading, safe credential storage, guardrails (rate limits / cooldowns), response caching, and Prometheus metrics.
+
+## Environment-Sourced Secrets (`os.environ/`)
+
+Any string value in `config.yaml` can be set from an environment variable by using the prefix:
+
+```yaml
+some-key: os.environ/MY_ENV_VAR
+```
+
+The config loader resolves these references after YAML unmarshal (works for nested structs, slices, and maps).
+
+If the env var is missing, startup fails (unless running in optional/cloud-deploy mode).
+
+- Keeps secrets out of `config.yaml`: the file references env vars instead of hard-coding values.
+- Makes it easier to run the same config across machines/environments.
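+
+The lookup rule described above can be sketched in a few lines of Go. This is only an illustration, not the project's loader: `resolveEnvRef` and its `lookup` parameter are hypothetical names, and the sketch handles a single string rather than walking nested structs, slices, and maps.
+
+```go
+package main
+
+import (
+	"fmt"
+	"os"
+	"strings"
+)
+
+// envPrefix marks a config value that should be read from the environment.
+const envPrefix = "os.environ/"
+
+// resolveEnvRef returns value unchanged unless it starts with envPrefix.
+// In that case it looks up the named variable and fails if it is missing,
+// mirroring the "startup fails" behavior described above.
+// NOTE: hypothetical helper for illustration only.
+func resolveEnvRef(value string, lookup func(string) (string, bool)) (string, error) {
+	if !strings.HasPrefix(value, envPrefix) {
+		return value, nil
+	}
+	name := strings.TrimPrefix(value, envPrefix)
+	resolved, ok := lookup(name)
+	if !ok {
+		return "", fmt.Errorf("environment variable %q is not set", name)
+	}
+	return resolved, nil
+}
+
+func main() {
+	os.Setenv("MY_ENV_VAR", "s3cret")
+	v, err := resolveEnvRef("os.environ/MY_ENV_VAR", os.LookupEnv)
+	fmt.Println(v, err)
+}
+```
+
+Passing the lookup function as a parameter keeps the sketch easy to test without touching the real process environment.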
+
+### Safety note (no “secret persistence”)
+When `os.environ/` references are resolved, config normalization steps that would normally write back to disk are skipped to avoid accidentally writing the resolved secret into `config.yaml`.
+
+## Strict Config Parsing (Reject Unknown YAML Fields)
+
+By default, Go YAML decoders silently ignore unknown fields, so a typo in a key name can go unnoticed. CLIProxyAPI supports strict parsing, which rejects unknown fields:
+
+```yaml
+strict-config: true
+```
+
+You can also force strict parsing via env:
+- `CLIPROXY_STRICT_CONFIG=true`
+
+## Encrypted Auth Storage (auth-dir)
+
+Auth JSON files under `auth-dir` can be encrypted at rest. Whether or not encryption is enabled, they are always written using:
+- file locking
+- atomic replace
+- `0600` permissions
+
+Config:
+
+```yaml
+auth-storage:
+  encrypt: true
+  encryption-key: os.environ/CLIPROXY_AUTH_ENCRYPTION_KEY
+  allow-plaintext-fallback: true
+```
+
+Also supported via env: `CLIPROXY_AUTH_ENCRYPTION_KEY` (or legacy `CLI_PROXY_API_AUTH_ENCRYPTION_KEY`).
+
+### What gets encrypted
+- Files under `auth-dir` (typically `*.json`) created by login flows or uploaded via management endpoints.
+- The stored format is an **envelope JSON** (AES-256-GCM). The plaintext JSON is only recovered in-memory.
+
+### Migration behavior
+If encryption is enabled and `allow-plaintext-fallback: true`, legacy plaintext auth files are still readable and will be best-effort rewritten into the encrypted envelope format.
+
+### Remote stores (Postgres/Object store)
+If you mirror auth files to Postgres/S3-backed stores, the raw bytes are stored as-is. When encryption is enabled, those remote payloads remain encrypted envelopes.
+
+## Prometheus Metrics
+
+Enable the metrics endpoint:
+
+```yaml
+metrics:
+  enable: true
+  endpoint: "/metrics"
+  require-auth: false
+```
+
+Metrics include request counts/latency, token totals, cache hits/misses, rate-limit rejections, and cooldown counters.
+
+Key metric names:
+- `cliproxy_requests_total`
+- `cliproxy_request_duration_ms`
+- `cliproxy_tokens_input_total` / `cliproxy_tokens_output_total`
+- `cliproxy_cache_hits_total` / `cliproxy_cache_misses_total`
+- `cliproxy_ratelimit_rejections_total`
+- `cliproxy_cooldowns_triggered_total`
+
+## Response Cache
+
+Enable in-memory response caching:
+
+```yaml
+cache:
+  enable: true
+  max-size: 1000
+  ttl: 300
+```
+
+### What is cached
+- Only **non-streaming** requests.
+- Only JSON responses with **2xx** status.
+- Applies to:
+  - `POST /v1/chat/completions`
+  - `POST /v1/completions`
+  - `POST /v1/responses` (OpenAI Responses API)
+  - `POST /v1/messages`
+
+### Cache key
+Cache keys include the authenticated `apiKey` + method + path + query + request body, so different users/inputs do not collide.
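+
+One way to combine those fields into a fixed-size key is to hash them. The sketch below is an assumption about how such a key could be built, not the actual implementation: `cacheKey` is a hypothetical name, and the choice of SHA-256 and length-prefixing is illustrative.
+
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"fmt"
+)
+
+// cacheKey hashes the request attributes listed above into a hex string.
+// Each field is length-prefixed so that ("ab", "c") and ("a", "bc") cannot
+// produce the same byte stream.
+// NOTE: hypothetical sketch; the real key layout and hash may differ.
+func cacheKey(apiKey, method, path, query string, body []byte) string {
+	h := sha256.New()
+	parts := [][]byte{[]byte(apiKey), []byte(method), []byte(path), []byte(query), body}
+	for _, part := range parts {
+		fmt.Fprintf(h, "%d:", len(part))
+		h.Write(part)
+	}
+	return hex.EncodeToString(h.Sum(nil))
+}
+
+func main() {
+	body := []byte(`{"model":"gpt-4o"}`)
+	a := cacheKey("key-1", "POST", "/v1/chat/completions", "", body)
+	b := cacheKey("key-2", "POST", "/v1/chat/completions", "", body)
+	fmt.Println(a != b) // two API keys sending the same body get separate entries
+}
+```
+
+Including the authenticated key in the hash is what keeps one user's cached response from being served to another.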
+
+### Response header
+Responses served from the cache carry `X-CLIProxy-Cache: HIT`; cacheable requests that miss the cache carry `X-CLIProxy-Cache: MISS`.
+
+## Rate Limits
+
+Configure concurrency, RPM, and TPM limits:
+
+```yaml
+rate-limits:
+  enable: true
+  max-parallel-requests: 100
+  max-per-key: 10
+  max-rpm: 60
+  max-tpm: 120000
+```
+
+Rate-limited requests return HTTP `429` with `{"error":"rate_limited", ...}` and increment `cliproxy_ratelimit_rejections_total`.
+
+### Token-Per-Minute (TPM)
+
+TPM limits protect upstream quotas from a small number of very large requests.
+
+Notes:
+- TPM is tracked per authenticated principal (`cfg:<sha256>` for static `api-keys`, `vk:<sha256>` for virtual keys).
+- Tokens are recorded after request completion (usage plugin), so enforcement is best-effort and may allow brief bursts.
+
+## Request/Response Size Limits
+
+CLIProxyAPI supports request/response size caps:
+
+```yaml
+limits:
+  max-request-size-mb: 10
+  max-response-size-mb: 50
+```
+
+Behavior:
+- Request bodies above the cap return HTTP `413`.
+- When `max-response-size-mb` is set, non-streaming upstream responses larger than the cap return HTTP `502`.
+
+## Cooldown Override
+
+Optionally apply a fixed cooldown window for specific HTTP status codes:
+
+```yaml
+cooldown:
+  enable: true
+  duration: 60
+  trigger-on: [429, 500, 502, 503, 504]
+```
+
+This is a simple “guardrail cooldown” that prevents immediate re-selection of a credential after repeated error codes. If the upstream returns `Retry-After`, that value is honored/extended.
+
+Note: quota backoff for 429 is still controlled separately via `disable-cooling`.
+
+## Fallback Chains (Cross-Provider Failover)
+
+Fallback chains provide model/provider failover on transient failures (network, 408, 429, 5xx):
+
+```yaml
+fallback-chains:
+  enable: true
+  chains:
+    - primary-model: "gpt-4o"
+      fallbacks:
+        - model: "claude-3-5-sonnet-20241022"
+          provider: "claude"
+```
+
+When a fallback succeeds, responses include `X-CLIProxy-Fallback` headers for debugging.
+
+## Retry Policy (Exponential Backoff)
+
+`retry-policy` adds exponential backoff retries for transient failures (network, 408, 5xx):
+
+```yaml
+retry-policy:
+  enable: true
+  max-retries: 3
+  initial-delay-ms: 1000
+  max-delay-ms: 30000
+  multiplier: 2.0
+  jitter: 0.1
+```
+
+Notes:
+- 429 is intentionally not retried via backoff; prefer cooldown/Retry-After.
+- This is additive to the existing cooldown-based `request-retry` behavior.
+- For OpenAI-compatible upstreams, you can pass `Idempotency-Key` to reduce duplicate charges when retries occur.
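+
+To make the parameters concrete, here is a minimal sketch of how a delay could be derived from them. It assumes the usual formula `initial * multiplier^attempt`, capped at the maximum, and it assumes `jitter` is a symmetric fraction (0.1 meaning plus or minus 10%). The `RetryPolicy` struct and its methods are hypothetical and may not match the proxy's internals.
+
+```go
+package main
+
+import (
+	"fmt"
+	"math"
+	"math/rand"
+	"time"
+)
+
+// RetryPolicy mirrors the retry-policy keys shown above (hypothetical struct).
+type RetryPolicy struct {
+	InitialDelay time.Duration
+	MaxDelay     time.Duration
+	Multiplier   float64
+	Jitter       float64 // fraction; 0.1 is assumed to mean +/-10%
+}
+
+// baseDelay returns the un-jittered delay before retry number attempt
+// (0-based): InitialDelay * Multiplier^attempt, capped at MaxDelay.
+func (p RetryPolicy) baseDelay(attempt int) time.Duration {
+	d := float64(p.InitialDelay) * math.Pow(p.Multiplier, float64(attempt))
+	if d > float64(p.MaxDelay) {
+		d = float64(p.MaxDelay)
+	}
+	return time.Duration(d)
+}
+
+// delay scales baseDelay by a random factor in [1-Jitter, 1+Jitter).
+func (p RetryPolicy) delay(attempt int, rng *rand.Rand) time.Duration {
+	factor := 1 + p.Jitter*(2*rng.Float64()-1)
+	return time.Duration(float64(p.baseDelay(attempt)) * factor)
+}
+
+func main() {
+	p := RetryPolicy{InitialDelay: time.Second, MaxDelay: 30 * time.Second, Multiplier: 2.0, Jitter: 0.1}
+	rng := rand.New(rand.NewSource(1))
+	for attempt := 0; attempt < 3; attempt++ {
+		fmt.Println(attempt, p.baseDelay(attempt), p.delay(attempt, rng))
+	}
+}
+```
+
+Under these assumptions, the example values give base delays of 1s, 2s, and 4s for the three retries, and the 30s cap would first apply at the sixth retry (32s uncapped).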
+
+## Routing Strategy
+
+When multiple credentials match, you can choose a selection strategy:
+
+```yaml
+routing:
+  strategy: "fill-first"    # fill-first (default), round-robin, random, least-busy, lowest-latency
+  health-aware: true        # Filter unhealthy credentials (COOLDOWN, ERROR)
+  prefer-healthy: true      # Prefer HEALTHY over DEGRADED when health-aware
+  fill-first-max-inflight-per-auth: 4  # 0 = unlimited
+  fill-first-spillover: "next-auth"    # next-auth (default), least-busy
+```
+
+Notes:
+- `least-busy` uses in-flight request counts; `lowest-latency` requires `health-tracking.enable: true`.
+- `fill-first` “burns” one account first; spillover prevents overload under bursty concurrency.
+
+### Fill-first spillover (recommended for “many creds”)
+
+`fill-first` intentionally “burns” one account first (to stagger rolling-window subscription caps), but with many concurrent terminals it can also overload a single credential, leading to avoidable `429` errors. Use `fill-first-max-inflight-per-auth` and `fill-first-spillover` to keep the intent while enabling safe spillover.
+
+- When the preferred credential is at capacity (`max-inflight`), selection spills over to another credential instead of overloading one.
+- `next-auth` preserves deterministic “drain first”; `least-busy` maximizes throughput under bursty load.
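+
+The selection rule can be sketched as follows. This is an illustration under stated assumptions, not the proxy's selector: `Cred` and `pick` are hypothetical names, credentials are assumed to be ordered by ID, and the sketch simply reports "no choice" when every credential is at capacity because the behavior in that case is not described here.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sort"
+)
+
+// Cred is a hypothetical credential with its current in-flight request count.
+type Cred struct {
+	ID       string
+	InFlight int
+}
+
+// pick prefers the first credential by ID while it is under maxInflight
+// (0 means unlimited). Otherwise it spills over: "next-auth" takes the next
+// credential in ID order with capacity, "least-busy" takes the one with the
+// fewest in-flight requests among those with capacity.
+func pick(creds []Cred, maxInflight int, spillover string) (Cred, bool) {
+	if len(creds) == 0 {
+		return Cred{}, false
+	}
+	sorted := append([]Cred(nil), creds...)
+	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })
+	if maxInflight == 0 || sorted[0].InFlight < maxInflight {
+		return sorted[0], true
+	}
+	var best *Cred
+	for i := 1; i < len(sorted); i++ {
+		c := &sorted[i]
+		if c.InFlight >= maxInflight {
+			continue // at capacity, skip
+		}
+		if spillover != "least-busy" {
+			return *c, true // next-auth: first in ID order with capacity
+		}
+		if best == nil || c.InFlight < best.InFlight {
+			best = c
+		}
+	}
+	if best == nil {
+		return Cred{}, false
+	}
+	return *best, true
+}
+
+func main() {
+	creds := []Cred{{"a", 4}, {"b", 3}, {"c", 0}}
+	next, _ := pick(creds, 4, "next-auth")
+	busy, _ := pick(creds, 4, "least-busy")
+	fmt.Println(next.ID, busy.ID)
+}
+```
+
+With credential `a` at its cap of 4, `next-auth` moves on to `b` (keeping the drain order), while `least-busy` jumps to the idle `c`.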
+
+Health-aware filtering uses `health-aware` and `prefer-healthy` (requires `health-tracking.enable: true`).
+
+## Streaming (Keep-Alives + Safe Bootstrap Retries)
+
+Streaming failures are only safe to retry or fail over **before any bytes are written** to the client. After that, a retry would duplicate output already sent or diverge from it.
+
+```yaml
+streaming:
+  keepalive-seconds: 15    # SSE heartbeats (: keep-alive\n\n); <= 0 disables
+  bootstrap-retries: 2     # retries allowed before first byte; 0 disables
+```
+
+Notes:
+- Keep-alives reduce idle timeouts (Cloudflare/Nginx/proxies) during long pauses between chunks.
+- Bootstrap retries/fallbacks only run if the stream fails before producing any payload (safe failover).
+
+## “10 Terminals / Many Subscriptions” Recommended Defaults
+
+This configuration biases toward **predictable** routing (burn one account first) while reducing avoidable interruptions under bursty concurrency:
+
+```yaml
+routing:
+  strategy: "fill-first"
+  health-aware: true
+  prefer-healthy: true
+  fill-first-max-inflight-per-auth: 4
+  fill-first-spillover: "next-auth"
+
+health-tracking:
+  enable: true
+
+cooldown:
+  enable: true
+  duration: 60
+  trigger-on: [429, 500, 502, 503, 504]
+
+retry-policy:
+  enable: true
+  max-retries: 3
+  initial-delay-ms: 1000
+  max-delay-ms: 30000
+  multiplier: 2.0
+  jitter: 0.1
+
+streaming:
+  keepalive-seconds: 15
+  bootstrap-retries: 2
+```
+
+## Request Body Guardrails (Client-Side Upstream Targets)
+
+To prevent clients from redirecting upstream requests to a server of their choosing, CLIProxyAPI blocks `api_base` / `base_url` in request bodies by default:
+
+```yaml
+security:
+  allow-client-side-credentials: false
+```
+
+When `allow-client-side-credentials` is `false` (default), requests containing `api_base` or `base_url` are rejected with HTTP `400`.
+
+## Virtual Keys (Managed Client Keys)
+
+Virtual keys let you issue per-user or per-team client keys without editing `config.yaml`.
+
+Enable:
+
+```yaml
+virtual-keys:
+  enable: true
+```
+
+Management endpoints (require management key):
+- `GET /v0/management/virtual-keys`
+- `POST /v0/management/virtual-keys` (returns plaintext key once)
+- `DELETE /v0/management/virtual-keys/:selector`
+- `GET /v0/management/virtual-keys/:selector/budget`
+
+Policy enforcement (automatic for `vk:*` principals):
+- Budget caps (tokens and/or USD) with fixed windows
+- Model allowlists (wildcards)
+- Per-key model aliases (`model_aliases`) applied by rewriting the request JSON `model`
+
+## Pricing (Spend Tracking)
+
+Virtual-key cost budgets require pricing rules:
+
+```yaml
+pricing:
+  enable: true
+  models:
+    - match: "gpt-4o*"
+      input-per-1k: 5.0
+      output-per-1k: 15.0
+```
+
+When `pricing.enable: false`, virtual keys can still enforce token budgets, but cost budgets will return `cost_unknown`.
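+
+As a worked example of the per-1k arithmetic, the sketch below computes the cost of one request from a rule list. It is illustrative only: `PriceRule` and `cost` are hypothetical names, glob matching via `path.Match` is an assumption about how `match` patterns are interpreted, and the result is in whatever currency the rates are expressed in.
+
+```go
+package main
+
+import (
+	"fmt"
+	"path"
+)
+
+// PriceRule mirrors one pricing.models entry (hypothetical struct).
+type PriceRule struct {
+	Match       string // glob pattern, e.g. "gpt-4o*"
+	InputPer1K  float64
+	OutputPer1K float64
+}
+
+// cost returns the spend for one request under the first matching rule.
+// ok is false when no rule matches; the caller decides what to do then.
+func cost(rules []PriceRule, model string, inTokens, outTokens int) (amount float64, ok bool) {
+	for _, r := range rules {
+		if matched, err := path.Match(r.Match, model); err == nil && matched {
+			in := float64(inTokens) / 1000 * r.InputPer1K
+			out := float64(outTokens) / 1000 * r.OutputPer1K
+			return in + out, true
+		}
+	}
+	return 0, false
+}
+
+func main() {
+	rules := []PriceRule{{Match: "gpt-4o*", InputPer1K: 5.0, OutputPer1K: 15.0}}
+	amount, ok := cost(rules, "gpt-4o-mini", 2000, 500)
+	fmt.Println(amount, ok)
+}
+```
+
+With the rates from the config above, 2000 input tokens and 500 output tokens cost 2 × 5.0 + 0.5 × 15.0 = 17.5.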
+
+## Pass-Through Endpoints
+
+Pass-through routes forward requests to an upstream base URL as-is, so an extra endpoint can be exposed without writing a full translator.
+
+```yaml
+pass-through:
+  enable: true
+  endpoints:
+    - path: "/v1/rerank"
+      method: "POST"
+      base-url: "https://api.openai.com"
+      timeout: 60
+      headers:
+        Authorization: "Bearer os.environ/OPENAI_API_KEY"
+```
+
+Security behavior:
+- Hop-by-hop headers are stripped.
+- Proxy auth headers (`Authorization`, `X-Goog-Api-Key`, `X-Api-Key`) are stripped and must be provided via `headers`.
+- If the proxy key was provided via query (`?key=` / `?auth_token=`), that parameter is removed from the forwarded query string.
+
+## Health Endpoints + Background Probes
+
+Endpoints:
+- `GET /health/liveness` (fast, no upstream calls)
+- `GET /health/readiness` (feature status + optional probe summary)
+- `GET /health` (alias for readiness)
+
+Optional background probes:
+
+```yaml
+health:
+  background-checks:
+    enable: true
+    interval: 300
+```
+
+Probes are lightweight TCP connectivity checks to configured provider base URLs (no auth, no quota usage).
+
+## Management API Hardening
+
+- Auth file downloads are blocked for non-local clients by default.
+- To allow downloads from remote clients, set:
+  ```yaml
+  remote-management:
+    allow-auth-file-download: true
+  ```
+
+### Auth file download behavior
+- By default, downloads return the stored bytes (encrypted envelope if encryption is enabled).
+- `GET /v0/management/auth-files/download?name=...&decrypt=1` is **localhost-only** and returns plaintext JSON (requires encryption key when files are encrypted).
+
+New endpoints:
+- `GET /v0/management/auth-files/errors`
+- `GET /v0/management/auth-providers`
+- `GET /v0/management/virtual-keys` (+ create/revoke/budget)
+
+### Config Redaction
+
+`GET /v0/management/config` returns a redacted config view (API keys/tokens masked). Use `GET /v0/management/config.yaml` to fetch the raw file (preserves comments).
diff --git a/docs/sdk-advanced.md b/docs/sdk-advanced.md
index 3a9d3e500..334216258 100644
--- a/docs/sdk-advanced.md
+++ b/docs/sdk-advanced.md
@@ -60,6 +60,7 @@ func (Executor) Refresh(ctx context.Context, a *coreauth.Auth) (*coreauth.Auth,
 Register the executor with the core manager before starting the service:
 
 ```go
+// nil selector uses the default "fill-first" selection strategy.
 core := coreauth.NewManager(coreauth.NewFileStore(cfg.AuthDir), nil, nil)
 core.RegisterExecutor(myprov.Executor{})
 svc, _ := cliproxy.NewBuilder().WithConfig(cfg).WithConfigPath(cfgPath).WithCoreAuthManager(core).Build()
@@ -135,4 +136,3 @@ The embedded server calls this automatically for built‑in providers; for custo
 - Enable request logging: Management API GET/PUT `/v0/management/request-log`
 - Toggle debug logs: Management API GET/PUT `/v0/management/debug`
 - Hot reload changes in `config.yaml` and `auths/` are picked up automatically by the watcher
-
diff --git a/docs/sdk-advanced_CN.md b/docs/sdk-advanced_CN.md
index 25e6e83c9..c9ed8b57f 100644
--- a/docs/sdk-advanced_CN.md
+++ b/docs/sdk-advanced_CN.md
@@ -55,6 +55,7 @@ func (Executor) Refresh(ctx context.Context, a *coreauth.Auth) (*coreauth.Auth,
 在启动服务前将执行器注册到核心管理器：
 
 ```go
+// selector 传 nil 时默认使用 "fill-first" 选择策略。
 core := coreauth.NewManager(coreauth.NewFileStore(cfg.AuthDir), nil, nil)
 core.RegisterExecutor(myprov.Executor{})
 svc, _ := cliproxy.NewBuilder().WithConfig(cfg).WithConfigPath(cfgPath).WithCoreAuthManager(core).Build()
@@ -128,4 +129,3 @@ cliproxy.GlobalModelRegistry().RegisterClient(authID, "myprov", models)
 - 启用请求日志：管理 API GET/PUT `/v0/management/request-log`
 - 切换调试日志：管理 API GET/PUT `/v0/management/debug`
 - 热更新：`config.yaml` 与 `auths/` 变化会自动被侦测并应用
-
diff --git a/docs/sdk-usage.md b/docs/sdk-usage.md
index 55e7d5f9a..8a425dc35 100644
--- a/docs/sdk-usage.md
+++ b/docs/sdk-usage.md
@@ -81,6 +81,7 @@ These options mirror the internals used by the CLI server.
 The service uses a core `auth.Manager` for selection, execution, and auto‑refresh. When embedding, you can provide your own manager to customize transports or hooks:
 
 ```go
+// nil selector uses the default "fill-first" selection strategy.
 core := coreauth.NewManager(coreauth.NewFileStore(cfg.AuthDir), nil, nil)
 core.SetRoundTripperProvider(myRTProvider) // per‑auth *http.Transport
 
diff --git a/docs/sdk-usage_CN.md b/docs/sdk-usage_CN.md
index b87f9aa1f..135ccf0b7 100644
--- a/docs/sdk-usage_CN.md
+++ b/docs/sdk-usage_CN.md
@@ -81,6 +81,7 @@ svc, _ := cliproxy.NewBuilder().
 服务内部使用核心 `auth.Manager` 负责选择、执行、自动刷新。内嵌时可自定义其传输或钩子：
 
 ```go
+// selector 传 nil 时默认使用 "fill-first" 选择策略。
 core := coreauth.NewManager(coreauth.NewFileStore(cfg.AuthDir), nil, nil)
 core.SetRoundTripperProvider(myRTProvider) // 按账户返回 *http.Transport
 
@@ -161,4 +162,3 @@ _ = svc.Shutdown(ctx)
 - 热更新：`config.yaml` 与 `auths/` 变化会被自动侦测并应用。
 - 请求日志可通过管理 API 在运行时开关。
 - `gemini-web.*` 相关配置在内嵌服务器中会被遵循。
-