Performance Bottleneck and OOM in PATCH Requests During Large Image Pushes #119

@njuptlzf

Description

Currently, the Registry Worker faces significant stability and performance issues when handling large image pushes, specifically during the PATCH phase of chunked uploads.

Observed Problems

  1. Memory Exhaustion (OOM): In src/router.ts, when a PATCH request arrives without a Content-Length header, the code executes await req.blob(), which buffers the entire chunk in the Worker's memory. For large chunks (e.g., 50MB to 100MB+), this frequently exceeds the Worker's memory limit (128 MB per isolate on Cloudflare Workers) and crashes the Worker immediately.
  2. Request Timeouts: Because the entire request body is buffered before the upload to R2 even starts, much of the request's time limit is used up before any data reaches R2. On slower connections or with very large chunks, this leads to 10-minute timeouts.
  3. Broken UX (Progress Jitter): From the user's perspective, the CLI progress bar (e.g., in docker push or regctl) freezes while the Worker is buffering, and then "jumps" forward once the R2 upload finally begins. This makes the transfer appear unreliable or "stuck".
  4. Protocol Parsing Fragility: The current parsing of the Content-Range header is brittle and fails to handle common variations (like the bytes prefix), leading to 416 Range Not Satisfiable errors even when the client is behaving correctly.
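Problems 1 to 3 share a root cause: the whole body is buffered before anything is written. A minimal sketch of the alternative, in TypeScript to match src/router.ts, is to consume the PATCH body incrementally and cut it into fixed-size parts, so that at most one part is held in memory at a time. The names rechunk, streamChunk and uploadPart are hypothetical and not part of the repository; uploadPart stands in for whatever actually persists a part (for example one part of an R2 multipart upload, where, as with S3, every part except the last must be at least 5 MiB). The sketch also assumes the request body can be iterated with for await; if the runtime's ReadableStream does not support that, the same loop can be driven by getReader().read().

```typescript
// Cuts an incoming byte stream into parts of exactly `partSize` bytes
// (the last part may be shorter). Only the part currently being filled is
// held in memory, unlike `await req.blob()`, which holds the whole chunk.
async function* rechunk(
  source: AsyncIterable<Uint8Array>,
  partSize: number,
): AsyncGenerator<Uint8Array> {
  if (!Number.isInteger(partSize) || partSize <= 0) {
    throw new RangeError("partSize must be a positive integer");
  }
  let buffer = new Uint8Array(partSize);
  let filled = 0;
  for await (const piece of source) {
    let offset = 0;
    while (offset < piece.length) {
      // Copy as much of this piece as still fits into the current part.
      const n = Math.min(partSize - filled, piece.length - offset);
      buffer.set(piece.subarray(offset, offset + n), filled);
      filled += n;
      offset += n;
      if (filled === partSize) {
        yield buffer;
        // Fresh buffer so a consumer that kept the previous part is unaffected.
        buffer = new Uint8Array(partSize);
        filled = 0;
      }
    }
  }
  if (filled > 0) {
    yield buffer.subarray(0, filled);
  }
}

// Hypothetical glue: `uploadPart` stands in for whatever persists one part,
// e.g. an R2 multipart upload part. Returns the number of bytes received.
// Assumption: `body` is async-iterable (e.g. the request's ReadableStream).
async function streamChunk(
  body: AsyncIterable<Uint8Array>,
  partSize: number,
  uploadPart: (partNumber: number, data: Uint8Array) => Promise<void>,
): Promise<number> {
  let partNumber = 1;
  let total = 0;
  for await (const part of rechunk(body, partSize)) {
    await uploadPart(partNumber, part);
    partNumber += 1;
    total += part.length;
  }
  return total;
}
```

With partSize set to, say, 8 MiB, memory held for the chunk stays near 8 MiB regardless of the blobChunk the client picks, and since data is consumed as it arrives, the client's progress bar should advance in part-sized steps instead of freezing for a whole chunk. A fresh buffer is allocated after each yielded part so that a consumer which keeps a reference (for example a retry queue) never sees it overwritten.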
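For problem 4, a more tolerant parser could accept the bare start-end form that the OCI distribution spec uses for chunked PATCH requests (e.g. 0-1023, with an inclusive end) as well as the bytes-prefixed variants. This is a hedged sketch: parseContentRange, chunkLength and isContiguous are hypothetical names, and returning null (so the handler can answer 416) for every other shape is an assumption about the desired behavior.

```typescript
// Inclusive byte range of one uploaded chunk, e.g. "0-1023" => { start: 0, end: 1023 }.
interface ChunkRange {
  start: number;
  end: number;
}

// Accepts "0-1023" (the OCI chunked-upload form), "bytes=0-1023",
// "bytes 0-1023", and any of these followed by "/<total>" or "/*".
// Returns null for anything else so the caller can answer 416.
function parseContentRange(header: string | null): ChunkRange | null {
  if (header === null) return null;
  const m = /^\s*(?:bytes[\s=]+)?(\d+)\s*-\s*(\d+)\s*(?:\/\s*(?:\*|\d+)\s*)?$/i.exec(header);
  if (m === null) return null;
  const start = Number(m[1]);
  const end = Number(m[2]);
  // Reject offsets too large to represent exactly, and inverted ranges.
  if (!Number.isSafeInteger(start) || !Number.isSafeInteger(end) || end < start) {
    return null;
  }
  return { start, end };
}

// Number of bytes the chunk should contain (the range end is inclusive).
function chunkLength(range: ChunkRange): number {
  return range.end - range.start + 1;
}

// A chunk is only acceptable if it starts exactly where the stored upload ends.
function isContiguous(range: ChunkRange, storedBytes: number): boolean {
  return range.start === storedBytes;
}
```

A handler would then answer 416 only when parseContentRange returns null or when isContiguous fails against the bytes already stored, rather than when a client merely includes the bytes unit.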

Impact

  • Inability to push large layers (typically > 100MB).
  • High error rates in production due to OOM.
  • Poor user experience due to perceived hangs and sudden jumps in transfer progress.

Test

I'm using regctl, which handles chunked uploads well and lets the chunk size be set per host (blobChunk in the config below).

# regctl registry config
{
  "hosts": {
    "<src-domain>": {
      "tls": "enabled",
      "hostname": "<src-domain>"
    },
    "<worker-custom-domain>": {
      "tls": "enabled",
      "regcert": "system",
      "hostname": "<worker-custom-domain>",
      "api": "default",
      "blobChunk": 52428800,
      "blobMax": 52428800,
      "reqPerSec": 10,
      "reqConcurrent": 4
    }
  }
}
# regctl image copy --fast  <src-domain>/xxxxxxxx/xxxx/xxxxxx:1.0.1 <worker-custom-domain>/xxxxxxxx/xxxx/xxxxxx:1.0.1
time=2026-02-05T16:52:27.711+08:00 level=WARN msg="API field has been deprecated" api=default host=<worker-custom-domain>
time=2026-02-05T16:52:27.711+08:00 level=WARN msg="Changing reqPerSec settings for registry" orig=3 new=4 host=<worker-custom-domain>
time=2026-02-05T16:52:27.730+08:00 level=WARN msg="failed to setup CA pool" err="failed to load host specific ca (registry: <worker-custom-domain>): pem.Decode is nil: system"
sha256:da3b410 [========================================] 100.00% 1.143kB/1.143kB
sha256:1c8e5e0 [========================================] 100.00% 60.062MB/60.062MB
sha256:c415b60 [==>                                     ] 6.86% 52.429MB/764.153MB
Manifests: 2/5 | Blobs: 250.613MB copied, 883.550MB skipped, 711.724MB queued | Elapsed: 784s

It consistently gets stuck at 52.429MB, which is exactly one chunk at the configured blobChunk of 52428800 bytes.
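The numbers in the output line up with the configured chunk size. The short calculation below assumes regctl reports decimal megabytes (1 MB = 10^6 bytes), which is consistent with 52428800 bytes being displayed as 52.429MB. Under that assumption, the stalled layer would need about 15 chunks at this chunk size, and the 6.86% mark corresponds to exactly one chunk's worth of data.

```typescript
// Values copied from the regctl config and progress output above.
const blobChunk = 52_428_800; // bytes, "blobChunk" in the host config (50 MiB)
const layerSizeMB = 764.153; // total shown for sha256:c415b60
const reportedMB = 52.429; // position where the progress bar stops

// Assumption: the progress output uses decimal megabytes (1 MB = 1e6 bytes).
const chunkMB = blobChunk / 1e6;
const chunksNeeded = Math.ceil((layerSizeMB * 1e6) / blobChunk);
const percent = (reportedMB / layerSizeMB) * 100;
const chunksCompleted = Math.round((reportedMB * 1e6) / blobChunk);

console.log(chunkMB.toFixed(3), chunksNeeded, percent.toFixed(2), chunksCompleted);
// prints "52.429 15 6.86 1"
```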
