Protect ByteStream pending error JSValue across GC#32579

Open
robobun wants to merge 6 commits into main from farm/d343f069/s3-stream-error-gc-root

robobun commented Jun 21, 2026

Collaborator

Fuzzilli found a flaky crash (fingerprint StructureID.h(76), ASSERTION FAILED: decontaminate()) that reproduces reliably with:

const stream = Bun.S3Client.file("test").stream();
Bun.gc(true);
stream.text();

What happened

When an S3 download fails synchronously during .stream() (for example, missing credentials), the error is delivered to ByteStream::on_data as StreamError::JSValue(err) and parked in pending.result via ByteStream::append for later consumption. The raw JSValue was stored in a plain heap field with no GC root, so the error object could be collected before the user called .text()/.bytes()/.json(). to_buffered_value then read the freed cell and handed it to JSPromise::setSlot, whose write barrier dereferenced the swept structure ID (assert in debug, SIGILL in release).

The same storage path is reachable from fetch and Bun.serve request bodies, which pass StreamError::JSValue to on_data the same way.

Fix

StreamResult::release() and every to_js_weak consumer already treat StreamError::JSValue as a protected handle and call unprotect() when done. Match that contract by calling protect() in ByteStream::append at the two points where the error is parked in pending.result, so the value survives until one of those consumers unprotects it.
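The protect-on-park / unprotect-on-release contract described above can be sketched with a toy model. This is only an illustration under stated assumptions: `MockJsValue`, `Pending`, and `park_error` are hypothetical stand-ins, the root counter merely imitates what `protect()`/`unprotect()` do to a real GC-managed value, and none of this is the actual ByteStream.rs code.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a GC-managed JS value. The shared counter
/// models how many protect() roots are currently held on it.
#[derive(Clone)]
struct MockJsValue {
    roots: Rc<Cell<u32>>,
}

impl MockJsValue {
    fn new() -> Self {
        Self { roots: Rc::new(Cell::new(0)) }
    }
    fn protect(&self) {
        self.roots.set(self.roots.get() + 1);
    }
    fn unprotect(&self) {
        self.roots.set(self.roots.get() - 1);
    }
    /// In this model, a value with zero roots could be collected at any time.
    fn is_rooted(&self) -> bool {
        self.roots.get() > 0
    }
}

#[allow(dead_code)]
enum StreamError {
    JSValue(MockJsValue),
    Message(String),
}

enum StreamResult {
    Done,
    Err(StreamError),
}

impl StreamResult {
    /// Consumer side: drop the root (if any) and reset to Done so a stale
    /// error with an already-unprotected handle is never observable.
    fn release(&mut self) {
        if let StreamResult::Err(StreamError::JSValue(v)) = self {
            v.unprotect();
        }
        *self = StreamResult::Done;
    }
}

struct Pending {
    result: StreamResult,
}

impl Pending {
    /// Producer side: release whatever was parked before, then root the new
    /// error before it is stored for a later reader.
    fn park_error(&mut self, err: StreamError) {
        self.result.release();
        if let StreamError::JSValue(v) = &err {
            v.protect();
        }
        self.result = StreamResult::Err(err);
    }
}

fn main() {
    let err = MockJsValue::new();
    let mut pending = Pending { result: StreamResult::Done };

    pending.park_error(StreamError::JSValue(err.clone()));
    assert!(err.is_rooted()); // survives any number of GC cycles while parked

    pending.result.release();
    assert!(!err.is_rooted()); // root dropped exactly once by the consumer
}
```

In this model every `protect()` in `park_error` is matched by exactly one `unprotect()` in `release()`, which is the balance the fix relies on.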

Test

Added a regression test in the "s3 missing credentials" block that spawns a child with BUN_JSC_slowPathAllocsBetweenGCs=5, creates a failing S3 stream, forces GC, and asserts that .text() rejects with ERR_S3_MISSING_CREDENTIALS and that the child exits with code 0. Before this change the child crashed (exit 132 in release, abort in debug).

When a ByteStream source (S3, fetch, request body) stores a
StreamError::JSValue into pending.result via append(), the value was
not GC-protected. If GC ran before the consumer (text()/bytes()/etc.)
read it back in to_buffered_value(), the error object was collected
and the subsequent JSPromise::setSlot hit a freed cell
(decontaminate() assert / SIGILL).

StreamResult::release() already unprotects StreamError::JSValue, so
protect it at the point it is parked in pending.result for later
consumption.

robobun commented Jun 21, 2026

Collaborator Author
Updated 4:45 PM PT - Jun 21st, 2026

@robobun, your commit 32db77f has 2 failures in Build #63838 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 32579

That installs a local version of the PR into your bun-32579 executable, so you can run:

bun-32579 --bun

coderabbitai Bot commented Jun 21, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5eca7ad2-1926-4cfe-aac9-94c41860b0b8

📥 Commits

Reviewing files that changed from the base of the PR and between cc52e27 and 32db77f.

📒 Files selected for processing (1)
  • src/runtime/webcore/ByteStream.rs

Walkthrough

In ByteStream::append, both the zero-capacity and non-zero-capacity Err branches now call protect() on a StreamError::JSValue before storing it in self.pending.result. The non-zero-capacity path also releases the previous result. In ByteStream::on_cancel, pending result cleanup is reordered to happen unconditionally before buffer clearing. In ByteStream::to_buffered_value, ensure_still_alive() is called on the error JS value before resetting the pending result. A new test verifies the JS error value survives a forced GC cycle before the S3 stream is consumed.

Changes

GC protection for ByteStream JSValue errors

| Layer / File(s) | Summary |
|---|---|
| protect() JSValue when storing error in ByteStream::append / src/runtime/webcore/ByteStream.rs | Both the zero-capacity and non-zero-capacity Err branches in ByteStream::append detect StreamError::JSValue and call protect() before storing the error in self.pending.result. The non-zero-capacity branch releases the previous result before overwriting. |
| JSValue lifetime across result cleanup in on_cancel and to_buffered_value / src/runtime/webcore/ByteStream.rs | In ByteStream::on_cancel, pending result cleanup (release() and set to Done) is moved to execute unconditionally before the subsequent buffer-clearing logic. In ByteStream::to_buffered_value, when the pending result is an Err, ensure_still_alive() is called on the error JS value before releasing and resetting the pending result. |
| S3 stream error GC regression test / test/js/bun/s3/s3-stream-error-gc.test.ts | New test spawns a child process that creates an S3 stream, forces GC via Bun.gc(true), then calls stream.text() to trigger error consumption. The parent process asserts that the child exits successfully and the rejection error code is ERR_S3_MISSING_CREDENTIALS. |
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: protecting a ByteStream pending error JSValue across garbage collection cycles. |
| Description check | ✅ Passed | The description covers both required sections with detailed information: 'What does this PR do?' is addressed comprehensively with root cause analysis and fix explanation, and 'How did you verify your code works?' is addressed with the regression test details. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai Bot left a comment

Contributor

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/runtime/webcore/ByteStream.rs (1)

296-302: ⚠️ Potential issue | 🔴 Critical

Add protect() call at line 299 to match error-handling pattern at lines 368 and 394.

Line 299 stores StreamError::JSValue without calling protect() first, while the parallel error-handling paths in append (lines 368 and 394) both protect JSValue errors before storage. Since JSValue errors are delivered to on_data() from FetchTasklet (confirmed at FetchTasklet.rs:670), this code path is reachable and exposes the same UAF risk: the JSValue is stored in pending.result without a GC root, allowing premature collection while pending still holds the reference.

Apply the same fix as lines 368 and 394:

if let streams::StreamError::JSValue(v) = &err {
    v.protect();
}

Insert before line 299 (p.result = streams::Result::Err(err.clone());).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/webcore/ByteStream.rs` around lines 296 - 302, The code at line
299 storing a StreamError to pending.result is missing the protect() call that
is applied in the parallel error-handling paths at lines 368 and 394. Before
storing the error via p.result = streams::Result::Err(err.clone()), add a guard
that checks if the error is a StreamError::JSValue variant and calls protect()
on it to prevent premature garbage collection. Insert this protection check
immediately before the line that assigns the error to pending.result, matching
the exact pattern used in the append method's error-handling blocks to ensure
consistent JSValue lifecycle management across all error storage paths in
ByteStream.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/runtime/webcore/ByteStream.rs`:
- Around line 296-302: The code at line 299 storing a StreamError to
pending.result is missing the protect() call that is applied in the parallel
error-handling paths at lines 368 and 394. Before storing the error via p.result
= streams::Result::Err(err.clone()), add a guard that checks if the error is a
StreamError::JSValue variant and calls protect() on it to prevent premature
garbage collection. Insert this protection check immediately before the line
that assigns the error to pending.result, matching the exact pattern used in the
append method's error-handling blocks to ensure consistent JSValue lifecycle
management across all error storage paths in ByteStream.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9cbba5ed-6e80-448a-92cf-6be89b9cd424

📥 Commits

Reviewing files that changed from the base of the PR and between 602d956 and 8b1b797.

📒 Files selected for processing (2)
  • src/runtime/webcore/ByteStream.rs
  • test/js/bun/s3/s3.test.ts

@github-actions

Contributor

Found 1 issue this PR may fix:

  1. Recurring segfault after POST to a Hono route mounting @hono/mcp StreamableHTTPTransport (Bun 1.3.13 and 1.3.14) #31004 - Recurring segfault after POST to a Hono route with StreamableHTTPTransport; crash occurs at low addresses (0x0, 0x8, 0xD0) after request body consumption via Bun.serve, consistent with a GC collecting an unrooted JSValue error in ByteStream's pending result

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #31004

🤖 Generated with Claude Code

robobun commented Jun 21, 2026

Collaborator Author

Re: the coderabbit note on ByteStream.rs:299:

That branch is only reached when pending.state == Pending, i.e. a reader is already waiting. The error is stored at line 299 and then consumed synchronously a few lines down at self.pending.with_mut(|p| p.run()) (line 336), which hands it to fulfill_promise and clears pending.result before returning. The original stream local stays on the stack for the whole call, so conservative stack scanning keeps the value alive; there is no window for GC to collect it.

The two append sites are different: they park the error in pending.result and return to JS, so the value has to survive an arbitrary number of GC cycles before the user drains the stream. Those are the sites this PR protects.

coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/bun/s3/s3-stream-error-gc.test.ts`:
- Around line 32-36: The stderr output from the process is being collected but
is not asserted in the final expectation. Modify the expect statement to include
stderr in the expected object alongside stdout and exitCode, so that the
assertion validates all three process outputs: stderr (with trim() applied),
stdout (already present), and exitCode (already present). This will ensure
unexpected runtime warnings or errors in stderr are caught and prevent false
positives.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 980e4ab7-8423-4392-af82-c64febcac5e0

📥 Commits

Reviewing files that changed from the base of the PR and between 8b1b797 and a3d4eb8.

📒 Files selected for processing (1)
  • test/js/bun/s3/s3-stream-error-gc.test.ts

Comment thread test/js/bun/s3/s3-stream-error-gc.test.ts
@github-actions

Contributor

This PR may be a duplicate of:

  1. Fix use-after-free of stream error objects parked in ByteStream #31963 - Fixes the same unprotected JSValue use-after-free in ByteStream error handling, with a more comprehensive approach (protecting at all producer sites via StreamError::strong() and balancing unprotect at every consumer)

🤖 Generated with Claude Code

robobun commented Jun 21, 2026

Collaborator Author

This is the minimal two-line version of #31963, which takes the broader route of protecting at every StreamError::JSValue producer and balancing the release at every consumer. Either one resolves the Fuzzilli StructureID.h(76) crash; happy to close whichever is not wanted.

claude Bot left a comment

Contributor

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 test/js/bun/s3/s3.test.ts:1617-1621 — nit: stderr is read into a local but dropped from the expect, so if this regression ever recurs (child crashes with SIGILL/abort) the crash diagnostic on stderr won't appear in the CI failure diff. Per CLAUDE.md's subprocess-test convention, include stderr in the asserted object — e.g. expect({ stdout: stdout.trim(), stderr, exitCode }).toEqual({ stdout: ..., stderr: expect.any(String), exitCode: 0 }) — so it surfaces on failure without requiring it to be exactly empty (ASAN/debug builds emit benign warnings).

    Extended reasoning...

    What's wrong

    The new test at test/js/bun/s3/s3.test.ts:1617 pipes stderr, awaits proc.stderr.text(), and destructures it into a stderr local — but the assertion on the next line only checks { stdout: stdout.trim(), exitCode }. The stderr value is read and then silently discarded.

    Why it matters here specifically

    This test exists to catch a GC crash: before the fix, the child process died with SIGILL (release) or an ASSERTION FAILED: decontaminate() abort (debug). In both cases the only useful diagnostic — the assertion text, the crash backtrace, the signal — lands on stderr, not stdout. If this regression (or a related one) ever reappears, the test will fail with a diff like:

    - exitCode: 0
    + exitCode: 132
    - stdout: "rejected ERR_S3_MISSING_CREDENTIALS"
    + stdout: ""
    

    …and nothing else. Whoever triages it in CI has no idea why the child died, because the stderr that explains it was thrown away.

    Repo convention

    CLAUDE.md:197 ("Tests reviewers reject") is explicit about this pattern:

    Subprocess tests: … assert a combined { stdout, stderr, exitCode } object.

    The same line also says:

    Never assert stderr is exactly empty (ASAN/debug builds emit benign warnings)

    So the fix is not stderr: "" — that would flake on ASAN builds. Instead, include stderr in the asserted object with a permissive matcher so it shows up in the failure diff without constraining the pass case:

    expect({ stdout: stdout.trim(), stderr, exitCode }).toEqual({
      stdout: "rejected ERR_S3_MISSING_CREDENTIALS",
      stderr: expect.any(String),
      exitCode: 0,
    });

    (or wrap with normalizeBunSnapshot per the surrounding test conventions — either way, the point is that stderr participates in the diff.)

    Step-by-step proof

    1. Revert the v.protect() calls in ByteStream::append.
    2. Run this test. The child crashes; exitCode is 132 (or 134 in debug), stdout is empty, stderr contains ASSERTION FAILED: decontaminate() plus a stack trace.
    3. With the current assertion, Jest's diff shows only the stdout/exitCode mismatch — stderr is not part of the compared object, so the crash message never appears in the test output.
    4. With stderr included in the object, the same failure prints the full stderr in the "Received" half of the diff, immediately telling the reader what crashed.

    Severity

    Nit — this is test diagnostics quality, not a correctness bug in the fix itself. The runtime change is fine; this just makes the guard test more useful when it fails.

Comment thread src/runtime/webcore/ByteStream.rs Outdated
Comment thread src/runtime/webcore/ByteStream.rs
robobun added 2 commits June 21, 2026 21:30
append() now protects a parked StreamError::JSValue, so on_cancel must
release it even when no pull is outstanding (view is empty). Previously
the release was inside the !view.is_empty() guard, and finalize() skips
its own release once done is set, so a cancel-without-pull would leak
the protect handle.
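
The leak this commit message describes can be reproduced in a toy model. The sketch below is an assumption-laden illustration, not the real ByteStream: `Roots`, `ToyStream`, and every field and method name are hypothetical, and the counter only imitates protect()/unprotect() bookkeeping.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Toy root counter standing in for protect()/unprotect() on a JS value.
#[derive(Clone, Default)]
struct Roots(Rc<Cell<u32>>);

impl Roots {
    fn protect(&self) {
        self.0.set(self.0.get() + 1);
    }
    fn unprotect(&self) {
        self.0.set(self.0.get() - 1);
    }
    fn count(&self) -> u32 {
        self.0.get()
    }
}

/// Hypothetical, heavily simplified stream state.
struct ToyStream {
    parked_error: Option<Roots>, // protected error parked for a later reader
    view: Vec<u8>,               // non-empty only while a pull is outstanding
    done: bool,
}

impl ToyStream {
    fn release_parked(&mut self) {
        if let Some(err) = self.parked_error.take() {
            err.unprotect();
        }
    }

    /// Old shape: the release sits inside the "pull outstanding" guard.
    fn on_cancel_guarded(&mut self) {
        if !self.view.is_empty() {
            self.release_parked();
            self.view.clear();
        }
        self.done = true;
    }

    /// New shape: release first, whether or not a pull is outstanding.
    fn on_cancel_unconditional(&mut self) {
        self.release_parked();
        if !self.view.is_empty() {
            self.view.clear();
        }
        self.done = true;
    }

    /// Mirrors the commit message: finalize skips its release once done is set.
    fn finalize(&mut self) {
        if !self.done {
            self.release_parked();
        }
    }
}

/// Builds a stream with a protected error parked and no pull outstanding.
fn parked_stream(roots: &Roots) -> ToyStream {
    roots.protect();
    ToyStream { parked_error: Some(roots.clone()), view: Vec::new(), done: false }
}

fn main() {
    let leaked = Roots::default();
    let mut old = parked_stream(&leaked);
    old.on_cancel_guarded();
    old.finalize();
    assert_eq!(leaked.count(), 1); // the root is never dropped

    let balanced = Roots::default();
    let mut fixed = parked_stream(&balanced);
    fixed.on_cancel_unconditional();
    fixed.finalize();
    assert_eq!(balanced.count(), 0); // protect and unprotect are paired
}
```
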
Comment thread src/runtime/webcore/ByteStream.rs
Matches every other release site in this file so a stale Err(JSValue)
with an already-unprotected handle is never observable.

robobun commented Jun 21, 2026

Collaborator Author

CI status:

  • Build 63833 (cc52e27): 285/286 passed. The one failure was fetch-leak.test.ts "should not leak using readable stream" on darwin-14-aarch64, over the 5 MB RSS threshold by ~2-3 MB. That fixture exercises successful fetch bodies read to completion via getReader(), which never enters the Err branch this PR touches. The same test has flaked on unrelated branches in builds 63621, 63590, 63510, and passes 5/5 locally on a release build of this branch.
  • Build 63838 (32db77f, in progress): the only failures so far are 5 debian-x64-asan shards where bun install hit HTTP 5xx from api.github.com/repos/oven-sh/bun.report/tarball/912ca63 and never got to run tests.

The fix is verified locally and all review feedback is applied. Retrying the failed shards once GitHub recovers should be enough.

claude Bot left a comment

Contributor

All three of my earlier comments are addressed and I found nothing new on 32db77f, but this is GC protect/unprotect bookkeeping in a core stream path (S3/fetch/serve bodies) and there's still the open question of landing this vs #31963 — worth a human pass.

Extended reasoning...

Overview

Adds protect() on StreamError::JSValue at the two ByteStream::append sites that park an error in pending.result, plus three follow-up consistency fixes from review: release() before overwrite in both append Err arms, unconditional release() in on_cancel, and release() + reset to Done (with ensure_still_alive()) in to_buffered_value. One new regression test spawns a child with BUN_JSC_slowPathAllocsBetweenGCs=5 and verifies the parked error survives a forced GC.

Security risks

None in the conventional sense. The change is a memory-safety fix (UAF → crash) in the JS runtime; the risk profile is over/under-protect rather than auth/injection/data exposure. An unbalanced protect() would leak a GC root; an unbalanced unprotect() would reintroduce a UAF. After three review iterations the visible mutation sites in this file all pair release() with a reset to Done or a fresh value.

Level of scrutiny

High. ByteStream backs ReadableStream sources for S3, fetch response bodies, and Bun.serve request bodies — production-critical paths. JSC protect/unprotect discipline is notoriously easy to get subtly wrong: the original two-line fix introduced two latent leaks (overwrite-without-release, and the on_cancel no-pull-outstanding path) that only surfaced under review. That history argues for a maintainer who knows the StreamResult/Pending ownership model to confirm no other teardown path was missed.

Other factors

  • All three of my prior inline comments are applied and resolved; the bug hunter found nothing on 32db77f.
  • A competing PR (#31963) takes the broader StreamError::strong() approach at every producer; the author has offered to close whichever isn't wanted. That's a design call a maintainer should make.
  • CI on cc52e27 was 285/286 with one known-flaky fetch-leak.test.ts unrelated to the Err path; 32db77f's only failures so far are infra (GitHub tarball 5xx). The musl build failures noted on f3dee09 should be confirmed resolved before merge.

robobun commented Jun 21, 2026

Collaborator Author

Build 63838 finished 280/286. Remaining failures:

  • 5x debian-13-x64-asan: bun install hit GitHub API HTTP 5xx on bun.report tarball; never ran tests
  • 1x darwin-14-x64: terminal.test.ts "creates subprocess with terminal attached" timed out after 90s; this test has timed out in 11 other recent builds (63709, 63717, 63724, 63726, 63732, 63766, 63783, 63786, 63811, 63822, 63826)

darwin-14-aarch64, which flaked on fetch-leak.test.ts in 63833, passed cleanly here. The new s3-stream-error-gc.test.ts passed on every platform that ran tests.
