hot: GC-protect pending_internal_promise on every store path #32572

Open: robobun wants to merge 4 commits into main from farm/9b887ac5/root-pending-internal-promise

robobun commented Jun 21, 2026

Collaborator

Crash

Sentry BUN-38CT (191 events, Windows, bun 1.3.14):

src/bun.js.zig:493            start
src/jsc/VirtualMachine.zig:722 reportExceptionInHotReloadedModuleIfNeeded
src/jsc/JSPromise.zig:230     status
src/jsc/bindings/bindings.cpp  JSC__JSPromise__status
JSC::JSPromise::status(...) const
JSC::WriteBarrierBase<JSC::Unknown,...>::get   <- SIGSEGV, heap-looking fault address

Cause

VirtualMachine.pending_internal_promise: Option<*mut JSInternalPromise> is a raw GC-cell pointer that the --hot run loop polls every tick (report_exception_in_hot_reloaded_module_if_needed and reload). It was only protect()'d on the preload-return path; every other store site (loadAndEvaluateModule for the main module, JSC__JSInternalPromise__resolvedPromise on the patched-runMain path, Bun__VirtualMachine__setOverrideModuleRunMainPromise, and the per-preload store in load_preloads) left it unprotected.

The promise returned by JSModuleLoader::loadModule is not retained by the module loader once it settles: it is a fresh JSPromise::create whose only inbound GC edge is a microtask reaction that detaches when it fires (JSModuleLoader.cpp: resultPromise at the tail of both loadModule overloads; never written to m_moduleMap/m_loadedModules/ModuleGraphLoadingState). So once the entry module finishes evaluating, the only thing keeping the cell alive is conservative stack scanning of the caller's dead locals. When the optimizer or a later call chain clobbers that slot, the next Bun.gc/incremental collection frees the cell and the per-tick status() read dereferences a dead JSPromise.
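
The lifetime argument above can be modelled with a toy heap. This is a minimal sketch in safe Rust, not JSC or Bun code: `ToyHeap`, `CellId`, `Status`, and the `stack` set standing in for conservative stack scanning are hypothetical names invented for illustration. A freed cell shows up as `None` where the real runtime would dereference dead memory.

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for a promise cell's state; not Bun's real type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Pending,
    Fulfilled,
}

/// Index into the toy heap, playing the role of a raw cell pointer.
pub type CellId = usize;

/// Toy heap: a slot becomes `None` once the collector has freed it.
#[derive(Default)]
pub struct ToyHeap {
    cells: Vec<Option<Status>>,
    /// Explicit GC roots with a count, modelling protect()/unprotect().
    protected: HashMap<CellId, u32>,
}

impl ToyHeap {
    pub fn alloc(&mut self, status: Status) -> CellId {
        self.cells.push(Some(status));
        self.cells.len() - 1
    }

    pub fn protect(&mut self, id: CellId) {
        *self.protected.entry(id).or_insert(0) += 1;
    }

    /// Frees every cell that is neither protected nor present in `stack`.
    /// `stack` models whatever conservative stack scanning happens to see.
    pub fn collect(&mut self, stack: &HashSet<CellId>) {
        for (id, slot) in self.cells.iter_mut().enumerate() {
            if !stack.contains(&id) && !self.protected.contains_key(&id) {
                *slot = None;
            }
        }
    }

    /// Per-tick poll. `None` means the cell was freed; the real runtime
    /// would fault here instead of returning a value.
    pub fn status(&self, id: CellId) -> Option<Status> {
        self.cells.get(id).copied().flatten()
    }
}
```

In this model, a promise whose id is still in `stack` survives `collect`. Once that slot is clobbered and nothing has called `protect`, the next `collect` frees the cell and `status` returns `None`. A protected cell survives regardless of what the stack contains.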

Fix

Add VirtualMachine::set_pending_internal_promise which unprotects the previously stored promise (if any) and protects the new one, and route all store sites through it:

  • reload_entry_point (preload, patched-runMain, loadAndEvaluateModule, HTML-entry, no-transpile)
  • reload_entry_point_for_test_runner (preload, loadAndEvaluateModule)
  • load_preloads per-iteration store (keeps an additional scoped .protected() guard so the loop-entry local survives an HMR field swap inside tick())
  • set_override_module_run_main_promise
  • swap_global_for_test_isolation clear
  • reload()'s explicit unprotect block is removed since the next store now balances it

With every store routed through the helper the field is protected exactly when Some, so the separate pending_internal_promise_is_protected bool is dropped. The now-redundant ensure_still_alive() calls that followed each store are also removed.
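
The setter's contract can be sketched in isolation. The following is a simplified model under stated assumptions, not the code in VirtualMachine.rs: `ToyVm`, `ProtectTable`, and `PromiseId` are hypothetical stand-ins, and a counted map replaces JSC's protect()/unprotect().

```rust
use std::collections::HashMap;

/// Hypothetical handle standing in for `*mut JSInternalPromise`.
pub type PromiseId = u32;

/// Toy root table modelling counted protect()/unprotect().
#[derive(Default)]
pub struct ProtectTable {
    counts: HashMap<PromiseId, u32>,
}

impl ProtectTable {
    pub fn protect(&mut self, id: PromiseId) {
        *self.counts.entry(id).or_insert(0) += 1;
    }

    pub fn unprotect(&mut self, id: PromiseId) {
        let remaining = match self.counts.get_mut(&id) {
            Some(count) => {
                *count -= 1;
                *count
            }
            None => return,
        };
        if remaining == 0 {
            self.counts.remove(&id);
        }
    }

    /// Number of distinct cells currently rooted.
    pub fn protected_cells(&self) -> usize {
        self.counts.len()
    }

    pub fn is_protected(&self, id: PromiseId) -> bool {
        self.counts.contains_key(&id)
    }
}

/// Toy VM holding only the field under discussion.
#[derive(Default)]
pub struct ToyVm {
    pub roots: ProtectTable,
    pending_internal_promise: Option<PromiseId>,
}

impl ToyVm {
    /// The only write path to the field, so `Some` always implies "protected".
    pub fn set_pending_internal_promise(&mut self, next: Option<PromiseId>) {
        // Release the root held for the outgoing value, if any.
        if let Some(old) = self.pending_internal_promise.take() {
            self.roots.unprotect(old);
        }
        // Root the incoming value before it becomes visible in the field.
        if let Some(id) = next {
            self.roots.protect(id);
        }
        self.pending_internal_promise = next;
    }

    pub fn pending_internal_promise(&self) -> Option<PromiseId> {
        self.pending_internal_promise
    }
}
```

Because the field is private and written only through the setter, no separate boolean is needed to track whether the stored value is rooted. Storing a fresh promise on each simulated reload leaves exactly one rooted cell in this model, and storing `None` releases it.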

Verification

The race itself is not deterministically reproducible in-process: conservative stack scanning kept the cell alive in every GC-stress scenario tried (useZombieMode, scribbleFreeCells, collectContinuously, aggressive Bun.gc(true) across hot reloads), confirmed by instrumenting report_exception_in_hot_reloaded_module_if_needed to call cell->inherits<JSC::JSPromise>() after each collection. The test instead asserts the invariant the fix establishes: under --hot, heapStats().protectedObjectTypeCounts must show at least one protected Promise both during module evaluation and on the next tick.

# before (src/ at main)
"duringProtectedPromises": false,
"afterProtectedPromises": false,
"duringCounts": { "GlobalObject": 1, "UnlinkedProgramCodeBlock": 1 },

# after
(pass) roots the entry-point evaluation promise so per-tick status reads can't see a freed cell

The protect is balanced across reloads (stays at 1 over 10 reloads).

protect-count across reloads
LOAD 1 protectedPromises=1 protectedTotal=3
LOAD 2 protectedPromises=1 protectedTotal=4
LOAD 3 protectedPromises=1 protectedTotal=4
...
LOAD 10 protectedPromises=1 protectedTotal=4

Full hot.test.ts, watch.test.ts, bun-main-entry-point.test.ts, and preload.test.ts pass locally.

The promise returned by JSModuleLoader::loadModule is not retained by
JSC once it settles; the --hot run loop polls pending_internal_promise
every tick via report_exception_in_hot_reloaded_module_if_needed and
reload(), so the stored cell pointer must be a GC root. Only the
preload path protected it; the common loadAndEvaluateModule /
resolvedPromise / setOverrideModuleRunMainPromise paths stored the raw
pointer with is_protected = false, leaving a window where a collection
between ticks frees the cell and the next status() read dereferences a
dead JSPromise (BUN-38CT: SIGSEGV in JSPromise::status ->
WriteBarrierBase::get).

Route every store through a set_pending_internal_promise helper that
unprotects the outgoing value and protects the incoming one. The
protect count is balanced across reloads (stays at 1).

coderabbitai Bot commented Jun 21, 2026

Contributor

Walkthrough

The PR centralizes GC-root management for VirtualMachine's pending_internal_promise field by introducing a set_pending_internal_promise setter that atomically calls protect/unprotect on the stored JSC pointer. The pending_internal_promise_is_protected boolean flag is removed, and all direct field writes across reload_entry_point, reload, reload_entry_point_for_test_runner, swap_global_for_test_isolation, set_override_module_run_main_promise, and load_preloads are replaced with setter calls. A regression test verifies the promise remains GC-protected across hot-reload tick-loop swaps.

Changes

Centralize GC rooting for pending_internal_promise

| Layer / File(s) | Summary |
|---|---|
| GC-root contract and set_pending_internal_promise setter (src/jsc/JSPromise.rs, src/jsc/VirtualMachine.rs) | JSPromise::status_ptr docs are rewritten to specify caller GC-root responsibility. VirtualMachine drops pending_internal_promise_is_protected, updates the field doc to the "GC-protected while Some" invariant, and adds set_pending_internal_promise which unprotects the outgoing pointer and protects the incoming one atomically. |
| Migration of all call sites to setter (src/jsc/VirtualMachine.rs, src/runtime/hw_exports.rs, src/runtime/jsc_hooks.rs) | Every direct write to pending_internal_promise and every manual protect/unprotect/flag access in reload_entry_point, reload, reload_entry_point_for_test_runner, swap_global_for_test_isolation, set_override_module_run_main_promise, and load_preloads is replaced with set_pending_internal_promise. Stale comments about _protected drop behavior are removed; a new comment in load_preloads explains that HMR can swap the VM field during tick(), so a scoped independent rooting of the local promise is kept for the fallback read. |
| Hot-reload GC-protection regression test (test/cli/hot/hot.test.ts) | Adds a test that spawns bun --hot --no-clear-screen run against a generated pip-root.js, samples heapStats().protectedObjectTypeCounts both during evaluation and in a setImmediate, and asserts protected Promise counts are non-zero in both snapshots and that the subprocess exits with code 0. |

Suggested reviewers

  • Jarred-Sumner
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: introducing GC protection for the pending_internal_promise field on every store path in the hot reload system. |
| Description check | ✅ Passed | The description comprehensively covers what the PR does (fixes GC-protection of pending_internal_promise), root cause analysis, the implemented fix, and verification including test results and protected object counts. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

robobun commented Jun 21, 2026

Collaborator Author
Updated 11:58 AM PT - Jun 21st, 2026

@robobun, your commit 08fce46 has 3 failures in Build #63820 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 32572

That installs a local version of the PR into your bun-32572 executable, so you can run:

bun-32572 --bun

@github-actions

Contributor

Found 2 issues this PR may fix:

  1. Bun.build() inside a bun --hot entry permanently disables hot reload #30436 - Bun.build() inside a --hot entry stores into pending_internal_promise via an unprotected path; the GC can free the promise cell, breaking the reload pipeline permanently
  2. Hotreload Hono server, Crash. and restart hotreload, Crash #19143 - Segfault at address 0x10 (small offset from a freed cell) in --hot mode during script execution, consistent with the GC freeing the unprotected pending_internal_promise between ticks

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #30436
Fixes #19143

🤖 Generated with Claude Code

robobun commented Jun 21, 2026

Collaborator Author

On the suggested linked issues:

  • Hotreload Hono server, Crash. and restart hotreload, Crash #19143 crashes in JSC::BytecodeGenerator::BytecodeGenerator via ScriptExecutable::prepareForExecutionImpl during a function call, not in JSPromise::status. That's a different bug; not linking it.
  • Bun.build() inside a bun --hot entry permanently disables hot reload #30436's symptom (hot reload silently stops after await Bun.build() in the entry) is consistent with a freed entry-point promise reading back as Pending so reload() defers forever, which is one manifestation of this change. However, the exact repro from that issue no longer reproduces on current main (1.4.0) in my environment, so I can't confirm this closes it and I'm not adding Fixes #30436.

Comment thread src/runtime/jsc_hooks.rs
Comment thread test/cli/hot/hot.test.ts Outdated
Comment thread src/jsc/VirtualMachine.rs Outdated
Comment thread src/jsc/VirtualMachine.rs
…field

Every store goes through set_pending_internal_promise, so the invariant
is_protected == pending_internal_promise.is_some() always holds and the
bool mirrors recoverable information. The previous readers in reload()
and swap_global_for_test_isolation() were removed in 9ea8279, and
VirtualMachine is repr(Rust) so there is no layout contract.

coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/jsc/VirtualMachine.rs`:
- Around line 2275-2282: The set_pending_internal_promise method protects a
JSValue when storing a promise but this protection is never released during VM
teardown. In the destroy() method of VirtualMachine, add a call to
set_pending_internal_promise(None) at the very beginning while the JSC VM and
global objects are still alive. This will trigger the unprotect logic in
set_pending_internal_promise and ensure the protect reference count is balanced
on the terminal destruction path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 813c15d9-794f-4472-99d8-4f06056e06b4

📥 Commits

Reviewing files that changed from the base of the PR and between 602d956 and 9129879.

📒 Files selected for processing (5)
  • src/jsc/JSPromise.rs
  • src/jsc/VirtualMachine.rs
  • src/runtime/hw_exports.rs
  • src/runtime/jsc_hooks.rs
  • test/cli/hot/hot.test.ts

Comment thread src/jsc/VirtualMachine.rs

claude Bot left a comment

Contributor

All prior feedback addressed and no new issues found — deferring to a human since this reworks GC protect/unprotect bookkeeping for a raw cell pointer across eight unsafe store sites in the core runtime.

Extended reasoning...

Overview

Centralizes GC rooting of VirtualMachine.pending_internal_promise into a new set_pending_internal_promise setter that unprotects the previous cell and protects the new one, routes all eight store sites through it (reload_entry_point, reload_entry_point_for_test_runner, load_preloads, set_override_module_run_main_promise, swap_global_for_test_isolation, the patched-runMain and HTML/no-transpile paths), drops the now-redundant pending_internal_promise_is_protected bool field, tightens the status_ptr doc contract, and adds a --hot test that asserts at least one Promise is in protectedObjectTypeCounts during and after entry-point evaluation.

Review history

I left four findings across two prior rounds; all are addressed in 3a0ad44 and 9129879:

  • The scoped _protected guard in load_preloads is retained alongside the setter (with a comment explaining the HMR-swap rationale), so the post-loop status() deref stays rooted.
  • The five now-redundant ensure_still_alive() calls were removed.
  • The vacuous expect.any(...) assertion was rewritten so the combined-object toEqual actually fails (and prints the counts map) on regression.
  • The pending_internal_promise_is_protected field was deleted and the setter branches on the Option directly; grep confirms no remaining .rs references.

Security risks

No new attack surface — this is internal GC-root bookkeeping, no parsing/auth/crypto. The risk class is memory safety: an unbalanced unprotect() would reintroduce the very use-after-free this fixes, and an unbalanced protect() would leak one promise per reload. The setter's invariant (Some ⇔ protected) looks correct on every path I traced, and the protect-count-across-reloads check in the PR description shows it stays at 1 over 10 reloads.

Level of scrutiny

High. This is core --hot run-loop code in VirtualMachine.rs/jsc_hooks.rs with pervasive unsafe raw-pointer derefs of GC-managed cells, the underlying crash is non-deterministic (conservative stack scanning masks it), and the test asserts an indirect invariant rather than reproducing the SIGSEGV. The reentrant load_preloads-inside-reload() path in particular is subtle enough that a maintainer familiar with the hot-reload state machine should sign off.

Other factors

Removing a public struct field is a small layout change (though repr(Rust), so no ABI contract). CI build #63812 is in progress. hot.test.ts, watch.test.ts, bun-main-entry-point.test.ts, and preload.test.ts reportedly pass locally.

claude Bot left a comment

Contributor

All four points from my earlier reviews are addressed (scoped guard restored in load_preloads, redundant ensure_still_alive dropped, test assertion made non-vacuous, _is_protected field removed) and I found nothing new — deferring only because GC-root lifecycle changes across the reentrant --hot reload path are the kind of thing a human should sign off on.

Extended reasoning...

Overview

This PR centralizes GC-root management for VirtualMachine.pending_internal_promise by introducing set_pending_internal_promise(), which atomically unprotects the outgoing pointer and protects the incoming one. All store sites in reload_entry_point, reload_entry_point_for_test_runner, reload, swap_global_for_test_isolation, set_override_module_run_main_promise, and load_preloads are routed through it. The now-redundant pending_internal_promise_is_protected bool is dropped, the status_ptr doc is tightened to state the caller's GC-root obligation, and a regression test in hot.test.ts asserts via heapStats().protectedObjectTypeCounts that at least one Promise is protected both during entry-module evaluation and on the next tick.

Security risks

None identified. This is internal GC-root bookkeeping for the --hot run loop; it does not touch auth, crypto, permissions, parsing of untrusted input, or any externally-reachable surface. The risk profile is memory-safety (use-after-free of a JSC heap cell), which the change strictly tightens relative to main.

Level of scrutiny

High. The change is small and well-reasoned, but it sits at the intersection of JSC's protect/unprotect refcounting, conservative stack scanning, and the reentrant --hot reload path (where tick() can re-enter reload_entry_point and swap the field underneath an in-flight load_preloads loop). My first-pass review caught exactly that reentrancy edge (the scoped guard removal in load_preloads), which the author fixed; that the edge existed at all is why I'd want a human familiar with the hot-reload state machine to confirm the protect/unprotect balance on every path before this ships against a 191-event Sentry crash.
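
The reentrancy edge can be sketched with a drop-based guard. This is an illustrative model only: `Roots`, `ScopedProtect`, `set_pending`, and `preload_iteration` are hypothetical names, the promise ids 1 and 2 are arbitrary, and the real guard in load_preloads may differ in shape.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical handle standing in for a raw promise cell pointer.
pub type PromiseId = u32;

/// Toy counted root table; interior mutability lets guards share it.
#[derive(Default)]
pub struct Roots {
    counts: RefCell<HashMap<PromiseId, u32>>,
}

impl Roots {
    pub fn protect(&self, id: PromiseId) {
        *self.counts.borrow_mut().entry(id).or_insert(0) += 1;
    }

    pub fn unprotect(&self, id: PromiseId) {
        let mut counts = self.counts.borrow_mut();
        let remaining = match counts.get_mut(&id) {
            Some(count) => {
                *count -= 1;
                *count
            }
            None => return,
        };
        if remaining == 0 {
            counts.remove(&id);
        }
    }

    pub fn is_protected(&self, id: PromiseId) -> bool {
        self.counts.borrow().contains_key(&id)
    }
}

/// Roots `id` for the guard's lifetime and releases it on drop.
pub struct ScopedProtect<'a> {
    roots: &'a Roots,
    id: PromiseId,
}

impl<'a> ScopedProtect<'a> {
    pub fn new(roots: &'a Roots, id: PromiseId) -> Self {
        roots.protect(id);
        ScopedProtect { roots, id }
    }
}

impl Drop for ScopedProtect<'_> {
    fn drop(&mut self) {
        self.roots.unprotect(self.id);
    }
}

/// Field setter in the style described in the PR: unprotect the outgoing
/// value, protect the incoming one.
pub fn set_pending(roots: &Roots, field: &mut Option<PromiseId>, next: Option<PromiseId>) {
    if let Some(old) = field.take() {
        roots.unprotect(old);
    }
    if let Some(id) = next {
        roots.protect(id);
    }
    *field = next;
}

/// Models one preload iteration in which a re-entrant reload swaps the field.
/// Returns whether the loop-entry promise is still rooted after the swap.
pub fn preload_iteration(roots: &Roots, field: &mut Option<PromiseId>, with_guard: bool) -> bool {
    let local: PromiseId = 1;
    set_pending(roots, field, Some(local));
    let _guard = if with_guard {
        Some(ScopedProtect::new(roots, local))
    } else {
        None
    };
    // Simulated re-entry during tick(): the field now holds another promise.
    set_pending(roots, field, Some(2));
    roots.is_protected(local)
}
```

Without the guard, the swap releases the only root on the loop-entry promise, so a later read through the local would touch an unrooted cell. With the guard, the local stays rooted until the iteration ends, and the count returns to zero when the guard drops.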

Other factors

  • All four of my earlier inline comments are addressed in commits 3a0ad44 and 9129879, and the bug-hunting pass on the current revision found nothing.
  • CodeRabbit's note about clearing the slot in destroy() was resolved by the author without a code change; that seems reasonable (worker VMs don't run --hot so the field is never Some there, and the main VM exits the process rather than calling destroy()), but it's worth a second opinion.
  • The new test asserts the invariant rather than reproducing the race (which the PR description explains is not deterministically reproducible), and the author confirmed the protect count stays balanced at 1 across 10 reloads.
  • No CODEOWNERS entries cover the touched files.

robobun commented Jun 21, 2026

Collaborator Author

CI status: the diff itself is green. The new test in hot.test.ts passes on every lane (Windows x64-baseline hot.test.ts shows 12 pass / 1 fail, and the failing case is the pre-existing "should not remap against a stale sourcemap" rename race, which passed on retry).

The hard failures on both build 63812 and 63820 are puppeteer's chrome-headless-shell download failing on the darwin-aarch64-26 runner in next-pages/test/{dev-server,dev-server-ssr-100,next-build}.test.ts:

error: The browser folder (.../chrome-headless-shell/mac_arm-139.0.7258.66) exists but the executable (...) is missing

which is an infra/cache issue on that agent, not a result of this change. The remaining warnings are known flakes that passed on retry (spawn-pipe-leak RSS threshold on win-aarch64, bun install timeouts on x64-asan, bake/dev-and-prod HMR timing on win-x64). Ready for review.
