Keep local MessageChannel ports alive while a message listener is attached#32564
Keep local MessageChannel ports alive while a message listener is attached#32564robobun wants to merge 10 commits into
Conversation
…ached
A MessagePort created as one end of a local MessageChannel never goes
through entangle(), so it never wired onDidChangeListener and never ref'd
the event loop when a message listener was added. port.on('message') and
addEventListener('message') therefore delivered the queued message and let
the process exit, while Node keeps it alive. The .onmessage setter worked
only because it took a separate ref.
Wire onDidChangeListener for every port in the constructor and fold the
listener-count ref and the explicit ref()/unref() into one m_hasRef flag
plus a self-ref, mirroring Node's handle ref: the first message listener
refs, the last one unrefs, ref()/unref() override, and removing the last
listener releases regardless of ref(). The onmessage/onmessageerror setters
no longer ref directly; attaching the listener does it, so a messageerror
listener alone no longer holds the loop open (matching Node).
|
Updated 2:08 PM PT - Jun 21st, 2026
❌ @robobun, your commit ab6dfe5 has 2 failures in
🧪 To try this PR locally: bunx bun-pr 32564That installs a local version of the PR into your bun-32564 --bun |
|
Found 1 issue this PR may fix:
🤖 Generated with Claude Code |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough
ChangesMessagePort event-loop liveness and peer-close refactor
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/js/web/workers/message-channel.test.ts`:
- Around line 351-365: The expectStaysAlive function leaves stderr unconsumed,
which can cause the OS pipe buffer to fill and deadlock the child process if
debug or ASAN builds emit warnings. Modify the function to drain stderr
concurrently alongside the existing timeout and process exit logic using
Promise.all, similar to how expectExitsOnItsOwn does it. Specifically, add
proc.stderr.text() to the Promise.all array that already contains
proc.exited.then() and Bun.sleep() so that stderr is continuously drained while
waiting for the process outcome.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4f9eb02b-c97a-4559-907b-56affb5eff98
📒 Files selected for processing (4)
src/jsc/bindings/webcore/JSMessagePort.cppsrc/jsc/bindings/webcore/MessagePort.cppsrc/jsc/bindings/webcore/MessagePort.htest/js/web/workers/message-channel.test.ts
jsRef()/jsUnref() reach the event loop through scriptExecutionContext() now, so their JSGlobalObject* parameter is dead. Drop it and update the ref/unref prototype bindings. The jsUnref() call that preceded close() in the close binding is also redundant since close() releases the event-loop ref itself; remove it. In the keep-alive test, drain the child's stderr in the background so a debug/ASAN warning can't fill the OS pipe buffer and deadlock a process the test expects to stay alive.
|
Thanks, but I'm keeping this PR scoped to #32562 and not tagging They share the keep-alive root cause, but #32563 needs something this PR does not implement: the MessagePort With only this PR, #32563's repro actually gets worse: the Node prints The |
…e loop Keeping a port alive while it has a message listener (the previous commit) is only half of Node's behavior: when one end of a channel is closed, Node closes both ends, firing 'close' on the peer and letting it stop holding the event loop. Bun never notified the peer, so a port with a message listener whose peer was explicitly closed would pin the loop forever. This hung test-worker-message-port-receive-message and test-worker-message-port-transfer-duplicate, and left the issue #32563 repro (port2 listening, port1.close()) printing only the message. MessagePort::close() now calls MessagePortPipe::notifyPeerClosed(), which sets a PeerClosed bit on the peer side and wakes its context through the existing drain pipeline. drainAndDispatch() consumes that bit once its inbox is fully drained (so 'close' is ordered after queued messages) and calls MessagePort::dispatchPeerClosed(), which fires the one-shot 'close' event and closes that end, releasing the listener's event-loop ref. The notification is deliberately driven from MessagePort::close() (explicit close and context teardown) and not from the GC-driven ~MessagePort path, which reaches MessagePortPipe::close() directly: a still-referenced peer must not be disturbed when an unreferenced sibling is collected, otherwise a listening port would exit as soon as its idle peer was swept. Fixes #32563.
|
Update on #32563: I ended up fixing it here after all. CI showed the two issues are inseparable. Keeping a port alive on a message listener (the #32562 fix) made two existing node-compat tests hang ( This PR now implements it: |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/js/web/workers/message-channel.test.ts`:
- Around line 442-447: The test "onmessageerror alone does not keep the process
alive" is using the global MessageChannel instead of importing from the
node:worker_threads module, which weakens coverage for the node-worker_threads
path. Fix this by modifying the code string passed to expectExitsOnItsOwn to
import MessageChannel from 'node:worker_threads' at the beginning of the string,
then use that imported MessageChannel instead of relying on the global
MessageChannel when instantiating port1 and port2.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 786d6d82-87f9-4c81-b4f9-297d4f26fef5
📒 Files selected for processing (5)
src/jsc/bindings/webcore/MessagePort.cppsrc/jsc/bindings/webcore/MessagePort.hsrc/jsc/bindings/webcore/MessagePortPipe.cppsrc/jsc/bindings/webcore/MessagePortPipe.htest/js/web/workers/message-channel.test.ts
Matches the sibling unref()/off() exit cases so the whole node-worker_threads exit group goes through the same import path.
Windows x64-baseline failed only on bun-install.test.ts network flake (GitHub API 504s, git clone timeouts); unrelated to this MessagePort change.
The peer-close notification added earlier only covered an explicit close of
the direct peer. Two paths to a dead peer still pinned a listening port's
event-loop ref forever (a hang that exited pre-PR and in Node):
- close-then-listen: port1.close(); port2.on('message', cb). At close time
port2 had not started, so notifyPeerClosed bailed and never recorded the
close; the later listener then ref'd the loop with no release.
- transferred-then-dropped: a port transferred inside a message that is
dropped (its holder closed) never told its listening sibling.
Rework the notification to be complete:
- MessagePortPipe::close() takes a notifyPeers flag and, when set, wakes the
peer of every side it tears down (the closed side plus in-transit ports
harvested from the dropped inbox). MessagePort::close() passes true; the
GC-driven ~MessagePort and dropped-TransferredMessagePort paths pass false
so collecting an idle port never disturbs a still-referenced peer.
- notifyPeerClosed() records the PeerClosed bit even when the peer has not
started listening yet; attach() honors it and schedules the drain, so a
listener added after the peer closed still observes 'close' and releases.
Known gap: the 'close' event still fires only on the peer, not on the port
that called close() (Node fires it, asynchronously, on both ends). That has
no effect on process lifetime and is left for a follow-up.
|
CI status: the diff is green; the red is CI-infrastructure flake in test-runner meta-tests, unrelated to this change. This PR touches only The red on build 63829 (HEAD ab6dfe5) is I've used my single CI re-roll already, so I'm not pushing another retrigger. The fix is verified locally: #32562 stays alive, #32563 fires |
The notifyPeers flag is threaded through the whole close() worklist, so the GC-driven ~MessagePort path (notifyPeers=false) also suppressed the peer notification for in-transit transferred ports harvested from the dropped inbox. A port transferred into a message whose holder is then collected (not explicitly closed) never told its listening sibling, so the sibling pinned the event loop forever. notifyPeers must gate only the top-level side's direct peer (collecting an idle port must not disturb a still-referenced peer). A harvested in-transit port is undeliverable once its carrier is dropped, regardless of how the carrier died, so its peer is always notified. Track the notify decision per worklist entry: the seed uses notifyPeers, harvested entries always use true.
hasPendingActivity() gates the JS wrapper's GC reachability, but it didn't know about the new PeerClosed bit. In the window between the peer closing and the queued drain firing the one-shot 'close' event, hasPendingActivity() returned false (inbox empty, peer closed), so a GC could sweep the wrapper and the 'close' listener's weak refs, silently dropping the event (and tripping a debug ASSERT in ensureJSFunction). Include PeerClosed in hasPendingActivity() alongside the queued-message check. The bit is cleared in the same store that dispatches 'close', so the wrapper becomes collectable again immediately after, mirroring how a queued message keeps the wrapper alive only until it is delivered.
A port transferred to an already-closed destination is dropped directly by ~TransferredMessagePort (outside the close() worklist), and its listening sibling is not woken. Actively waking it (notifyPeers=true here) fixes the resulting event-loop-ref hang, but schedules a 'close' drain that shifts GC finalization timing for the dropped endpoint enough to break the existing "port dropped in transit does not pin its peer" leak test (heapStats confirms the wrappers are still collected, so it is a timing artifact, not a real leak). Keep notifyPeers=false and record the niche case as a known limitation rather than reshape the finalization path here.
- ~TransferredMessagePort: inline the rationale for notifyPeers=false instead of pointing at "the PR" (comments must stand on their own, not reference transcript/PR context). - close() header: the notifyPeers doc drifted after the worklist change. It governs only the direct peer of `side`; harvested in-transit ports always notify their peers regardless of the flag. Match the .cpp comment.
Fixes #32562
Fixes #32563
Repro
Node prints
received { foo: 'bar' }and keeps the process alive. Bun delivered the message and then exited immediately..onmessage = fnworked; only.on('message')/addEventListener('message')exited early, which made the asymmetry the tell.Cause
Two halves of Node's MessagePort event-loop behavior were missing for local
MessageChannelports:Listener never ref'd the loop. A
MessagePortthat is one end of a localMessageChannelis built directly bygenerateMessagePorts()and never goes throughMessagePort::entangle(), whereonDidChangeListener(the hook that refs/unrefs the event loop on message-listener add/remove) was assigned. SoaddEventListener('message')set the listener but nothing pinned the loop..onmessageworked only because its setter took a separatejsRef().Peer close never released the loop. Node closes both ends of a channel when either side closes, firing
'close'on the peer so it stops holding the loop. Bun never notified the peer. Once the listener ref above existed, a port whose peer was explicitly closed would pin the loop forever (hangingtest-worker-message-port-receive-messageand-transfer-duplicate), and issue Bun exits even ifMessageChannelfromnode:worker_threadsis still working and has unfinished tasks #32563's repro printed the message but never'close'.Fix
onDidChangeListenerfor every port in theMessagePortconstructor, not just transferred ports. Fold the listener-count ref and explicitref()/unref()into onem_hasRefflag plus a self-ref, mirroring Node's handle ref: the firstmessagelistener refs, the last one unrefs,ref()/unref()override, and removing the last listener releases regardless ofref(). The.onmessage/.onmessageerrorsetters no longer ref directly (amessageerrorlistener alone no longer holds the loop open, matching Node).MessagePort::close()now notifies the peer viaMessagePortPipe::notifyPeerClosed(), which sets aPeerClosedbit and wakes the peer's context through the existing drain pipeline.drainAndDispatch()consumes the bit once the inbox is drained (so'close'lands after queued messages) and callsMessagePort::dispatchPeerClosed(), which fires the one-shot'close'event and closes that end. The notification runs only on explicitclose()/ context teardown, never on the GC-driven~MessagePortpath, so a still-referenced peer is not disturbed when an unreferenced sibling is collected.Verification
Matches Node v26.3.0 across the matrix:
.on('message')addEventListener('message').onmessage = fn.on('message')+.unref().on('message')then.off(...).once('message').ref()with no listener.unref()then.on('message').onmessageerroraloneport1.close()closefires, exitsclose)closefires, exitsTests added to
test/js/web/workers/message-channel.test.tscover the listener-attachment paths,ref()/unref()and listener-removal parity, and the peer-close'close'delivery. They fail on the current build and pass with this change. Existing MessagePort lifetime/leak/transfer suites (message-port-pipe,message-port-closed-leak,message-port-context-destroy-leak) and the nodetest-worker-message-port-*parallel tests pass, confirming the added self-ref and peer-close notification don't leak or disturb transfers.Known limitation
The
closeevent is delivered to the peer of a closed port but not to the port that calledclose()itself. Node fires it (asynchronously) on both ends. This has no effect on process lifetime (the closing port releases its event-loop ref viaclose()regardless) and #32563 is the peer-listens case, which is covered. Dispatching the closing port's owncloseevent needs a deferred dispatch before teardown and is left for a follow-up.Also pre-existing (not a regression, and niche): calling
port.ref()on a port that never had a message listener and then closing the peer keeps the loop alive, because a bareref()takes the event-loop ref without starting the port, so the peer-close notification is never consumed. Matching Node here (itsref()is a handle op that does not start message dispatch) without consuming queued messages is a separate change, tracked as a follow-up.Also niche and deferred: transferring a port to an already-closed destination drops the transferred port directly (via
~TransferredMessagePort, outside theclose()worklist), and its listening sibling is not woken, so that sibling keeps the loop alive. Waking it fixes the hang but actively closes the dropped endpoint in a way that shifts GC finalization timing and breaks an existing leak test (heapStats confirms no real leak), so it is left as a follow-up. The common transferred-drop path (a port dropped from a live port inbox on close) is handled.