
sql(postgres,mysql): guard ref_and_close against re-entrant TLS close dispatch #32573

Open
robobun wants to merge 5 commits into main from farm/ab09136e/postgres-tls-close-reentry


@robobun

robobun commented Jun 21, 2026

Collaborator

Crash

Sentry BUN-3KKD / BUN-3GMS (29 events, macOS, bun 1.3.14): segfault at address 0x0 in us_internal_ssl_close with two nested refAndClose frames on the stack:

runtime.timer.Timer.All.drainTimers
event_loop.EventLoopTimer.fire
sql_jsc.postgres.PostgresSQLConnection.failWithJSValue  (PostgresSQLConnection.zig:327)
sql_jsc.postgres.PostgresSQLConnection.refAndClose      (PostgresSQLConnection.zig:962)
uws_sys.socket.AnySocket.close                          (socket.zig:570)
sql_jsc.postgres.PostgresSQLConnection.failWithJSValue  (PostgresSQLConnection.zig:327)
sql_jsc.postgres.PostgresSQLConnection.refAndClose      (PostgresSQLConnection.zig:962)
uws_sys.socket.AnySocket.close                          (socket.zig:570)
packages/bun-usockets/src/crypto/openssl.c:857 us_internal_ssl_close

Cause

ref_and_close calls socket.close() on a TLS socket. us_internal_ssl_close synchronously dispatches on_handshake (when the handshake never completed) and then on_close. Both re-enter fail_with_js_value. That function's re-entry guard is if status == Failed return, but when the outer entry was disconnect() the status is Disconnected, so the guard does not trip and a nested ref_and_close runs on the same us_socket_t.

The on_handshake dispatch happens before us_internal_socket_close_raw sets is_closed = 1, so the nested is_closed() check is not sufficient to prevent a second close().
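The ordering described above can be reproduced in a small standalone model. This is only a sketch under stated assumptions: `Conn`, `MockTlsSocket`, `Status`, and `unguarded_close_calls` are hypothetical names, and `socket_close` merely imitates the dispatch order attributed to us_internal_ssl_close (on_handshake first, then the is_closed flip, then on_close). It is not the driver's code.

```rust
// Hypothetical model of the unguarded close path; not Bun's real types.
#[derive(Clone, Copy, PartialEq)]
enum Status {
    Connected,
    Disconnected,
    Failed,
}

struct MockTlsSocket {
    is_closed: bool,
    handshake_pending: bool,
    close_calls: u32, // how many times close() reached the "C layer"
}

struct Conn {
    status: Status,
    socket: MockTlsSocket,
}

impl Conn {
    // Imitates a TLS close that dispatches callbacks synchronously:
    // on_handshake runs BEFORE is_closed flips, on_close runs after.
    fn socket_close(&mut self) {
        self.socket.close_calls += 1;
        if self.socket.handshake_pending {
            self.socket.handshake_pending = false;
            self.fail(); // on_handshake (failure) re-enters the connection
        }
        self.socket.is_closed = true;
        self.fail(); // on_close re-enters the connection
    }

    fn ref_and_close(&mut self) {
        // The only guard in the unfixed path: is_closed().
        if !self.socket.is_closed {
            self.socket_close();
        }
    }

    fn fail(&mut self) {
        // Re-entry guard keyed on Failed only; Disconnected slips through.
        if self.status == Status::Failed {
            return;
        }
        self.status = Status::Failed;
        self.ref_and_close();
    }

    fn disconnect(&mut self) {
        self.status = Status::Disconnected;
        self.ref_and_close();
    }
}

// Returns how many times close() was reached for one disconnect().
fn unguarded_close_calls(handshake_pending: bool) -> u32 {
    let mut conn = Conn {
        status: Status::Connected,
        socket: MockTlsSocket {
            is_closed: false,
            handshake_pending,
            close_calls: 0,
        },
    };
    conn.disconnect();
    conn.socket.close_calls
}
```

In this model `unguarded_close_calls(true)` returns 2 (the nested close happens while is_closed is still false), while `unguarded_close_calls(false)` returns 1, which mirrors the Linux behaviour where the inner is_closed() check is already true.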

Debug logging on Linux confirms that the nested ref_and_close is reached. There, the inner is_closed() happens to be true by the time it runs, so the second close is skipped:

refAndClose: is_closed=false status=Disconnected   <- outer, from disconnect()
failed: Connection closed: ConnectionClosed        <- on_close fired synchronously
refAndClose: is_closed=true status=Failed          <- inner, from fail_with_js_value

self.socket is also never detached after close, so once us_internal_free_closed_sockets frees the us_socket_t at the end of the loop iteration the stored handle dangles.

Fix

  • ConnectionFlags::CLOSE_INITIATED: set once before the first socket.close(), checked on every ref_and_close entry. Monotonic; never cleared. Combined with the existing is_closed() check this makes the close path idempotent regardless of which callback re-enters.
  • SocketHandler::on_close detaches self.socket so the stored handle never outlives the us_socket_t. The hasPendingActivity invariant for deferred TLS closes is preserved: self.socket stays Connected(ptr) until on_close actually fires, so a graceful-close connection still reports pending activity until the peer's close_notify arrives.
  • MySQLConnection::close had no is_closed() guard at all and the scopeguard::defer! in fail_with_js_value runs it on every exit path including the early return; it now detaches the stored handle first, closes through a local copy, and skips when CLOSE_INITIATED is already set.
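As an illustration of the first bullet, here is a minimal sketch of a monotonic close-initiated flag, assuming a plain `u8` bitset and a mock socket that dispatches its callbacks synchronously. `GuardedConn` and `guarded_close_calls` are hypothetical names; only the flag name and the `1 << 5` value come from this PR.

```rust
// Hypothetical model of the guarded close path; not Bun's real types.
const CLOSE_INITIATED: u8 = 1 << 5; // bit value as reported in the review summary

struct GuardedConn {
    failed: bool,
    flags: u8,
    is_closed: bool,
    handshake_pending: bool,
    close_calls: u32,     // times close() reached the mock socket
    on_close_events: u32, // times on_close was dispatched
}

impl GuardedConn {
    fn ref_and_close(&mut self) {
        // Checked on every entry, together with is_closed().
        if (self.flags & CLOSE_INITIATED) != 0 || self.is_closed {
            return;
        }
        // Set once, before the first close; never cleared afterwards.
        self.flags |= CLOSE_INITIATED;
        self.socket_close();
    }

    // Same dispatch order as the crash report: on_handshake, flip, on_close.
    fn socket_close(&mut self) {
        self.close_calls += 1;
        if self.handshake_pending {
            self.handshake_pending = false;
            self.fail(); // on_handshake re-enters
        }
        self.is_closed = true;
        self.on_close_events += 1;
        self.fail(); // on_close re-enters
    }

    fn fail(&mut self) {
        if self.failed {
            return;
        }
        self.failed = true;
        self.ref_and_close(); // now a no-op once the flag is set
    }

    fn disconnect(&mut self) {
        self.ref_and_close();
    }
}

// Returns (close calls, on_close events) for disconnect() followed by a
// late failure callback.
fn guarded_close_calls(handshake_pending: bool) -> (u32, u32) {
    let mut conn = GuardedConn {
        failed: false,
        flags: 0,
        is_closed: false,
        handshake_pending,
        close_calls: 0,
        on_close_events: 0,
    };
    conn.disconnect();
    conn.fail(); // a late timer or callback must not close again
    (conn.close_calls, conn.on_close_events)
}
```

With the flag in place, both `guarded_close_calls(true)` and `guarded_close_calls(false)` return `(1, 1)` in this model: whichever callback re-enters, the mock socket is closed once and on_close is observed once.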

Tests

test/js/sql/sql-postgres-tls-close-reentry.test.ts exercises both re-entry shapes (disconnect on a Connected TLS connection; connection timeout during a stalled TLS handshake) against a Docker-free mock Postgres-over-TLS server, 30/10 iterations under ASAN, and verifies onclose fires exactly once and the wrappers are collectable afterwards.

Gate note: the double-close only reaches us_internal_ssl_close a second time when the on_handshake dispatch runs with status == Disconnected and ssl_handshake_state == PENDING simultaneously. On Linux/epoll those never overlap (by the time the connection is Connected the handshake state is COMPLETED, and close_raw flips is_closed before dispatching on_close); all 29 Sentry events are macOS/kqueue. A 30-iteration probe of both paths under ASAN on the unfixed build did not fire. The regression test is coverage for the re-entry path rather than a deterministic fail-before.

Existing coverage (all green with the change):

  • test/js/sql/sql-close-pending-connection.test.ts
  • test/js/sql/postgres-tls-ctx-leak.test.ts
  • test/js/sql/sql-connect-error-reporting.test.ts
  • test/js/sql/sql-onconnect-onclose-throw.test.ts
  • test/js/sql/sql-mysql-tls-plaintext-injection.test.ts
  • test/js/sql/sql-mysql-clean-reentry.test.ts

… dispatch

Closing a TLS socket from ref_and_close synchronously dispatches the
on_handshake callback (when the handshake never completed) and then
on_close. Both re-enter fail_with_js_value. When the outer entry was
disconnect() the status is Disconnected, not Failed, so the
status == Failed guard in fail_with_js_value does not trip and a nested
ref_and_close runs on the same us_socket_t. The on_handshake dispatch
happens before the C layer flips is_closed, so is_closed() alone does
not prevent the second close.

Add a monotonic CLOSE_INITIATED flag that ref_and_close sets before the
first socket.close() and checks on every entry. Detach the stored socket
handle in the socket-level on_close so nothing can read through it after
the us_socket_t is freed at the end of the loop iteration. Apply the
same copy-and-detach pattern to MySQLConnection::close, which had no
is_closed() guard at all.
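The copy-and-detach pattern mentioned for MySQLConnection::close can be sketched as follows. This is an assumption-laden model, not the actual change: `SocketSlot`, `MockSocket`, `MySqlLikeConn`, and `close_calls_after_repeated_close` are hypothetical, and the re-entrant callback is simulated by calling `close()` again from inside the close.

```rust
use std::mem;

// Hypothetical stand-ins for the stored socket handle.
struct MockSocket {
    is_closed: bool,
}

enum SocketSlot {
    Connected(MockSocket),
    Detached,
}

struct MySqlLikeConn {
    socket: SocketSlot,
    close_initiated: bool,
    close_calls: u32,
}

impl MySqlLikeConn {
    fn close(&mut self) {
        // Detach first: any re-entrant caller now sees Detached and
        // cannot reach the same handle again.
        let local = mem::replace(&mut self.socket, SocketSlot::Detached);
        if let SocketSlot::Connected(mut sock) = local {
            if !sock.is_closed && !self.close_initiated {
                self.close_initiated = true;
                sock.is_closed = true; // close through the local copy
                self.close_calls += 1;
                // Simulate a synchronous on_close callback re-entering close().
                self.close();
            }
        }
    }

    fn is_detached(&self) -> bool {
        matches!(self.socket, SocketSlot::Detached)
    }
}

// Calls close() repeatedly and reports how often the socket was closed.
fn close_calls_after_repeated_close() -> u32 {
    let mut conn = MySqlLikeConn {
        socket: SocketSlot::Connected(MockSocket { is_closed: false }),
        close_initiated: false,
        close_calls: 0,
    };
    conn.close();
    conn.close(); // e.g. a deferred cleanup running on an early-return path
    assert!(conn.is_detached());
    conn.close_calls
}
```

Here `close_calls_after_repeated_close()` returns 1: the first call closes through the local copy, and both the simulated re-entrant call and the later explicit call find the slot already detached.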

Sentry BUN-3KKD / BUN-3GMS (29 events, macOS, 1.3.14): segfault at 0x0 in
us_internal_ssl_close with two nested refAndClose frames on the stack.
@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e85099c8-afc0-4822-852d-8c713f9067c9

📥 Commits

Reviewing files that changed from the base of the PR and between 35c0f4e and 0e19430.

📒 Files selected for processing (2)
  • src/sql_jsc/mysql/MySQLConnection.rs
  • test/js/sql/sql-postgres-tls-close-reentry.test.ts

Walkthrough

Adds a ConnectionFlags::CLOSE_INITIATED bitflag and uses it to guard socket.close() in both PostgresSQLConnection::ref_and_close and MySQLConnection::close. In the Postgres path, SocketHandler::on_close also replaces the stored socket with a detached TCP handle before invoking on_close. Two new regression tests cover TLS re-entrancy under connected and mid-handshake close scenarios.

Changes

SQL TLS close re-entrancy guard

| Layer / File(s) | Summary |
| --- | --- |
| CLOSE_INITIATED flag definition (src/sql/shared/ConnectionFlags.rs) | Adds CLOSE_INITIATED = 1 << 5 to the ConnectionFlags bitset with documentation on its role guarding re-entrant close paths. |
| Postgres socket detach and ref_and_close guard (src/sql_jsc/postgres/PostgresSQLConnection.rs) | SocketHandler::on_close replaces the stored socket with a detached SocketTcp before calling on_close to avoid accessing a freed us_socket_t. ref_and_close gates socket.close(Normal) behind CLOSE_INITIATED and is-closed checks, sets the flag, and refs the event loop before closing. |
| MySQL close() re-entrancy guard (src/sql_jsc/mysql/MySQLConnection.rs) | close() detaches self.socket into a local variable and only calls socket.close(Normal) when the detached socket is not already closed and CLOSE_INITIATED is not set. |
| Postgres TLS re-entrancy regression tests (test/js/sql/sql-postgres-tls-close-reentry.test.ts) | Two tests using a mocked TLS Postgres server verify that close() after a connected session triggers onclose exactly once and that a stalled handshake produces ERR_POSTGRES_CONNECTION_TIMEOUT. Both assert GC cleanup of PostgresSQLConnection objects. |
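For readers unfamiliar with the flag layout, a set-only bitflag can be sketched as below. The `Flags` newtype, its `u8` width, and `try_begin_close` are assumptions made for illustration; only the name CLOSE_INITIATED and the `1 << 5` value are taken from the summary above.

```rust
// Hypothetical set-only bitset; the real ConnectionFlags may differ.
#[derive(Clone, Copy, Default, PartialEq, Eq, Debug)]
struct Flags(u8);

impl Flags {
    const CLOSE_INITIATED: Flags = Flags(1 << 5);

    fn contains(self, other: Flags) -> bool {
        (self.0 & other.0) == other.0
    }

    // There is deliberately no `remove`: bits can only be set, which is
    // what makes the flag monotonic.
    fn insert(&mut self, other: Flags) {
        self.0 |= other.0;
    }

    // Returns true only for the first caller; later callers get false
    // and should skip socket.close().
    fn try_begin_close(&mut self) -> bool {
        if self.contains(Flags::CLOSE_INITIATED) {
            return false;
        }
        self.insert(Flags::CLOSE_INITIATED);
        true
    }
}

// Convenience wrapper: attempts to begin a close twice on fresh flags.
fn begin_close_twice() -> (bool, bool, u8) {
    let mut flags = Flags::default();
    let first = flags.try_begin_close();
    let second = flags.try_begin_close();
    (first, second, flags.0)
}
```

In this sketch `begin_close_twice()` returns `(true, false, 32)`: the first attempt wins, the second is rejected, and bit 5 stays set.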

Possibly related PRs

  • oven-sh/bun#32027: Modifies PostgresSQLConnection handshake and connection-close callback handling; this PR's CLOSE_INITIATED flag and socket replacement extend the same shutdown path to handle TLS re-entrancy.

Suggested reviewers

  • alii
  • cirospaciari
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately reflects the main change: adding re-entrancy guards to the close path in PostgreSQL and MySQL SQL connections using the CLOSE_INITIATED flag. |
| Description check | ✅ Passed | The description comprehensively covers what the PR does (the crash, cause, and fix) and how it was verified (regression tests and existing test coverage), following the template sections. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@robobun

robobun commented Jun 21, 2026

Collaborator Author
Updated 11:55 AM PT - Jun 21st, 2026

@robobun, your commit c321423 has 1 failure in Build #63811 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 32573

That installs a local version of the PR into your bun-32573 executable, so you can run:

bun-32573 --bun

@github-actions

Contributor

Found 1 issue this PR may fix:

  1. Bun crashes with segmentation fault when network closes unexpectedly (Knex + PostgreSQL via Neon) #20104 - Segfault on macOS when PostgreSQL network closes unexpectedly; stack trace shows the exact re-entrant ssl_wrapped_context_on_close → us_socket_close double-close path that this PR guards with CLOSE_INITIATED

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #20104

🤖 Generated with Claude Code

@robobun

robobun commented Jun 21, 2026

Collaborator Author

#20104 is a similar-looking TLS double-close on macOS but goes through the pg npm driver → node:tls → deps.uws.NewSocketHandler (Bun.connect), not bun:sql's native PostgresSQLConnection. This PR only touches src/sql_jsc/, so it does not address that path; leaving #20104 open.

Comment thread test/js/sql/sql-postgres-tls-close-reentry.test.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/sql/sql-postgres-tls-close-reentry.test.ts`:
- Around line 48-50: The listen() function currently only handles the successful
case where the server starts listening, but does not handle errors emitted
during server startup. If server.listen() emits an error event, the Promise
never settles, causing the test to hang indefinitely until timeout. Wire the
Promise to reject when the server emits an error event, or refactor to use await
once(server, "listening") which automatically handles both success and failure
cases for server startup, ensuring errors are properly propagated instead of
hanging.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: cb23b864-0633-45b7-8b20-b62c94d48eac

📥 Commits

Reviewing files that changed from the base of the PR and between 602d956 and 35c0f4e.

📒 Files selected for processing (4)
  • src/sql/shared/ConnectionFlags.rs
  • src/sql_jsc/mysql/MySQLConnection.rs
  • src/sql_jsc/postgres/PostgresSQLConnection.rs
  • test/js/sql/sql-postgres-tls-close-reentry.test.ts

Comment thread test/js/sql/sql-postgres-tls-close-reentry.test.ts

@claude claude Bot left a comment

Contributor


I didn't find any issues, but this touches native TLS socket-lifecycle and re-entrancy invariants across two drivers, so it's worth a human look.

Extended reasoning...

Overview

This PR adds a ConnectionFlags::CLOSE_INITIATED bitflag and uses it to make the socket-close path idempotent in both PostgresSQLConnection::ref_and_close and MySQLConnection::close, plus detaches the stored us_socket_t handle in SocketHandler::on_close so it can't dangle after the C layer frees it. A new mock-server regression test exercises both re-entry shapes (disconnect-after-Connected and timeout-during-pending-handshake).

Security risks

None introduced. The change is purely defensive — it narrows the set of states in which socket.close() can be called and clears a stored raw pointer earlier. No new inputs, parsing, auth, or trust boundaries.

Level of scrutiny

High. This is native Rust touching raw us_socket_t* lifetime, synchronous TLS callback re-entrancy, and the hasPendingActivity GC-keepalive invariant across two database drivers. The fix is small and the reasoning in the description is thorough, but the correctness depends on subtle ordering between us_internal_ssl_close's on_handshake/on_close dispatch and the is_closed flip in the C layer — and the author notes the regression test does not deterministically reproduce the crash on Linux CI (all 29 Sentry events were macOS/kqueue). A reviewer familiar with the uSockets close path should confirm the detach-in-on_close doesn't break any other reader of self.socket between close-initiation and on_close firing.

Other factors

  • My earlier nit (per-test timeouts) and CodeRabbit's listen() error-handling suggestion were both addressed; all inline comments are resolved.
  • CI on the latest commit shows musl build failures and an unrelated-looking S3 test SIGABRT — likely infra/flake, but worth confirming before merge.
  • The MySQL close() change also fixes a pre-existing missing is_closed() guard, which is a behavior change beyond the immediate crash fix.

@robobun

robobun commented Jun 21, 2026

Collaborator Author

CI status: the diff is green. Both CI runs failed on unrelated tests my change does not touch:

  • build 63807: test/js/bun/s3/s3-stream-cancel-leak.test.ts SIGABRT in ReadableStream::from_blob_copy_ref on debian 13 x64-asan
  • build 63811: test/js/bun/terminal/terminal.test.ts pty-attach timeout on darwin 14 x64

Neither failure mentions sql_jsc, PostgresSQLConnection, MySQLConnection, or the new sql-postgres-tls-close-reentry.test.ts (which passed on every lane in both runs). Two different unrelated tests on two different lanes across two runs indicates flake. Ready for review.
