.pr_agent_accepted_suggestions

PR 7794 (2026-05-17)

[reliability] Brittle path stripping

Brittle path stripping

`seen` is derived by stripping `${srcRoot}${sep}` via `startsWith()` and then replacing only the current platform separator, so if Mocha prints paths with different separators or drive-letter casing on Windows, the prefix will not be removed and the test will fail with absolute paths in `seen`. This undermines the PR’s goal of making the assertions stable across Linux/Windows runners.

Issue description

The test normalizes Mocha’s --list-files output by doing a string startsWith(prefix) check and then replacing only path.sep. This can fail on Windows if the output uses different separators (e.g., /) or if drive-letter casing differs, leaving absolute paths that break the toContain() assertions.

Issue Context

The logic currently does:

build prefix = ${srcRoot}${sep}`
strip prefix with startsWith(prefix)
convert separators with split(sep).join('/')

Fix Focus Areas

src/tests/backend-new/specs/backend-tests-glob.test.ts[51-56]

Suggested implementation direction

Normalize each line with path.normalize(l).
Convert to repo-relative using path.relative(srcRoot, normalized) when the path is absolute.
Finally convert to POSIX separators (e.g., rel.split(path.sep).join('/')). This avoids depending on exact string prefix/casing/separator matches.

PR 7793 (2026-05-17)

[maintainability] `listAuthorsOfPad` filtering undocumented

`listAuthorsOfPad` filtering undocumented

`listAuthorsOfPad` now filters out the synthetic `Pad.SYSTEM_AUTHOR_ID` from the returned `authorIDs`, changing public HTTP API behavior. The HTTP API documentation under `doc/api/http_api.*` is not updated to reflect this, risking client confusion and incorrect integrations.

Issue description

listAuthorsOfPad behavior changed to exclude the synthetic Pad.SYSTEM_AUTHOR_ID (a.etherpad-system), but the public HTTP API documentation still states it returns authors who contributed to the pad without noting this exclusion.

Issue Context

This PR intentionally hides the system author from the public API surface by filtering it out in src/node/db/API.ts. Documentation in doc/ should be updated in the same PR to match the new behavior.

Fix Focus Areas

doc/api/http_api.md[696-704]
doc/api/http_api.adoc[652-661]

PR 7792 (2026-05-17)

[reliability] Non-`apierror` export errors unhandled

Non-`apierror` export errors unhandled

The export route only converts `apierror` exceptions into a deterministic plain-text response; all other export errors are passed to `next()` and can still fall back to Express's default HTML error page. This can reintroduce non-deterministic HTML error bodies for other export failures.

Issue description

The export route only special-cases err.name === 'apierror' and calls next(err) for everything else, which can still trigger Express's default HTML error renderer.

Issue Context

Compliance requires explicit error handling for export-route failures so clients/tests get deterministic, non-HTML error bodies (at minimum containing err.message) rather than Express's default HTML page.

Fix Focus Areas

src/node/hooks/express/importexport.ts[74-82]

[correctness] Error downloads as attachment

Error downloads as attachment

If `checkValidRev()` throws, `ExportHandler.doExport()` has already set `Content-Disposition: attachment`, so the new plain-text 500 body can still be delivered as a downloadable file rather than a visible error. This makes the surfaced error message harder to notice and can confuse callers about whether an export succeeded.

Issue description

ExportHandler.doExport() calls res.attachment(...) before validating req.params.rev. If checkValidRev() throws, the new apierror catch returns a plain-text 500 but keeps Content-Disposition: attachment, so clients (especially browsers) may download the error body as an export file.

Issue Context

res.attachment() is invoked before checkValidRev().
The route-level catch sends the error message but does not clear Content-Disposition.

Fix Focus Areas

src/node/hooks/express/importexport.ts[74-82]
src/node/handler/ExportHandler.ts[58-65]

Implementation notes

Prefer one of:

Move rev validation (checkValidRev) before res.attachment(...) in ExportHandler.doExport() so invalid :rev never sets download headers.
Or, in the apierror catch branch, call res.removeHeader('Content-Disposition') (and optionally clear any previously-set export Content-Type) before sending the plain-text 500.

PR 7777 (2026-05-16)

[correctness] History toolbar i18n regression

History toolbar i18n regression

#history-controls now uses aria-labelledby, which overrides the aria-label that pad_mode.ts sets from the already-translated pad.historyMode.controlsLabel key. Because the new pad.editor.toolbar.history key currently exists only in en.json, non-English locales will likely announce the English fallback instead of the existing localized label.

Issue description

#history-controls previously got a localized accessible name via pad_mode.ts setting aria-label from pad.historyMode.controlsLabel. This PR adds aria-labelledby="editbar-history-label", which takes precedence over aria-label and points to a new data-l10n-id="pad.editor.toolbar.history" span. In the current repo state, pad.editor.toolbar.history exists only in en.json, while pad.historyMode.controlsLabel is already translated in many locales. This means non-English locales will regress from localized naming to an English fallback.

Issue Context

aria-labelledby overrides aria-label in accessible name computation.
The repo already contains translations for pad.historyMode.controlsLabel in multiple locale files.
The new history-toolbar key is currently only present in English.

Fix Focus Areas

src/templates/pad.html[87-94]
src/templates/pad.html[116-117]
src/static/js/pad_mode.ts[97-120]
src/locales/en.json[365-368]

Recommended fix

Prefer reusing the already-translated history controls label key:

Change #editbar-history-label to use data-l10n-id="pad.historyMode.controlsLabel" (and update its fallback text accordingly, e.g. "Pad history controls").
Keep aria-labelledby="editbar-history-label" on #history-controls.
Update the Playwright expectation for the history toolbar label to match the new (reused) string.

Alternative fix

If you need the specific new phrasing, add pad.editor.toolbar.history to all locale JSON files (or ensure your translation import pipeline will populate them at the same time as this change).

PR 7773 (2026-05-16)

[correctness] Wrong author in roSocket

Wrong author in roSocket

The USER_CHANGES test helper always builds the apool using `authorId` from the first socket’s CLIENT_VARS, but `roSocket` can have a different author identity because `common.connect()` only forwards cookies from the *current* HTTP response and `/p/:pad` does not re-issue the author-token cookie if it already exists. With the new server validation requiring an `author` attribute on '+' ops, `sendUserChanges(roSocket, ...)` can now be rejected as “changes as another author”, breaking the `permitOnce` test.

Issue description

src/tests/backend/specs/messages.ts defaults the USER_CHANGES apool to the main socket’s authorId for all sockets. But roSocket can end up with a different authorId (no token cookie forwarded on its socket.io connection), so sendUserChanges(roSocket, '...*0+...') may be rejected by the new server-side validation.

Issue Context

authorId is captured only from the first socket’s CLIENT_VARS; the roSocket CLIENT_VARS is ignored.
common.connect(res) forwards only cookies present in the provided HTTP response’s set-cookie headers.
/p/:pad uses ensureAuthorTokenCookie(), which does not set a Set-Cookie header if the request already has a valid token cookie—so the second GET (to the read-only URL) often won’t provide the token cookie to common.connect().

Fix Focus Areas

src/tests/backend/specs/messages.ts[26-46]
src/tests/backend/specs/messages.ts[171-179]
src/tests/backend/specs/messages.ts[255-270]

Suggested fix

Capture roAuthorId from common.handshake(roSocket, roPadId) (read CLIENT_VARS).
Change authorPool() to accept an explicit authorId (or create two pools: one for socket, one for roSocket).
Ensure sendUserChanges(roSocket, ...) uses roAuthorId’s pool (or explicitly pass an apool argument in the roSocket test).

PR 7771 (2026-05-16)

[reliability] Restart triggered on failure

Restart triggered on failure

installer.install() now calls its wrapped callback with an error, which decrements the global task counter and triggers onAllTasksFinished() even for failures, causing hooks.aCallAll('restartServer') to run. This can restart Etherpad (disconnecting users) on any failed install attempt, including the intentional EngineIncompatibleError short-circuit where nothing was installed.

Issue description

wrapTaskCb() triggers onAllTasksFinished() whenever the task counter reaches zero, regardless of whether the task completed successfully. With this PR, install() now invokes cb(err) on failures, so failed installs (including preflight incompatibility) can cause a full server restart even though nothing changed.

Issue Context

wrapTaskCb() decrements tasks after invoking the callback and calls onAllTasksFinished() when tasks === 0.
onAllTasksFinished() calls hooks.aCallAll('restartServer'), which closes and recreates the HTTP(S) server.
After this PR, install() calls cb(err) in the catch block, so failures now complete the task and can trigger the restart path.

Fix Focus Areas

src/static/js/pluginfw/installer.ts[34-58]
src/static/js/pluginfw/installer.ts[182-199]

Suggested approach

Track whether a task actually made changes (e.g., needsRestart / didMutate flag set only after successful install/uninstall steps), and only call onAllTasksFinished() when tasks === 0 && needsRestart.
For install(), set needsRestart = true only after linkInstaller.installPlugin() and hooks.aCallAll('pluginInstall', ...) succeed.
For the engine-incompatibility short-circuit (and other errors), ensure the task counter still decrements, but do not restart.
Alternatively, change the wrapper to pass through error state and skip onAllTasksFinished() if the first callback argument is truthy, while still decrementing tasks in a finally-like manner.

[reliability] Registry fetch can hang

Registry fetch can hang

fetchPluginEnginesNode() awaits a registry.npmjs.org fetch with no timeout/abort, and install() awaits it before proceeding. A stalled DNS/network connection can therefore hang the entire install path and prevent the finished:install socket event from ever being emitted.

Issue description

fetchPluginEnginesNode() performs a network fetch without any timeout or abort mechanism. If the request stalls (DNS, proxy, captive portal, registry hang), install() will await forever and the admin UI will never receive finished:install.

Issue Context

The comment says the lookup is best-effort and should fall through on failure, but an unbounded await is not a failure and can block indefinitely.

Fix Focus Areas

src/static/js/pluginfw/installer.ts[164-190]

Suggested approach

Use an AbortController with a short timeout (e.g., 3–10 seconds):
Create controller, setTimeout(() => controller.abort(), timeoutMs)
Pass {headers, signal: controller.signal} to fetch()
In finally, clear the timeout
On abort (or any error), return undefined so install continues down the existing path.
Optionally log at debug level when the preflight times out to aid diagnosis without spamming logs.

PR 7762 (2026-05-15)

[maintainability] New Prometheus metrics lack flag

New Prometheus metrics lack flag

The PR introduces new Prometheus metrics (`etherpad_pad_users`, `etherpad_changeset_apply_duration_seconds`, `etherpad_socket_emits_total`) that are registered and emitted unconditionally. This violates the requirement that new features be gated behind a feature flag and disabled by default.

Issue description

New Prometheus metrics are always enabled, but compliance requires new features to be behind a feature flag and disabled by default.

Issue Context

This PR registers three new metrics and emits them on the hot path and in the Prometheus monitor loop without any enable/disable mechanism.

Fix Focus Areas

src/node/prometheus.ts[26-48]
src/node/handler/PadMessageHandler.ts[50-50]
src/node/handler/PadMessageHandler.ts[609-609]
src/node/handler/PadMessageHandler.ts[630-630]
src/node/handler/PadMessageHandler.ts[671-671]
src/node/handler/PadMessageHandler.ts[791-791]
src/node/handler/PadMessageHandler.ts[904-904]
src/node/handler/PadMessageHandler.ts[957-957]

[correctness] Histogram includes fan-out

Histogram includes fan-out

handleUserChanges() starts etherpad_changeset_apply_duration_seconds before any processing and only stops it in finally, after updatePadClients() completes, so the histogram includes broadcast/fan-out time rather than isolating the apply path. This prevents the scaling-dive analysis from distinguishing “apply is slow” vs “fan-out is slow”.

Issue description

etherpad_changeset_apply_duration_seconds is intended to time the apply path, but the timer currently spans the entire handleUserChanges() including await exports.updatePadClients(pad) (fan-out). This makes the metric misleading and defeats the PR’s stated purpose.

Issue Context

The timer is started near the beginning of handleUserChanges() and stopped in finally.
updatePadClients() is awaited before the timer is stopped, so its work is included in the duration.

Fix Focus Areas

src/node/handler/PadMessageHandler.ts[788-905]

Implementation notes

Move recordChangesetApply() start to immediately before the “apply” work you want to measure (e.g., just before pad.appendRevision(...)).
Call the returned stopHistogram() immediately after the apply work completes and before any socket emits / updatePadClients() fan-out.
If you also want total end-to-end latency, keep using the existing stats.timer('edits') (or add a second histogram explicitly for fan-out/total).

[reliability] Unbounded metrics label

Unbounded metrics label

handleCustomMessage() increments etherpad_socket_emits_total with msgString (an HTTP API parameter) as the label value, allowing an API caller to generate unbounded distinct label values and potentially exhaust memory/CPU via high-cardinality time series. Although the endpoint is API-key protected, this still creates a sharp footgun if the key is leaked or a client misbehaves.

Issue description

etherpad_socket_emits_total{type=...} uses the message type as a Prometheus label. In handleCustomMessage(), that type comes directly from the HTTP API msg parameter, which means label cardinality is unbounded.

Issue Context

sendClientsMessage(padID, msg) passes msg directly into PadMessageHandler.handleCustomMessage(padID, msg).
handleCustomMessage() sets data.type = msgString and then calls recordSocketEmit(msg.data.type).
recordSocketEmit() uses socketEmitsTotal.labels(type).inc(), creating a new time series per distinct type.

Fix Focus Areas

src/node/handler/PadMessageHandler.ts[620-631]
src/node/db/API.ts[863-866]
src/node/prom-instruments.ts[17-21]
src/node/prom-instruments.ts[34-36]

Implementation notes

Pick one of:

Normalize/bucket: Map any unknown/untrusted type to a small bounded set (e.g., API_CUSTOM_MESSAGE, CUSTOM, other, unknown).
Allowlist: Only allow a fixed set of label values (e.g., NEW_CHANGES, CHAT_MESSAGE, etc.); everything else becomes other.
Remove label at this site: For handleCustomMessage(), increment with a constant label regardless of msgString. Also consider enforcing a max length/character set if you keep any user-provided label value.

PR 7757 (2026-05-15)

[correctness] Promise sent to admin

Promise sent to admin

In the admin plugin socket handler, `checkUpdates` emits the unresolved Promise returned by `checkPluginForUpdates()` because it is not awaited, so the admin UI will throw when it calls `.includes()` on `data.updatable`. Additionally, the error path emits `{}` for `updatable`, which also violates the UI's `string[]` expectation and can throw on transient errors.

Issue description

The checkUpdates socket event emits an unresolved Promise (missing await) and emits the wrong type ({}) on error. This breaks the admin plugins page because the client expects updatable to be a string[] and calls .includes().

Issue Context

checkPluginForUpdates is async.
Admin UI (HomePage.tsx) assumes updatable: string[].

Fix Focus Areas

src/node/hooks/express/adminplugins.ts[74-82]
admin/src/pages/HomePage.tsx[66-70]

PR 7755 (2026-05-15)

[security] Unsafe cookie decoding

Unsafe cookie decoding

PadMessageHandler.handleMessage() decodes untrusted handshake Cookie values with decodeURIComponent() without handling URIError, so a malformed cookie (e.g. sessionID=%ZZ) will throw and abort CLIENT_READY processing for that socket. This enables repeated connection attempts to spam server error logs and prevent clients from joining pads.

Issue description

readCookie() uses decodeURIComponent() on raw Cookie header values without error handling. If the cookie value contains malformed percent-encoding, decodeURIComponent() throws a URIError, causing CLIENT_READY to fail and generating server error logs.

Issue Context

Cookie headers are client-controlled; malicious or buggy clients can supply invalid encodings. The error propagates up to the socket message handler, which logs the exception.

Fix Focus Areas

src/node/handler/PadMessageHandler.ts[406-434]

Suggested fix

Wrap decodeURIComponent(...) in a try/catch and treat decode failures as null (or fall back to the raw string).
Consider rate-limiting or lowering log severity for repeated decode failures to reduce log amplification.
Add a backend regression test that connects with a malformed cookie value (for sessionID and/or token) and asserts CLIENT_READY does not throw and auth.sessionID resolves to null (or expected fallback).

PR 7753 (2026-05-15)

[correctness] Misleading deferred subtitle

Misleading deferred subtitle

`UpdatePage` renders the "Outside maintenance window" subtitle for autonomous tier whenever `scheduledFor` is more than 60 seconds away, which is also true for a normal in-window grace period (e.g., 15 minutes), so the UI can incorrectly claim the delay is due to being outside the window.

Issue description

UpdatePage currently shows the maintenance-window deferral subtitle based on a fixed scheduledFor > now + 60s heuristic. This misfires when the update is scheduled inside the maintenance window but has a grace period longer than 60s (common per docs), leading to incorrect admin messaging.

Issue Context

Backend Tier 4 only snaps scheduledFor to the next window opening if now + grace is outside the window; otherwise it keeps scheduledFor = now + grace. The status endpoint also always provides nextWindowOpensAt for autonomous tier (when the window parses), so the UI needs a stronger signal than “scheduledFor is far away”.

Fix Focus Areas

admin/src/pages/UpdatePage.tsx[186-207]
src/node/updater/Scheduler.ts[79-88]
src/node/hooks/express/updateStatus.ts[115-119]
doc/admin/updates.md[202-208]

Suggested implementation direction

Change the subtitle condition to reflect actual deferral, e.g. only show it when scheduled.scheduledFor is effectively the next window opening (compare equality to us.nextWindowOpensAt, possibly with a small tolerance), rather than using a fixed > now + 60s threshold.

[correctness] Preflight detail dropped

Preflight detail dropped

`applyUpdate()` saves a detailed preflight failure reason (`reasonStr`) to state and logs but returns only `pf.reason`, causing `/admin/update/apply` responses and failure-notification emails to lose the detail that was just computed.

Issue description

On preflight failure, applyUpdate() computes reasonStr (including optional pf.detail) and persists it, but returns {reason: pf.reason}. Callers (HTTP route + notify path) use the returned result.reason, so they miss the detail string.

Issue Context

State/logs store the enriched string, so the admin UI eventually shows it.
Immediate HTTP 409 responses and notifyApplyFailure() emails use the returned value and therefore lose important diagnostics (e.g., Node engine mismatch details).

Fix Focus Areas

src/node/updater/applyPipeline.ts[89-103]
src/node/hooks/express/updateActions.ts[274-301]
src/node/updater/Notifier.ts[156-162]

Suggested implementation direction

Change applyUpdate() to return reason: reasonStr for preflight-failed.
Ensure any downstream callers that surface result.reason (HTTP + email) now get the enriched detail.

[reliability] SMTP cache misses config

SMTP cache misses config

`sendEmailViaSmtp()` only rebuilds the cached nodemailer transport when `settings.mail.host` changes, so runtime changes to `port`/`secure`/`auth` after `reloadSettings()` will keep using a stale transport configuration.

Issue description

The cached nodemailer transport is invalidated only on host changes. If an operator updates SMTP credentials or port (without changing host) and triggers reloadSettings(), future emails will continue using the old transport settings.

Issue Context

The admin settings flow can call reloadSettings() at runtime, so mail settings are expected to be mutable without a process restart.

Fix Focus Areas

src/node/updater/index.ts[46-78]
src/node/hooks/express/adminsettings.ts[395-401]

Suggested implementation direction

Expand the cache key to include host, port, secure, and a stable representation of auth (and optionally from).
Alternatively, clear transportCache whenever reloadSettings() runs (if there is a suitable hook point), forcing a rebuild on next send.

PR 7750 (2026-05-15)

[reliability] No test for plugin_packages migration

No test for plugin_packages migration

This PR changes Debian postinstall/systemd behavior for `src/plugin_packages` but does not add/update an automated regression test to validate the new directory layout and upgrade migration. Without a test, future packaging changes could reintroduce the `MODULE_NOT_FOUND` failure or break upgrades unnoticed.

Issue description

The PR fixes a Debian packaging bug by keeping src/plugin_packages in-tree and adjusting permissions/systemd sandboxing, but there is no updated regression test to enforce the new behavior.

Issue Context

packaging/test-local.sh currently asserts the old symlink-based layout for /opt/etherpad/src/plugin_packages, so it will not validate (and will likely fail under) the new intended behavior.

Fix Focus Areas

packaging/test-local.sh[133-150]
packaging/scripts/postinstall.sh[69-103]
packaging/systemd/etherpad.service[43-51]

[correctness] CI symlink assertions stale

CI symlink assertions stale

packaging/scripts/postinstall.sh no longer creates /opt/etherpad/src/plugin_packages as a symlink, but the Debian CI workflow and packaging/test-local.sh still assert it is a symlink to /var/lib/etherpad/plugin_packages, causing CI/test failures.

Issue description

postinstall.sh now migrates away from a symlinked src/plugin_packages and ensures it is a real directory. CI and local packaging tests still check for the old symlink layout, so they will fail.

Issue Context

The deb packaging validation in .github/workflows/deb-package.yml and packaging/test-local.sh must be updated to match the new in-tree directory layout and permissions.

Fix Focus Areas

.github/workflows/deb-package.yml[170-179]
packaging/test-local.sh[133-146]

What to change

Replace test -L /opt/etherpad/src/plugin_packages + readlink assertions with test -d /opt/etherpad/src/plugin_packages.
Replace /var/lib/etherpad/plugin_packages existence/ownership checks with permission/group checks on /opt/etherpad/src/plugin_packages (and optionally /opt/etherpad/src/plugin_packages/.versions if you decide to pre-create it), consistent with postinstall.sh (group=etherpad, mode 2775).

PR 7748 (2026-05-15)

[reliability] Mocha --exit masks handle leaks

Mocha --exit masks handle leaks

Adding `--exit` forces the test process to terminate even if timers/sockets/handles are still open, so backend tests will no longer hang or otherwise surface leaked handles during the post-suite event-loop drain. This weakens regression detection for known classes of leaks (for example, socket.io-client reconnect timers) and can let new leaks slip by silently.

Issue description

mocha --exit improves flake mitigation, but it also prevents the test runner from naturally exposing leaked handles (timers/sockets) via a non-draining event loop.

Issue Context

At least one backend spec explicitly calls out that leaked socket.io-client reconnect timers keep the event loop alive past Mocha’s “passing” output; --exit would make similar future leaks much harder to detect.

Fix Focus Areas

src/package.json[149-152]
src/tests/backend/specs/lowerCasePadIds.ts[20-26]

Suggested fix options

Make --exit conditional (e.g., only in CI, or only for Windows+Node24) via an env var in the workflow and a small script wrapper.
If --exit must remain on, add an explicit leak-detection phase (e.g., fail the run if known handle types remain, or run an open-handle dump right before forced exit) so regressions are still caught.

[observability] Diagnostics exit matrix misleading

Diagnostics exit matrix misleading

Backend tests now run Mocha with `--exit`, so successful runs will typically skip `beforeExit` and always emit only `exit`, but `tests/backend/diagnostics.ts` documents “only exit” as meaning `process.exit()` was called unexpectedly. This makes the diagnostics output/documentation inaccurate and can mislead future debugging of backend-test failures.

Issue description

Mocha is now invoked with --exit, which changes the meaning of the beforeExit/exit event pattern that src/tests/backend/diagnostics.ts documents and uses for interpretation.

Issue Context

With --exit, Mocha calls process.exit(...) at the end of the run. That means beforeExit usually won’t run on success, and “only exit” becomes the expected behavior, contradicting diagnostics.ts’s current matrix.

Fix Focus Areas

src/tests/backend/diagnostics.ts[23-27]
src/tests/backend/diagnostics.ts[66-73]
src/package.json[149-152]

Suggested fix

Update the comment matrix (and optionally the log output) to explicitly account for --exit.
Optionally, have diagnostics.ts detect --exit (e.g., via process.argv.includes('--exit')) and log an extra line like diag('mocha --exit enabled; beforeExit will not fire on success') so the runner log remains self-explanatory.

PR 7727 (2026-05-11)

[reliability] `CMD` change lacks regression test

`CMD` change lacks regression test

The PR changes the container startup path to bypass `pnpm run prod`, but it does not add a regression test that would fail if the old `pnpm`-based `CMD` were restored. Without an automated reproduction (e.g., the `docker-compose.yml` named-volume layout on `src/plugin_packages`), this bug can silently regress.

Issue description

This PR fixes a Docker runtime crash by bypassing pnpm run prod, but it does not include a regression test to ensure the container still boots under the problematic volume layout (notably the named volume mounted to /opt/etherpad-lite/src/plugin_packages).

Issue Context

The changed CMD in Dockerfile is intended to avoid pnpm 11's runDepsStatusCheck behavior at runtime. A regression test should exercise the same startup conditions (ideally including the docker-compose.yml named-volume layout) so reverting the fix would cause CI to fail.

Fix Focus Areas

Dockerfile[217-228]
.github/workflows/docker.yml[66-82]
docker-compose.yml[2-30]

PR 7726 (2026-05-11)

[observability] Missing nginx logs on timeout

Missing nginx logs on timeout

On timeout the step only prints `etherpad-docker` logs, but the polled endpoint is the nginx container on 8081 and that container is started without a name. If nginx is the problem, the workflow won’t capture its logs, slowing triage.

Issue description

The timeout handler dumps only etherpad logs, but failures at http://127.0.0.1:8081/ can be caused by nginx itself (crash, bad config, network). Because nginx is started without --name, it’s awkward to reliably capture its logs during failure.

Issue Context

Nginx is the container bound to host port 8081 and proxies to etherpad.

Fix Focus Areas

.github/workflows/rate-limit.yml[58-84]

Suggested fix

Start nginx with a stable name, e.g. docker run --name nginx-docker -p 8081:80 ... -d nginx-latest.
On readiness timeout, print both:

docker logs nginx-docker || true
docker logs etherpad-docker || true Optionally add docker ps -a for quick container state visibility.

[performance] Wait can reach 4m

Wait can reach 4m

The readiness poll can delay job failure up to ~240s because each of the 60 iterations includes a curl with `--max-time 2` plus `sleep 2`. This unnecessarily prolongs CI runs when etherpad never becomes ready.

Issue description

The readiness loop’s wall-clock upper bound is ~4 minutes (60 * (2s curl timeout + 2s sleep)), which can slow down failure feedback.

Issue Context

This is the new “Wait for etherpad behind nginx to be ready” step.

Fix Focus Areas

.github/workflows/rate-limit.yml[66-84]

Suggested fix

Use a single wall-clock timeout (e.g., timeout 120s bash -c 'until curl ...; do sleep 2; done') or reduce either the per-try curl timeout or the sleep interval so the maximum wait matches your intended bound (for example, 120s total). Also consider printing the final HTTP status/connection error to make timeouts easier to interpret.

PR 7720 (2026-05-11)

[reliability] Timer skips state recheck

Timer skips state recheck

The scheduler timer’s fire path calls applyUpdate() without re-checking persisted execution state or current policy, so an update can start even if the admin cancels at the timer boundary or the policy/tier changes during the grace window. This contradicts the SchedulerRunnerDeps contract comment that triggerApply should re-check state before acting.

Issue description

The scheduler timer fire path (triggerApply) does not verify that:

the persisted state is still scheduled for the same tag, and
policy still allows auto-apply. This can cause updates to start despite last-moment cancellation or a tier/policy flip during the grace window.

Issue Context

SchedulerRunnerDeps.triggerApply is documented to re-check persisted state.
The production wiring in index.ts currently calls applyUpdate() directly.
applyUpdate() validates state shape/tag/lock, but it does not evaluate update policy.

Fix Focus Areas

src/node/updater/Scheduler.ts[96-101]
src/node/updater/index.ts[320-331]
src/node/updater/applyPipeline.ts[48-77]

What to change

In triggerApply (index.ts):

Load the latest persisted state.
Abort unless state.execution.status === 'scheduled' AND state.execution.targetTag === targetTag.
Re-evaluate policy (using current settings + install method + current version) and abort unless policy.canAuto.
Optionally, if aborting due to policy denial, clear scheduled state to idle and persist it so the UI doesn’t show a stale scheduled state. This keeps the timer callback safe under races and matches the documented contract.

[reliability] scheduledFor format unchecked

scheduledFor format unchecked

State validation for the new scheduled execution status only enforces that scheduledFor is a non-empty string, not a valid timestamp, so a hand-edited state file can pass validation but result in NaN delay calculations and effectively immediate timer firing. This is low-likelihood but easy to harden given scheduled status is newly introduced.

Issue description

execution.status === 'scheduled' requires scheduledFor and startedAt, but state validation only checks they are non-empty strings, not valid timestamps.

Issue Context

Scheduler delay uses new Date(scheduledFor).getTime().
Invalid date strings yield NaN, which can lead to unintended immediate scheduling behavior.

Fix Focus Areas

src/node/updater/state.ts[15-39]

What to change

Enhance execution validation for statuses that store timestamps (including scheduled) to verify fields are parseable dates (e.g., !Number.isNaN(Date.parse(value))).
Apply the stricter check only to known timestamp fields (scheduledFor/startedAt/drainEndsAt/etc.) to avoid changing semantics for non-timestamp strings like reason. This is defense-in-depth against corrupted/hand-edited state.

PR 7716 (2026-05-10)

[security] Hardcoded `https://www.npmjs.com`

Hardcoded `https://www.npmjs.com`

The new admin UI link uses a protocol-specific external URL (`https://...`) instead of a protocol-independent URL. This violates the checklist requirement to use protocol-independent URLs where applicable to avoid protocol-coupling issues.

Issue description

A new external link hard-codes https:// (https://www.npmjs.com/...) instead of using a protocol-independent URL as required by the compliance checklist.

Issue Context

This URL is used for the npm search link in the admin plugins page.

Fix Focus Areas

admin/src/pages/HomePage.tsx[138-145]

[reliability] Plugin listeners not cleaned

Plugin listeners not cleaned

HomePage registers multiple Socket.IO handlers inside effects that re-run (e.g., when `searchParams` or the socket instance changes) but only clears an interval on unmount, so listeners are never detached. This allows handlers to fire after unmount and to accumulate over time, causing duplicate state updates and unbounded listener growth.

Issue description

HomePage attaches Socket.IO listeners (e.g., results:installed, results:updatable, results:search, results:searcherror, finished:*, connect) inside React effects, including effects that re-run when searchParams changes, but it does not unregister those listeners in cleanup. Update the component so every registered .on() handler is removed appropriately to prevent duplicate event handling, unbounded listener growth, and handlers firing after unmount.

Issue Context

The current effect cleanup only clears a setInterval, leaving all Socket.IO .on() handlers attached. Because Socket.IO listeners persist until explicitly removed, any effect that re-runs (such as one dependent on searchParams, or one that might re-run if the socket instance changes) will accumulate additional listeners unless off(event, handler) is called, leading to multiple state updates per single server response.

Fix Focus Areas

admin/src/pages/HomePage.tsx[58-89]
admin/src/pages/HomePage.tsx[91-100]

PR 7714 (2026-05-10)

[correctness] REST checkToken path changed

REST checkToken path changed

`checkToken` was moved into the new `server` resource, which changes its REST-style runtime route from `/rest//pad/checkToken` to `/rest//server/checkToken`. This breaks REST-style backward compatibility (clients will get `code: 3` “no such function”) and contradicts the design note that REST URLs for existing operations should remain unchanged via per-op tag overrides.

Issue description

checkToken now routes under /rest/<ver>/server/checkToken, breaking existing REST-style callers that previously used /rest/<ver>/pad/checkToken.

Issue Context

REST-style paths are computed from the resource/action keys (_restPath = /${resource}/${action}) and used by the runtime OpenAPIBackend router for /rest/`.

Fix Focus Areas

src/node/hooks/express/openapi.ts[156-319]

Suggested fix

Move checkToken back under the existing pad resource (restoring /pad/checkToken routing), and set tags: ['server'] on that operation to achieve the desired grouping without changing the REST URL.
Keep the server resource for truly new server-only operations like getStats.
(Optional safety) Add a regression test that /rest/<ver>/pad/checkToken still resolves (similar to the existing /api/<ver>/checkToken assertions).

PR 7710 (2026-05-09)

[security] Hardcoded https Google Fonts

Hardcoded https Google Fonts

External font resources are added using hardcoded `https://` URLs instead of protocol-independent `//` URLs. This can cause mixed-content/compatibility issues in non-HTTPS deployments and violates the project convention.

Issue description

pad.html adds external Google Fonts resources using https:// instead of protocol-independent // URLs.

Issue Context

The compliance checklist requires protocol-independent URLs where appropriate to avoid mixed content issues and improve deployment compatibility.

Fix Focus Areas

src/templates/pad.html[24-26]

[maintainability] Timeslider URL change undocumented

Timeslider URL change undocumented

The behavior of `/p/:pad/timeslider` has changed (302 redirect unless `?embed=1`), but no documentation updates are included. This makes existing documentation about timeslider routes inaccurate and can break adopters who rely on the documented URL.

Issue description

The timeslider endpoint behavior changed: /p/:pad/timeslider now redirects to the pad unless ?embed=1 is provided, but docs still imply the timeslider is directly served at /p/:padid/timeslider.

Issue Context

This is a public, user- and integrator-facing URL change. Documentation should mention the redirect and provide the correct URL for embedding/direct HTML (?embed=1) with an example.

Fix Focus Areas

doc/skins.md[5-10]
src/node/hooks/express/specialpages.ts[402-408]
src/node/hooks/express/specialpages.ts[420-420]

[correctness] Latest hash opens rev0

Latest hash opens rev0

`PadModeController.onOuterHashChange()` maps `#rev/latest` to revision 0 via `rev < 0 ? 0 : rev`, which forces the embedded timeslider to jump to the first revision instead of the latest. Users manually navigating to `#rev/latest` (or any future link that sets it) will see the wrong historical content.

Issue description

#rev/latest is parsed as -1 but then converted to 0 when syncing the outer hash into the embedded timeslider, which incorrectly jumps to revision 0.

Issue Context

The embedded timeslider interprets window.location.hash = "#N" as the desired revision number.

Fix Focus Areas

src/static/js/pad_mode.ts[214-226]
src/static/js/pad_mode.ts[196-201]

Implementation notes

When rev < 0, clear the inner hash (win.location.hash = "") instead of setting #0, or compute the latest revision number from the embedded frame (if available) and set that.
Ensure the outer URL remains canonical (#rev/latest is fine), but don’t mis-target the embedded slider.

[security] External Google Fonts added

External Google Fonts added

`pad.html` now unconditionally loads multiple Google Fonts via `fonts.googleapis.com`/`fonts.gstatic.com`, adding third‑party network requests on every pad view. This can break offline/self-hosted deployments and leaks user IP/metadata to a third party by default.

Issue description

The core pad HTML template now hard-depends on Google Fonts, introducing third-party requests and potential breakage for offline/CSP-restricted installations.

Issue Context

Etherpad is commonly self-hosted; loading external fonts by default is a privacy and reliability regression.

Fix Focus Areas

src/templates/pad.html[16-27]

Implementation notes

Remove the Google Fonts <link> tags entirely, or gate them behind an explicit setting/skin.
If these fonts are required for a theme, bundle/self-host them under src/static/ and reference local URLs instead.

[reliability] Redirect embed bypass too loose

Redirect embed bypass too loose

The server treats any truthy `?embed=` value as an iframe request and skips the redirect, so direct visits can still access `/p/:pad/timeslider` by adding `?embed=0` or similar. Additionally, the redirect target is built as a relative path containing the padId, which can normalize unexpectedly for edge pad IDs like `..`.

Issue description

/p/:pad/timeslider bypasses the redirect for any truthy embed query value and uses a relative redirect target that can normalize oddly for edge pad IDs.

Issue Context

The embed path is intended only for the in-pad iframe (?embed=1).

Fix Focus Areas

src/node/hooks/express/specialpages.ts[230-237]
src/node/hooks/express/specialpages.ts[401-408]

Implementation notes

Change the condition to something strict like if (req.query.embed !== '1') redirect....
Prefer a simple redirect target of .. (from /p/:pad/timeslider this resolves to /p/:pad) or an absolute /p/${encodeURIComponent(req.params.pad)} to avoid composing ../${pad}.
Keep behavior consistent in both dev live-reload and production handlers.

PR 7709 (2026-05-09)

[correctness] Settings still textarea blob

Settings still textarea blob

The updated `/admin/settings` UI still renders the settings as a single `` value rather than a parsed, structured representation with inline comments alongside relevant keys. This fails the requirement to show settings.json in a structured, readable format with inline comments derived from JSON comments.

Issue description

/admin/settings still uses a plain <textarea> and does not render settings as parsed/structured JSON with inline comments.

Issue Context

Compliance requires a structured representation that preserves formatting and surfaces inline comments derived from JSON comments.

Fix Focus Areas

admin/src/pages/SettingsPage.tsx[66-76]

[maintainability] CSS rules conflict order

CSS rules conflict order

Both App.css and index.css define `.settings` styling, but `main.tsx` imports `App.tsx` (and thus App.css) before importing `index.css`, so the later `index.css` rules will override App.css for equal-specificity selectors. This can prevent the intended raw editor styling from applying consistently.

Issue description

App.css and index.css both style .settings, but import order makes index.css override App.css.

Issue Context

This can silently nullify the new styling without any runtime errors.

Fix Focus Areas

admin/src/main.tsx[1-5]
admin/src/App.tsx[1-4]
admin/src/App.css[1-16]
admin/src/index.css[293-306]

Suggested fix

Move the App.css import to main.tsx after index.css (and remove it from App.tsx), OR
Remove/rename the .settings rules from one file, OR
Increase selector specificity in App.css (less preferred) so it reliably overrides index.css.

[security] Protocol-relative wiki links

Protocol-relative wiki links

SettingsPage uses protocol-relative GitHub URLs (`//github.com/...`), which can downgrade to `http://` when the admin UI is served over HTTP and may break or redirect unexpectedly. External links should be explicit HTTPS.

Issue description

External links are protocol-relative and can resolve to HTTP.

Fix Focus Areas

admin/src/pages/SettingsPage.tsx[112-118]

Suggested fix

Change both hrefs to explicit https://github.com/....

[reliability] Parse error before load

Parse error before load

FormView treats `settings === undefined` as an empty string and renders ParseErrorBanner because `parseTree('')` yields no tree, even though settings simply haven’t arrived yet. App hides the loading overlay on socket connect (before the `settings` event), so users can briefly see a false parse error on page load.

Issue description

FormView renders a parse error when settings hasn’t loaded yet (undefined → ''), which is misleading. Because the app hides the loading overlay on socket connect (before receiving settings), the parse error can flash on initial load.

Issue Context

Empty string is not the same as “not yet loaded”. parseTree('') returns no tree and triggers the error banner.

Fix Focus Areas

admin/src/components/settings/FormView.tsx[34-46]
admin/src/App.tsx[48-77]

Suggested change

Choose one:

In FormView, read settings without ?? '' and if settings === undefined return a loading placeholder (or null) instead of ParseErrorBanner.
Alternatively (or additionally), in App.tsx keep showLoading true until the settings event is received (move/duplicate setShowLoading(false) from connect to the settings handler).

[reliability] Save toast ignores ack

Save toast ignores ack

SettingsPage.handleSave() shows a success toast immediately after emitting `saveSettings`, without waiting for any server-confirmed outcome. Because the backend emits `saveprogress: 'saved'` even if `writeFile()` throws, the UI can report a successful save even when nothing was written.

Issue description

The settings UI shows a "saved" toast immediately after emitting the saveSettings socket event. The server currently emits saveprogress: 'saved' even if the underlying file write fails, so the UI can incorrectly report success.

Issue Context

Frontend currently does not wait for any ack to decide success/failure.
Backend always emits saveprogress: 'saved' regardless of writeFile() outcome.

Fix Focus Areas

admin/src/pages/SettingsPage.tsx[53-64]
admin/src/App.tsx[80-82]
src/node/hooks/express/adminsettings.ts[42-50]

Proposed fix

Backend: on saveSettings, emit saveprogress: 'saved' only on success; emit a distinct failure status (and optionally an error message) on catch.
Frontend: move the success toast to be driven by the saveprogress event; show a failure toast on saveprogress: 'error' (or equivalent).

[reliability] Toast wait can false-pass

Toast wait can false-pass

The Playwright `saveSettings()` helper waits for `.ToastRootSuccess` to exist, but the Toast component assigns the success/failure class based on `toastState.success` even when the toast is closed. A previously-closed success toast can therefore satisfy the selector without a new save completing, making tests flaky and potentially masking failures.

Issue description

The Playwright helper waits for .ToastRootSuccess, which can exist even when the toast is closed, so the helper can proceed without confirming the new save finished.

Issue Context

ToastDialog applies .ToastRootSuccess based on toastState.success even when toastState.open is false.

Fix Focus Areas

src/tests/frontend-new/helper/adminhelper.ts[14-17]
admin/src/utils/Toast.tsx[5-23]

Proposed fix

Option A (tests only): Update the helper to wait for an open toast, e.g. await page.waitForSelector('.ToastRootSuccess[data-state="open"]') (or assert visibility via expect(...).toBeVisible()), and optionally wait for the previous toast to close first. Option B (app behavior): When closing the toast (onOpenChange), also reset success (and/or use a monotonically increasing toast id) so old classes don't persist when closed.

[correctness] IconButton missing type

IconButton missing type

IconButton renders a `` without a default `type`, which defaults to `submit` inside a ``. Reusing IconButton within a form can therefore trigger unintended submissions/navigation.

Issue description

IconButton does not set a safe default type, so it can submit forms unexpectedly.

Issue Context

IconButton is a shared component used across admin pages. Adding a default type="button" prevents accidental submits while still allowing callers to override to submit when desired.

Fix Focus Areas

admin/src/components/IconButton.tsx[9-13]

Proposed fix

Set a default type while still allowing override, for example:

Destructure type with default: ({type = 'button', ...rest}) and render <button type={type} ...>
Or render <button type={rest.type ?? 'button'} ...> and ensure spread order does not overwrite it unintentionally.

PR 7708 (2026-05-09)

[reliability] Staleness check race window

Staleness check race window

runCompactStale selects stale pads in a first pass and compacts them in a later pass without re-checking getLastEdited, so a pad edited between passes can still be compacted despite becoming “fresh”. This can disrupt active users because compaction can kick sessions from the pad.

Issue description

runCompactStale() determines staleness in a first loop, stores IDs, and later compacts from that stored list. If a pad is edited after selection but before compaction, it can still be compacted, which undermines the “skip hot pads” goal and may kick active sessions.

Issue Context

Compaction can kick sessions (Cleanup.deleteRevisions()), so compacting a pad that becomes active during the run is disruptive.

Fix Focus Areas

bin/compactStalePads.ts[110-180]

Suggested fix

Before compacting each pad (ideally right before calling compactPad, after pre-flight getRevisionsCount), call getLastEdited(padId) again and skip compaction if lastEdited > cutoff (increment skippedFresh and log). This keeps the staleness guarantee closer to real-time and minimizes the race window.

[reliability] Staleness check race window

Staleness check race window

runCompactStale selects stale pads in a first pass and compacts them in a later pass without re-checking getLastEdited, so a pad edited between passes can still be compacted despite becoming “fresh”. This can disrupt active users because compaction can kick sessions from the pad.

Issue description

runCompactStale() determines staleness in a first loop, stores IDs, and later compacts from that stored list. If a pad is edited after selection but before compaction, it can still be compacted, which undermines the “skip hot pads” goal and may kick active sessions.

Issue Context

Compaction can kick sessions (Cleanup.deleteRevisions()), so compacting a pad that becomes active during the run is disruptive.

Fix Focus Areas

bin/compactStalePads.ts[110-180]

Suggested fix

Before compacting each pad (ideally right before calling compactPad, after pre-flight getRevisionsCount), call getLastEdited(padId) again and skip compaction if lastEdited > cutoff (increment skippedFresh and log). This keeps the staleness guarantee closer to real-time and minimizes the race window.

PR 7705 (2026-05-08)

[reliability] `/admin/openapi.json` lacks feature flag

`/admin/openapi.json` lacks feature flag

The PR enables a new `/admin/openapi.json` endpoint by default via `src/ep.json` and mounts it unconditionally in `expressPreSession`, without any feature flag or default-off setting. This violates the requirement that new features be behind a feature flag and disabled by default.

Issue description

A new endpoint (/admin/openapi.json) is added and enabled by default, but the compliance checklist requires new features to be behind a feature flag and disabled by default.

Issue Context

openapi-admin is registered in src/ep.json and mounts /admin/openapi.json in expressPreSession with no conditional check.

Fix Focus Areas

src/ep.json[143-147]
src/node/hooks/express/openapi-admin.ts[158-165]

[correctness] Wrong baseUrl for admin

Wrong baseUrl for admin

`admin/src/api/client.ts` exports a single `fetchClient` whose `baseUrl` is `/api/`, but the merged OpenAPI schema now includes admin paths like `/admin-auth/` and `/admin/update/status` that are not under `/api/`, so using the generated client for those admin endpoints will hit the wrong URL.

Issue description

The generated paths type now includes both public and admin endpoints, but admin/src/api/client.ts exports a single OpenAPI client configured with API_BASE_URL ("/api/"). Admin endpoints live at root (e.g. /admin-auth/), so a single fixed base URL cannot correctly target both surfaces.

Issue Context

admin/scripts/dump-spec.ts merges the public and admin OpenAPI docs into one before running openapi-typescript. The runtime client should therefore either:

expose two clients/hooks (one for /api/<version> public calls, one for root-level admin calls), or
implement a routing layer that selects the appropriate base URL per path.

Fix Focus Areas

admin/src/api/client.ts[6-12]
admin/scripts/gen-api.mjs[56-72]
admin/scripts/dump-spec.ts[49-56]

PR 7704 (2026-05-08)

[security] Git tag option injection

Git tag option injection

`executeUpdate()` and `verifyReleaseTag()` pass the release tag (ultimately sourced from GitHub `tag_name`) as the first positional argument to `git checkout` / `git verify-tag` without validation or option-termination. A tag that starts with `-` can be parsed as a git option (bypassing signature verification and/or changing checkout behavior).

Issue description

Updater uses an externally sourced tag string as the first positional argument to git checkout and git verify-tag. If the tag begins with -, git can treat it as an option, potentially bypassing signature verification or altering checkout behavior.

Issue Context

The tag originates from GitHub’s releases/latest API (tag_name) and is persisted into updater state.

Fix Focus Areas

src/node/updater/UpdateExecutor.ts[154-165]
src/node/updater/trustedKeys.ts[37-66]
src/node/hooks/express/updateActions.ts[164-276]
src/node/updater/VersionChecker.ts[46-63]

What to change

Reject/normalize unsafe tags before use (at minimum: disallow leading -, whitespace, and other invalid refname characters).
Prefer passing a ref that cannot be parsed as an option, e.g. refs/tags/<tag>.
For git verify-tag, also add an explicit end-of-options marker: git verify-tag -- <tag-or-ref>.
For checkout, use a safe ref form (e.g. git checkout refs/tags/<tag>) and/or validate the tag strictly (e.g. semver tags only) before invoking git.

[reliability] `/admin/update/apply` exposed without flag

`/admin/update/apply` exposed without flag

The new Tier 2 action endpoints are registered whenever `updates.tier !== "off"`, so they exist even when Tier 2 is not enabled (e.g., tier `notify`). This violates the requirement that new features be gated behind a flag and disabled by default such that the disabled path matches prior behavior (these endpoints would previously 404).

Issue description

Tier 2 manual update endpoints (/admin/update/apply, /cancel, /acknowledge, /log) are registered for any tier except off, which exposes new behavior even when Tier 2 is not enabled.

Issue Context

Compliance requires new features to be behind a feature flag and disabled by default such that the disabled path matches prior behavior (ideally 404 / route not mounted).

Fix Focus Areas

src/node/hooks/express/updateActions.ts[111-120]

[reliability] Cancel races preflight

Cancel races preflight

/admin/update/cancel can set execution back to idle and remove update.lock while /admin/update/apply is still running preflight, but the apply handler does not re-check for cancellation before proceeding to drain/execute. This allows an update to proceed after cancellation and can permit a second apply to start concurrently, risking inconsistent state and repository corruption.

Issue description

POST /admin/update/cancel can run while /admin/update/apply is still in preflight, but cancel immediately writes execution: idle and deletes the lock file. The original apply request continues and may still start draining/executing, and the early lock release can allow concurrent apply attempts.

Issue Context

Apply holds the lock across preflight + drain + executor.
Cancel is allowed in preflight and draining.
Apply only reloads state right before executing, not immediately after preflight.

Fix Focus Areas

src/node/hooks/express/updateActions.ts[145-214]
src/node/hooks/express/updateActions.ts[266-289]
src/node/updater/lock.ts[69-72]

Suggested fix

Do not remove the lock in /cancel while an apply request might still be running. Instead:
Record a cancellation request in state (e.g., execution: {status: 'idle'} plus a cancelRequestedAt, or introduce a dedicated cancel-requested execution status).
Let the apply handler be the only place that releases the lock when it observes cancellation.
After runPreflight() completes (before creating the drainer), reload state and abort if execution is no longer preflight for the same targetTag.
Optionally, make releaseLock() validate ownership (PID match) before unlinking, or provide a releaseIfOwned(lockPath, pid) helper.

[reliability] Drain can wedge joins

Drain can wedge joins

SessionDrainer turns off acceptingConnections and intentionally does not restore it after the drain completes, assuming the process will exit; but if the update path throws after drain completion and the process does not exit, the server will keep disconnecting all new CLIENT_READY handshakes. This can create an outage that persists until manual restart.

Issue description

acceptingConnections is a module-level gate used to refuse new CLIENT_READY connections during drain. The drainer intentionally leaves it false after drain completion, assuming the process will immediately exit. If anything throws after drain completion and the process does not exit, new joins remain blocked indefinitely.

Issue Context

Drain completion does not restore acceptingConnections.
/admin/update/apply can catch exceptions after the 202 response is sent and continue running.

Fix Focus Areas

src/node/updater/SessionDrainer.ts[44-66]
src/node/handler/PadMessageHandler.ts[380-388]
src/node/hooks/express/updateActions.ts[204-263]

Suggested fix

Add an explicit API in SessionDrainer to restore the flag (e.g., setAcceptingConnections(true) or restoreAcceptingConnections()), and call it in the apply handler’s catch/finally path when the process is not going to exit.
Alternatively, set acceptingConnections=true upon drain completion, and introduce a separate guard tied to persisted update execution.status to refuse joins during executing/rolling-back if needed (so failures can reliably re-enable joins).

[correctness] socket.json.send API misuse

socket.json.send API misuse

PadMessageHandler uses `socket.json.send(...)` when refusing joins during drain; this is inconsistent with other disconnect paths in the same file that use `socket.emit('message', ...)`. With socket.io v4 this API is at best nonstandard and may throw at runtime, breaking the join-refusal logic during updates.

Issue description

The drain join-refusal path uses socket.json.send({disconnect: ...}), which is inconsistent with other disconnect paths and risks not working under socket.io v4.

Issue Context

Other disconnect paths in PadMessageHandler use socket.emit('message', {disconnect: ...}).

Fix Focus Areas

src/node/handler/PadMessageHandler.ts[380-387]

Suggested fix

Replace:

socket.json.send({disconnect: 'updateInProgress'}); with:
socket.emit('message', {disconnect: 'updateInProgress'}); (or socket.send({disconnect: ...}) if the server expects default "message" event), then keep socket.disconnect(true).

[reliability] Unhandled spawn error events

Unhandled spawn error events

UpdateExecutor.runStep() and RollbackHandler.runStep() do not listen for the child's 'error' event, only 'close'. If `spawn()` fails (e.g., executable missing/permission errors), the update flow can hang or crash, leaving state in-flight and potentially keeping session drain active.

Issue description

runStep() helpers in UpdateExecutor/RollbackHandler (and verifyReleaseTag) do not handle child.on('error', ...). Spawn failures can hang promises or crash the process.

Issue Context

These helpers are used in critical paths that mutate state and may have already disabled new connections.

Fix Focus Areas

src/node/updater/UpdateExecutor.ts[44-67]
src/node/updater/RollbackHandler.ts[23-43]
src/node/updater/trustedKeys.ts[41-52]

Suggested fix

Add an 'error' listener to each spawned child.
For UpdateExecutor.runStep, resolve {code: 1, stderr: err.message} (or reject and let the caller catch and transition state).
For RollbackHandler.runStep, resolve a non-zero code on error.
For verifyReleaseTag, treat spawn error as verification failure with proper logging.
Ensure any error path results in a persisted terminal/in-flight state that RollbackHandler can handle.

[reliability] Executor can leave busy state

Executor can leave busy state

executeUpdate() writes `execution.status='executing'` and then performs operations (e.g., lockfile backup) that can throw, but it does not catch exceptions to transition state to rolling-back/failed. The apply route catch block logs and (if possible) responds, but it does not repair execution state, so the system can remain stuck in a busy status that blocks further updates.

Issue description

If readSha(), saveState(), or copyFile() throws in executeUpdate(), the persisted state can remain executing and the system becomes stuck (future apply attempts get 409 busy).

Issue Context

executeUpdate() currently only transitions to rolling-back for non-zero exit codes from the spawned commands. Exceptions bypass that.

Fix Focus Areas

src/node/updater/UpdateExecutor.ts[78-127]
src/node/hooks/express/updateActions.ts[119-124]

Suggested fix

Wrap the body of executeUpdate() in a top-level try/catch.
On any exception:
Persist execution: {status: 'rolling-back', reason: <error message>, ...} (or a dedicated failed-* status) so the route layer can call performRollback().
Append an explicit log line.
Optionally harden the apply handler to detect a thrown executor error and set a safe terminal state (instead of leaving it busy).

[reliability] Execution state under-validated

Execution state under-validated

`loadState()` considers `execution` valid if it only contains a recognized `status`, without validating required fields (e.g., `fromSha`, `targetTag`) for that status. A malformed/hand-edited state file can therefore reach RollbackHandler with missing fields and trigger invalid rollback/verification behavior.

Issue description

The on-disk update state validator accepts any execution object that merely has a known status. This can let malformed state reach rollback/verification code paths that assume required fields exist.

Issue Context

loadState() is intended to “safely reset on malformed input”, and Tier 2 introduces a discriminated union (ExecutionStatus) with required fields per status.

Fix Focus Areas

src/node/updater/state.ts[11-103]
src/node/updater/types.ts[56-86]
src/node/updater/RollbackHandler.ts[67-78]

What to change

Strengthen isValidExecution() to validate required fields based on status (type-guard each union member).
Example: if status==='pending-verification' require targetTag, fromSha, deadlineAt are strings.
Alternatively (or additionally), in loadState() coerce invalid/missing execution subfields back to {status:'idle'} to preserve boot safety.
Consider also validating lastResult.outcome against the allowed set, not just typeof === 'string'.

[security] Public status exposes diagnostics

Public status exposes diagnostics

When `updates.requireAdminForStatus` is false (default), `/admin/update/status` is unauthenticated but now returns `execution` and `lastResult`, including SHAs and human-readable failure reasons derived from git/pnpm stderr. This leaks operational diagnostics and update outcomes to unauthenticated callers.

Issue description

/admin/update/status is intended to be open by default, but it now includes execution and lastResult fields that can contain internal SHAs and command stderr-derived reasons.

Issue Context

The endpoint has an opt-in updates.requireAdminForStatus gate, but the new fields are returned regardless.

Fix Focus Areas

src/node/hooks/express/updateStatus.ts[72-103]
src/node/updater/UpdateExecutor.ts[102-120]

Suggested fix

If updates.requireAdminForStatus is false, omit or redact sensitive subfields:
Drop execution and lastResult entirely, or
Return only execution.status (no reason, fromSha) and a boolean lastResultSuccess.
Alternatively, add a separate admin-only endpoint for detailed execution/log info and keep /admin/update/status minimal.

[correctness] Wrong short-drain countdown

Wrong short-drain countdown

For drain windows shorter than 30s/10s, SessionDrainer schedules the T-30/T-10 broadcasts at time 0 but still reports `seconds: 30` and `seconds: 10`, which is incorrect. This can mislead users during shorter configured drain periods.

Issue description

T-30 and T-10 announcements currently fire immediately for short drain windows but advertise 30/10 seconds remaining.

Fix Focus Areas

src/node/updater/SessionDrainer.ts[50-56]

Suggested fix

Only schedule T-30 if drainSeconds >= 30 and T-10 if drainSeconds >= 10.
Or, if you want “fire ASAP” behavior, pass secondsRemaining = Math.min(threshold, drainSeconds) so the displayed value is accurate.

PR 7703 (2026-05-07)

[correctness] Dropdowns clipped in popups

Dropdowns clipped in popups

Setting `.popup-content` to `overflow-y:auto` makes it a scroll container, which clips absolutely positioned descendants such as Nice Select’s option list. As a result, font/language dropdown menus in Settings can be truncated when opened near the bottom of the popup on short viewports, making options hard/impossible to click.

Issue description

With overflow-y: auto on .popup-content, popups become scroll containers that clip absolutely-positioned dropdown lists (Nice Select). This can truncate the font/language dropdown list in Settings and other popups on short viewports.

Issue Context

Nice Select renders the options list as position: absolute; top: 100% under the .nice-select element. When .popup-content is a scroll container, overflow outside the popup is clipped.

Fix Focus Areas

Make Nice Select lists escape the scroll container when used inside popups (e.g., switch to position: fixed and set coordinates like the toolbar path does, or move/portal the list to body).
Alternatively, toggle a CSS class on the nearest .popup-content while a Nice Select is open to temporarily set overflow to visible, then restore it on close.
src/static/css/pad/popup.css[25-44]
src/static/css/pad/form.css[106-139]
src/static/js/vendors/nice-select.ts[96-129]

[reliability] Brittle scrollHeight test assertion

Brittle scrollHeight test assertion

The new Playwright test hard-requires `scrollHeight > clientHeight`, so it will fail if Settings content ever fits within 900×500 (due to CSS tweaks, localization, or feature changes) even though the popup remains correctly scrollable. The test already checks reachability of the bottom button via `scrollIntoViewIfNeeded()` + `toBeInViewport()`, which is the behavior that matters.

Issue description

The test requires scrollHeight > clientHeight, which is fragile because it depends on Settings content height always overflowing at 900×500.

Issue Context

The key behavior to validate is: the popup uses scrollable overflow styling and the bottom-most action (Delete pad) is reachable (not permanently cropped). The test already scrolls to #delete-pad and asserts it is in the viewport.

Fix Focus Areas

Remove/relax the scrollHeight > clientHeight assertion and rely on reachability checks.
Or, if overflow must be asserted, first assert #delete-pad is NOT in viewport before scrolling (to prove scrolling was needed) and adjust viewport height accordingly.
src/tests/frontend-new/specs/pad_settings.spec.ts[170-191]

PR 7698 (2026-05-07)

[security] `ep_*` passthrough lacks flag

`ep_*` passthrough lacks flag

The new plugin padOptions passthrough is enabled unconditionally on both client and server, changing behavior by default. This violates the requirement that new features be guarded by a feature flag and disabled by default.

Issue description

A new behavior (passing through plugin ep_* padOptions keys) is enabled unconditionally, but compliance requires new features to be behind a feature flag and disabled by default.

Issue Context

Both server-side Pad.normalizePadSettings() and client-side pad.applyPadSettings() currently preserve ep_* keys without checking any feature flag, so the behavior change is always active.

Fix Focus Areas

src/node/db/Pad.ts[95-122]
src/static/js/pad.ts[862-886]

[maintainability] New plugin API undocumented

New plugin API undocumented

This PR introduces a new plugin-facing capability module and behavior change, but does not update repository documentation accordingly. This can lead to plugin authors missing the new supported interface and its constraints.

Issue description

A new plugin-facing API/behavior was added (padOptionsPluginPassthrough capability and ep_* padOptions passthrough), but docs were not updated.

Issue Context

Plugins need a documented way to discover and correctly use the new behavior (including the exact key pattern /^ep_[a-z0-9_]+$/ and any limitations).

Fix Focus Areas

src/node/utils/PluginCapabilities.ts[1-11]
src/node/db/Pad.ts[43-53]
src/static/js/pad.ts[877-882]
doc/plugins.md[1-200]

[reliability] Unbounded plugin option payload

Unbounded plugin option payload

Pad.normalizePadSettings copies every /^ep_[a-z0-9_]+$/ key/value verbatim into padSettings, and padSettings is persisted and sent to every client as initialOptions. This introduces an unvalidated path for arbitrary plugin-provided objects (size/type), which can lead to oversized broadcasts and persistence/serialization failures if a plugin stores unsupported values.

Issue description

normalizePadSettings() now persists and broadcasts ep_* pad option values verbatim. Without validation, server-side plugins can accidentally store very large or non-serializable values that will be persisted and sent to all clients.

Issue Context

Pad.saveToDatabase() persists the whole Pad object (including padSettings).
clientVars.initialOptions and padoptions messages distribute pad settings to all connected clients.

Fix Focus Areas

src/node/db/Pad.ts[95-122]
src/node/handler/PadMessageHandler.ts[319-349]
src/node/handler/PadMessageHandler.ts[1165-1185]

Suggested fix approach

In normalizePadSettings(), for ep_* keys:
Enforce that values are JSON-safe (or at least plain JSON-compatible types: null/boolean/number/string/array/plain object).
Enforce a maximum serialized size per key and/or total size across all ep_* keys.
If a value is rejected, drop it (and optionally log at warn-level with the key name and padId when available).

[maintainability] PadOption type misses ep_*

PadOption type misses ep_*

Runtime behavior now accepts `ep_*` keys in pad options on the client, but the shared `PadOption` TypeScript type does not allow any `ep_*` properties. This makes the socket message types inaccurate and can cause TS code to reject/strip plugin padOptions or require unsafe casts.

Issue description

PadOption (SocketIO message type) does not model the newly supported ep_* keys, making TypeScript types diverge from runtime behavior.

Issue Context

Plugins and core TS code that construct/handle pad options messages may rely on PadOption typing.

Fix Focus Areas

src/static/js/types/SocketIOMessage.ts[251-264]

Suggested fix approach

Add an index signature for plugin keys, ideally constrained with a template-literal key type, e.g.:
[k: ep_${string}]: unknown;
Alternatively, add a broader index signature if template literal types aren’t desired in this file.
Consider also typing view more narrowly if needed (optional).

PR 7697 (2026-05-07)

[reliability] English-only aria-label assertion

English-only aria-label assertion

`a11y_dialogs.spec.ts` asserts `aria-label` starts with the English string "Export ", which will fail once these new localization keys are translated (or if test locale changes to a translated language). This makes the Playwright suite brittle even though the underlying UI behavior is correct.

Issue description

The export-links a11y test asserts an English-only aria-label prefix ("Export "). Once non-English translations for pad.importExport.export<format>a.title are added, the aria-label will be localized and the test will fail.

Issue Context

html10n.localize([cookieLang, navigator.language, 'en']) can select a non-English language, and Playwright configuration does not pin the browser locale.

Fix Focus Areas

src/tests/frontend-new/specs/a11y_dialogs.spec.ts[68-95]
src/playwright.config.ts[45-69]
src/static/js/l10n.ts[4-13]

Suggested fix

Pick one:

Locale-agnostic assertions: replace the startsWith('Export ') check with expect(ariaLabel.length).toBeGreaterThan(0) (and keep title === ariaLabel + aria-hidden=true), matching the existing pattern used for #chaticon.
Pin test locale: set Playwright context locale: 'en-US' globally or for this test file (test.use({locale: 'en-US'})) so the English prefix assertion remains stable.

[security] Missing noopener on blank links

Missing noopener on blank links

The export links in `pad.html` open in a new tab via `target="_blank"` but still omit `rel="noopener"`, leaving `window.opener` exposed to the opened page. Because this PR touched these anchors, it’s a good opportunity to fix the reverse-tabnabbing hardening gap.

Issue description

Export links use target="_blank" without rel="noopener", allowing the opened page to access window.opener.

Issue Context

These anchors are modified in this PR, so adding rel is low-risk and consistent with other target="_blank" usage in the same template.

Fix Focus Areas

src/templates/pad.html[326-343]

Suggested fix

Add rel="noopener" (or rel="noopener noreferrer") to each export <a ... target="_blank" ...> in the export column.

PR 7695 (2026-05-07)

[reliability] Admin build depends on backend

Admin build depends on backend

admin/package.json now runs gen:api during build/build-copy, and gen:api dynamically imports backend code (src/node/hooks/express/openapi.ts) that requires server-only dependencies. If the admin package is built in isolation (without the backend workspace deps installed), the build will fail with module resolution/runtime import errors.

Issue description

admin's build/build-copy scripts always run gen:api, but gen:api imports backend OpenAPI code that depends on backend-only node_modules. This makes isolated admin builds fragile.

Issue Context

The generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are checked in and CI already has a freshness check.

Fix Focus Areas

admin/package.json[6-13]

Suggested fix

Remove pnpm gen:api && from build and build-copy scripts (keep gen:api as an explicit script).
Rely on the existing CI freshness step to enforce that generated outputs are committed.
(Optional) Keep pnpm gen:api in a dedicated script like build:regen if you still want a one-shot regen+build command.

[reliability] Admin build depends on backend

Admin build depends on backend

admin/package.json now runs gen:api during build/build-copy, and gen:api dynamically imports backend code (src/node/hooks/express/openapi.ts) that requires server-only dependencies. If the admin package is built in isolation (without the backend workspace deps installed), the build will fail with module resolution/runtime import errors.

Issue description

admin's build/build-copy scripts always run gen:api, but gen:api imports backend OpenAPI code that depends on backend-only node_modules. This makes isolated admin builds fragile.

Issue Context

The generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are checked in and CI already has a freshness check.

Fix Focus Areas

admin/package.json[6-13]

Suggested fix

Remove pnpm gen:api && from build and build-copy scripts (keep gen:api as an explicit script).
Rely on the existing CI freshness step to enforce that generated outputs are committed.
(Optional) Keep pnpm gen:api in a dedicated script like build:regen if you still want a one-shot regen+build command.

[maintainability] Generated schema committed

Generated schema committed

This PR adds generated files (`admin/src/api/schema.d.ts`, `admin/src/api/version.ts`) and documents/enforces committing them via CI. This violates the policy against committing generated build/runtime artifacts.

Issue description

Generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are committed and CI enforces that workflow, which violates the repository compliance rule against committing generated artifacts.

Issue Context

The generator is already wired into the admin build (pnpm gen:api), so the repo can instead generate these files during build/CI without storing outputs in git.

Fix Focus Areas

admin/src/api/schema.d.ts[1-7]
admin/src/api/version.ts[1-5]
admin/package.json[7-20]
.github/workflows/frontend-admin-tests.yml[71-80]
admin/README.md[34-40]

[maintainability] `schema.d.ts` not 2-space

`schema.d.ts` not 2-space

The newly added `admin/src/api/schema.d.ts` uses 4-space indentation (as generated), not the required 2-space indentation. This introduces formatting that violates the project’s indentation standard.

Issue description

The generated admin/src/api/schema.d.ts does not match the required 2-space indentation style.

Issue Context

This file is generated via admin/scripts/gen-api.mjs by calling openapi-typescript, which emits 4-space indentation by default.

Fix Focus Areas

admin/scripts/gen-api.mjs[40-62]
admin/src/api/schema.d.ts[9-27]

[reliability] Admin build depends on backend

Admin build depends on backend

admin/package.json now runs gen:api during build/build-copy, and gen:api dynamically imports backend code (src/node/hooks/express/openapi.ts) that requires server-only dependencies. If the admin package is built in isolation (without the backend workspace deps installed), the build will fail with module resolution/runtime import errors.

Issue description

admin's build/build-copy scripts always run gen:api, but gen:api imports backend OpenAPI code that depends on backend-only node_modules. This makes isolated admin builds fragile.

Issue Context

The generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are checked in and CI already has a freshness check.

Fix Focus Areas

admin/package.json[6-13]

Suggested fix

Remove pnpm gen:api && from build and build-copy scripts (keep gen:api as an explicit script).
Rely on the existing CI freshness step to enforce that generated outputs are committed.
(Optional) Keep pnpm gen:api in a dedicated script like build:regen if you still want a one-shot regen+build command.

[maintainability] Generated API files committed

Generated API files committed

This PR adds/updates generated artifacts (`admin/src/api/schema.d.ts` and `admin/src/api/version.ts`) and documents that they are checked in. Committing generated outputs violates the requirement to avoid committing build/runtime-generated artifacts.

Issue description

Generated OpenAPI outputs (admin/src/api/schema.d.ts and admin/src/api/version.ts) are committed and CI enforces that they remain committed and up to date.

Issue Context

The compliance checklist requires that generated build/runtime artifacts not be committed.

Fix Focus Areas

admin/src/api/schema.d.ts[1-7]
admin/src/api/version.ts[1-5]
admin/README.md[34-53]
.github/workflows/frontend-admin-tests.yml[71-80]
admin/package.json[7-13]

[maintainability] Generated schema committed

Generated schema committed

This PR adds generated files (`admin/src/api/schema.d.ts`, `admin/src/api/version.ts`) and documents/enforces committing them via CI. This violates the policy against committing generated build/runtime artifacts.

Issue description

Generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are committed and CI enforces that workflow, which violates the repository compliance rule against committing generated artifacts.

Issue Context

The generator is already wired into the admin build (pnpm gen:api), so the repo can instead generate these files during build/CI without storing outputs in git.

Fix Focus Areas

admin/src/api/schema.d.ts[1-7]
admin/src/api/version.ts[1-5]
admin/package.json[7-20]
.github/workflows/frontend-admin-tests.yml[71-80]
admin/README.md[34-40]

[maintainability] Generated schema committed

Generated schema committed

This PR adds generated files (`admin/src/api/schema.d.ts`, `admin/src/api/version.ts`) and documents/enforces committing them via CI. This violates the policy against committing generated build/runtime artifacts.

Issue description

Generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are committed and CI enforces that workflow, which violates the repository compliance rule against committing generated artifacts.

Issue Context

The generator is already wired into the admin build (pnpm gen:api), so the repo can instead generate these files during build/CI without storing outputs in git.

Fix Focus Areas

admin/src/api/schema.d.ts[1-7]
admin/src/api/version.ts[1-5]
admin/package.json[7-20]
.github/workflows/frontend-admin-tests.yml[71-80]
admin/README.md[34-40]

[maintainability] Generated schema committed

Generated schema committed

This PR adds generated files (`admin/src/api/schema.d.ts`, `admin/src/api/version.ts`) and documents/enforces committing them via CI. This violates the policy against committing generated build/runtime artifacts.

Issue description

Generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are committed and CI enforces that workflow, which violates the repository compliance rule against committing generated artifacts.

Issue Context

The generator is already wired into the admin build (pnpm gen:api), so the repo can instead generate these files during build/CI without storing outputs in git.

Fix Focus Areas

admin/src/api/schema.d.ts[1-7]
admin/src/api/version.ts[1-5]
admin/package.json[7-20]
.github/workflows/frontend-admin-tests.yml[71-80]
admin/README.md[34-40]

[maintainability] `schema.d.ts` not 2-space

`schema.d.ts` not 2-space

The newly added `admin/src/api/schema.d.ts` uses 4-space indentation (as generated), not the required 2-space indentation. This introduces formatting that violates the project’s indentation standard.

Issue description

The generated admin/src/api/schema.d.ts does not match the required 2-space indentation style.

Issue Context

This file is generated via admin/scripts/gen-api.mjs by calling openapi-typescript, which emits 4-space indentation by default.

Fix Focus Areas

admin/scripts/gen-api.mjs[40-62]
admin/src/api/schema.d.ts[9-27]

[reliability] Admin build depends on backend

Admin build depends on backend

admin/package.json now runs gen:api during build/build-copy, and gen:api dynamically imports backend code (src/node/hooks/express/openapi.ts) that requires server-only dependencies. If the admin package is built in isolation (without the backend workspace deps installed), the build will fail with module resolution/runtime import errors.

Issue description

admin's build/build-copy scripts always run gen:api, but gen:api imports backend OpenAPI code that depends on backend-only node_modules. This makes isolated admin builds fragile.

Issue Context

The generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are checked in and CI already has a freshness check.

Fix Focus Areas

admin/package.json[6-13]

Suggested fix

Remove pnpm gen:api && from build and build-copy scripts (keep gen:api as an explicit script).
Rely on the existing CI freshness step to enforce that generated outputs are committed.
(Optional) Keep pnpm gen:api in a dedicated script like build:regen if you still want a one-shot regen+build command.

[maintainability] `schema.d.ts` not 2-space

`schema.d.ts` not 2-space

The newly added `admin/src/api/schema.d.ts` uses 4-space indentation (as generated), not the required 2-space indentation. This introduces formatting that violates the project’s indentation standard.

Issue description

The generated admin/src/api/schema.d.ts does not match the required 2-space indentation style.

Issue Context

This file is generated via admin/scripts/gen-api.mjs by calling openapi-typescript, which emits 4-space indentation by default.

Fix Focus Areas

admin/scripts/gen-api.mjs[40-62]
admin/src/api/schema.d.ts[9-27]

[maintainability] `schema.d.ts` not 2-space

`schema.d.ts` not 2-space

The newly added `admin/src/api/schema.d.ts` uses 4-space indentation (as generated), not the required 2-space indentation. This introduces formatting that violates the project’s indentation standard.

Issue description

The generated admin/src/api/schema.d.ts does not match the required 2-space indentation style.

Issue Context

This file is generated via admin/scripts/gen-api.mjs by calling openapi-typescript, which emits 4-space indentation by default.

Fix Focus Areas

admin/scripts/gen-api.mjs[40-62]
admin/src/api/schema.d.ts[9-27]

PR 7692 (2026-05-07)

[maintainability] Settings type mismatch

Settings type mismatch

`Settings.coerceValue()` can coerce env-var expansions (including `socialMeta.description`) into `number | boolean`, but both `SettingsType` and `SocialMetaSettings` still declare `description` as `string | null`, forcing unsafe casts and hiding the real contract from future callers. This increases the chance of future runtime errors if code assumes a string and calls string methods without guards.

Issue description

Settings.coerceValue() can produce number | boolean values for env-var-expanded settings, but the TypeScript types still claim socialMeta.description is string | null. This mismatch forces unsafe casts and makes it easy for future code to accidentally assume a string.

Issue Context

The PR mitigates the runtime issue by accepting unknown in resolveDescriptionWithOverride() and stringifying string|number|boolean. The type system should reflect this reality so callers and tests don’t need as unknown as string.

Fix Focus Areas

src/node/utils/Settings.ts[160-170]
src/node/utils/Settings.ts[862-897]
src/node/utils/socialMeta.ts[98-107]
src/node/utils/socialMeta.ts[164-182]
src/tests/backend/specs/socialMeta-unit.ts[253-285]

Suggested fix

Update the settings type(s) to reflect runtime: description: string | number | boolean | null (optionally also undefined if applicable).
Update SocialMetaSettings in socialMeta.ts similarly.
Replace override: unknown with override: string | number | boolean | null | undefined.
Remove the as unknown as string casts from the new unit tests (they should compile with the widened types).

PR 7690 (2026-05-07)

[maintainability] 4-space indentation in test

4-space indentation in test

New test file uses 4+ space continuation indentation (not 2 spaces), which violates the repository coding style requirement. This can cause formatting inconsistency and potential lint failures.

Issue description

The new Playwright spec uses 4+ space continuation indentation, but the codebase requires 2-space indentation.

Issue Context

The themeColor helper is split across lines with 4 spaces, and the multiline test(...) call uses 6 spaces before async.

Fix Focus Areas

src/tests/frontend-new/specs/theme_color_dark_mode.spec.ts[4-5]
src/tests/frontend-new/specs/theme_color_dark_mode.spec.ts[33-34]

[maintainability] Duplicated color mapping

Duplicated color mapping

The toolbar variant → hex color table is duplicated in both client code (skin_variants.ts) and server code (SkinColors.ts), so future updates can easily change one side and silently desync the server-rendered baseline theme-color from client-side updates. This can regress the exact “wrong address bar color” behavior this PR is fixing.

Issue description

The toolbar variant → theme color mapping is duplicated in both server and client code. This duplication can drift over time (CSS changes / palette changes), causing the server-rendered theme-color meta to disagree with the client’s dynamic updates.

Issue Context

Server baseline meta is computed in configuredToolbarColor().
Client updates are computed in updateThemeColorMeta().
Both hard-code the same table.

Fix Focus Areas

src/static/js/skin_variants.ts[7-33]
src/node/utils/SkinColors.ts[3-34]

Suggested approach

Create a small shared module (usable by both Node and browser builds) that exports the ordered mapping and default color, and import it from both places. If cross-bundle imports aren’t feasible, add a single-source generation step or a dedicated test that asserts the two tables are identical to prevent accidental drift.

PR 7689 (2026-05-07)

[maintainability] COREPACK_HOME path duplication

COREPACK_HOME path duplication

The Dockerfile sets `COREPACK_HOME` but later hard-codes `/opt/corepack` for `mkdir`/`chown`, so a future change to `COREPACK_HOME` can silently break cache sharing and permissions. This can reintroduce the original “pnpm falls back to latest” failure mode or cause permission errors.

Issue description

COREPACK_HOME is set, but /opt/corepack is repeated literally in the same stage. If COREPACK_HOME is changed later, the directory creation/ownership logic will not follow, risking a permission mismatch and reintroducing the pnpm/corepack cache issue.

Issue Context

The PR’s intent is to pin the corepack cache directory and ensure it is readable/writable after switching to USER etherpad.

Fix Focus Areas

Dockerfile[108-116]

Suggested change

Replace hard-coded /opt/corepack usages with ${COREPACK_HOME} (and quote it), e.g.:

mkdir -p /usr/share/man/man1 "${COREPACK_HOME}"
chown -R etherpad:etherpad "${COREPACK_HOME}" This keeps the filesystem operations aligned with the configured cache path.

PR 7688 (2026-05-07)

[reliability] padOptions shim can crash

padOptions shim can crash

Settings.reloadSettings() unconditionally indexes and assigns settings.padOptions[key]; if settings.padOptions is null or a primitive (possible via settings.json), reloadSettings() will throw a TypeError during startup/reload.

Issue description

reloadSettings() assumes settings.padOptions is a non-null object when normalizing legacy userName/userColor values. If settings.json sets padOptions to null or another non-object, the shim can throw (TypeError) during startup or settings reload.

Issue Context

storeSettings() can overwrite settings.padOptions with any non-object value from settings.json, so a bad config can reach this new code path.

Fix Focus Areas

src/node/utils/Settings.ts[1098-1105]
src/node/utils/Settings.ts[811-833]

Suggested change

Add a type/shape guard before indexing/assigning:

Only run the shim if settings.padOptions != null && typeof settings.padOptions === 'object' && !Array.isArray(settings.padOptions).
Otherwise, skip the shim (and optionally log a clear warning/error about invalid padOptions type).

PR 7678 (2026-05-05)

[correctness] Duplicate leave removes active user

Duplicate leave removes active user

With the new logic, authenticated duplicates can stay connected with the same authorId, but `handleDisconnect` still broadcasts `USER_LEAVE` for every socket that disconnects. Because the client tracks presence keyed by `userId` (authorId) and deletes that entry on `USER_LEAVE`, disconnecting one of multiple same-author sockets makes the author appear offline even though another socket is still connected.

Issue description

Authenticated duplicate sessions are now allowed (same authorId, same pad), but the server still emits USER_LEAVE on any socket disconnect. This makes clients drop the author from presence even if another socket with the same authorId is still connected.

Issue Context

Client presence is keyed by userId (authorId). If multiple sockets share an authorId, the server must only emit USER_LEAVE when the last socket for that author leaves the pad.

Fix Focus Areas

src/node/handler/PadMessageHandler.ts[227-257]
src/node/handler/PadMessageHandler.ts[1023-1044]

Implementation notes

In handleDisconnect, before broadcasting USER_LEAVE, check _getRoomSockets(session.padId) for any remaining socket whose sessioninfos[roomSocket.id]?.author === session.author.
Only broadcast USER_LEAVE (and run userLeave hook if appropriate) when no remaining sockets exist for that author in that pad.

PR 7674 (2026-05-05)

[reliability] Corepack pnpm prepare failure

Corepack pnpm prepare failure

The Dockerfile now runs `corepack prepare pnpm@${PnpmVersion} --activate` without first updating corepack; this repo already documents that Node 22’s bundled corepack can reject newer pnpm versions due to a stale signing-key list, which would fail the Docker build at that step. This occurs in both the `adminbuild` and `build` stages, so a rejection would block image builds entirely.

Issue description

The Dockerfile provisions pnpm via corepack prepare pnpm@${PnpmVersion} --activate but does not refresh corepack first. The repo’s snap build already documents that Node 22’s bundled corepack can reject newer pnpm versions due to stale signing keys; if that happens here, the image build fails at corepack prepare.

Issue Context

This is done in two stages (adminbuild and build).
The snap packaging flow addresses the exact failure mode by upgrading corepack before running corepack prepare.

Fix Focus Areas

Dockerfile[12-20]
Dockerfile[98-109]
snap/snapcraft.yaml[111-118]

Implementation direction

In both Dockerfile stages, install/upgrade corepack (using the bundled npm before deleting npm), then run corepack enable + corepack prepare pnpm@${PnpmVersion} --activate, then remove npm/npx as you do today.
Keep/extend the comment to explain why corepack is upgraded first (mirror the rationale in snap/snapcraft.yaml).

[reliability] Forced uuid major upgrade

Forced uuid major upgrade

The new pnpm override forces `uuid` from `=14` across the workspace, changing the resolved `uuid` version used by `@azure/msal-node`. Because this bypasses downstream semver constraints, it can break any runtime path that loads `@azure/msal-node` if it is incompatible with `uuid@14`.

Issue description

The pnpm override forces a major uuid upgrade broadly (>=14.0.0), which can unintentionally pull newer uuid versions later and may violate transitive dependencies’ expectations.

Issue Context

Lockfile currently resolves uuid@14.0.0 for @azure/msal-node, but the override allows future automatic upgrades beyond 14.0.0 without review.

Fix Focus Areas

package.json[54-80]
pnpm-lock.yaml[18-26]

Suggested fix

Change the override value from >=14.0.0 to an exact pin (e.g. 14.0.0) or a tightly-scoped range you’re comfortable supporting.
Add/extend CI coverage to exercise/import the relevant Azure/MSAL path if it is a supported runtime feature.

PR 7667 (2026-05-04)

[correctness] lastSeen lost or stale

lastSeen lost or stale

`lastSeen` is returned by `searchAuthors()` but it is not updated when an existing author is “seen” via token/mapper resolution (only `timestamp` is updated), and `anonymizeAuthor()` overwrites `globalAuthor` without preserving `lastSeen`, making the Admin UI’s “Last seen” column stale or blank (especially after erasure).

Issue description

The new lastSeen field is used by the admin author listing (searchAuthors()), but it is (a) not updated when an author is resolved via token/mapper mapping and (b) dropped when anonymizeAuthor() overwrites the globalAuthor record. This causes the /admin/authors “Last seen” column and sorting to be misleading.

Issue Context

mapAuthorWithDBKey() is a high-frequency “author seen” path and currently only updates timestamp.
anonymizeAuthor() overwrites the author object twice and currently omits lastSeen.
searchAuthors() only reads rec.lastSeen and outputs null if missing.

Fix Focus Areas

src/node/db/AuthorManager.ts[117-138]
src/node/db/AuthorManager.ts[377-384]
src/node/db/AuthorManager.ts[412-421]
src/node/db/AuthorManager.ts[511-518]

Suggested fix

When updating timestamp for an existing author mapping, also update lastSeen (likely to the same Date.now() value).
When overwriting globalAuthor in anonymizeAuthor(), include lastSeen (either preserve existing.lastSeen or set it to Date.now() at erasure time).
(Optional hardening) In searchAuthors(), consider falling back to rec.timestamp if rec.lastSeen is missing to avoid blank values for older records.

[security] `/authors` route not flagged

`/authors` route not flagged

The PR introduces a new admin authors UI and related read-only socket endpoints that are available regardless of any feature flag state. This conflicts with the requirement that new features be gated behind a feature flag and disabled by default.

Issue description

New admin functionality (the /admin/authors page and its read-only socket endpoints) is available even when gdprAuthorErasure.enabled is false, which violates the requirement that new features be behind a feature flag and disabled by default.

Issue Context

Currently, only the destructive anonymizeAuthor socket handler is gated; the new listing/preview capabilities are not.

Fix Focus Areas

admin/src/main.tsx[26-26]
admin/src/App.tsx[109-109]
src/node/hooks/express/adminsettings.ts[310-347]

[reliability] Socket destructure can throw

Socket destructure can throw

The `/settings` socket handlers destructure `{authorID}` in the parameter list for `anonymizeAuthorPreview` and `anonymizeAuthor`; if the client emits the event with `undefined`/`null` payload, it throws before the try/catch and can result in an unhandled rejection and a broken admin-socket flow.

Issue description

anonymizeAuthorPreview / anonymizeAuthor socket handlers destructure authorID in the argument list. If a client emits the event without a payload (or with null), destructuring throws before entering the handler body, bypassing the handler’s try/catch.

Issue Context

Even if the shipped UI always sends {authorID: ...}, defensive server code should treat socket payloads as untrusted and avoid pre-body destructuring.

Fix Focus Areas

src/node/hooks/express/adminsettings.ts[330-368]

Suggested fix

Change handler signatures to accept payload: any (optionally defaulting to {}), then read const authorID = payload?.authorID; inside the try block.
Keep existing validation that emits {error: 'authorID is required'} when missing. Example pattern:

[correctness] Preview error not handled

Preview error not handled

The authors UI treats every `results:anonymizeAuthorPreview` as a successful preview; if the server responds with `{error}` (as the backend does on exceptions), the modal still renders counter placeholders from missing fields and Continue remains enabled.

Issue description

AnonymizePreview includes an optional error field, and the backend emits {authorID, error: ...} on failures. The UI currently moves to the preview phase regardless and renders counters that may be undefined, and it still allows clicking Continue.

Issue Context

This is a robustness issue for exceptional cases (DB read errors, etc.) that results in broken modal text and potentially misleading flows.

Fix Focus Areas

admin/src/pages/AuthorPage.tsx[58-75]
admin/src/pages/AuthorPage.tsx[151-179]
admin/src/utils/AuthorSearch.ts[28-36]

Suggested fix

In onPreview, branch on data.error:
either keep the dialog in a dedicated error phase that renders the message and disables Continue, or
close the dialog and show a toast/error banner.
Ensure the modal’s counter rendering is only executed when all expected numeric fields are present.

PR 7665 (2026-05-03)

[reliability] uncaughtException doesn’t exit

uncaughtException doesn’t exit

tests/backend/diagnostics.ts registers an uncaughtException handler that only logs and returns, which can prevent backend-tests from exiting non-zero on fatal errors when tests/backend/common.ts is not imported. This can turn a crash into a hang/continued execution, and it also breaks the intended “convert unhandledRejection to uncaught exception” fail-fast behavior for specs that don’t load common.ts.

Issue description

src/tests/backend/diagnostics.ts installs an uncaughtException handler that only logs. If tests/backend/common.ts is not imported in a spec run, this handler can prevent a fatal error from forcing a non-zero exit (common.ts explicitly calls process.exit(1) to preserve default behavior).

Issue Context

src/tests/backend/common.ts has an uncaughtException handler that logs and then process.exit(1) specifically to preserve default behavior when a handler exists.
src/package.json now requires ./tests/backend/diagnostics.ts globally, but does not require common.ts.
Some specs don’t import common.ts, so diagnostics.ts can be the only handler.

Fix Focus Areas

src/tests/backend/diagnostics.ts[55-61]

Suggested change

After logging, ensure the process still fails fast when no other handler will do it. Options:

Mimic common.ts: call process.exit(1) after logging.
If you want to defer to other handlers when present: only force-exit if this is the only uncaughtException listener (e.g., if (process.listeners('uncaughtException').length === 1) process.exit(1);), otherwise return.
Alternatively, set process.exitCode = 1 and schedule a setImmediate(() => process.exit(1)) so later-registered handlers still get a chance to run/log first.

PR 7660 (2026-05-02)

[correctness] Click steals rename focus

Click steals rename focus

Clicking the editable name for unnamed users will also trigger the new row click handler, which then focuses #chatinput and interrupts the rename workflow. This makes it difficult or impossible to name unnamed users from the user list.

Issue description

The new delegated click handler on #otheruserstable tr[data-authorId] triggers even when the click target is the existing rename <input> in the name cell, and it later focuses #chatinput. This breaks the unnamed-user rename interaction.

Issue Context

Unnamed users are rendered with an <input> in the .usertdname cell and are wired up via #otheruserstable input.newinput.

Fix Focus Areas

src/static/js/pad_userlist.ts[373-410]
src/static/js/pad_userlist.ts[183-196]

Suggested change

Add early-return guards before doing any prefill/show work, for example:

Return if $(event.target).closest('input, textarea, select, button, a, [contenteditable=true]').length.
Or at minimum return if $(event.target).closest('.usertdname input').length. This keeps the row-click behavior while preserving rename semantics.

PR 7647 (2026-05-01)

[observability] Hardcoded 5s socket wait

Hardcoded 5s socket wait

waitForSocketEvent now hardcodes a 5s timeout for all socket events, which can cause suites that rely on Mocha’s default per-test timeout to fail with a generic Mocha timeout instead of the helper’s descriptive error. It also makes failing runs wait longer before surfacing the root cause.

Issue description

waitForSocketEvent() uses a hardcoded 5000ms timeout for all socket waits. In suites that don’t increase Mocha’s default per-test timeout, failures may surface as a generic Mocha timeout instead of waitForSocketEvent’s explicit timed out waiting for <event> error.

Issue Context

Some callers (e.g., connect/handshake paths) legitimately need a longer timeout on slow CI runners, but other call sites benefit from failing fast and producing a clear error.

Fix Focus Areas

Add an optional timeoutMs parameter to waitForSocketEvent(socket, event, timeoutMs?), and use it in setTimeout(..., timeoutMs).
Update slow paths (connect(), handshake(), and any other known-slow call sites) to pass 5000 explicitly.
Keep a shorter default (or ensure suites that rely on defaults set this.timeout(...) high enough) to avoid Mocha masking the helper’s error.
src/tests/backend/common.ts[114-219]
src/tests/backend/specs/messages.ts[12-60]

[reliability] SessionStore waits still tight

SessionStore waits still tight

SessionStore expiry tests still rely on fixed sleeps with only ~30ms headroom (e.g., expires in 300ms, then sleep 330ms), so timer jitter or event-loop delays can still cause intermittent failures under heavy CI load. This PR improves the margin vs. before, but it doesn’t eliminate the underlying flake pattern.

Issue description

Several SessionStore tests use fixed setTimeout sleeps and then assert the DB record is gone/present. Even with the increased windows, the assertions can still race the actual cleanup work if the event loop is delayed.

Issue Context

SessionStore schedules expiration cleanup with setTimeout(...) and documents races on slow systems. Tests that assume cleanup has run at an exact time remain inherently timing-fragile.

Fix Focus Areas

Replace fixed sleeps like await new Promise(r => setTimeout(r, 330)) + assert with a small polling helper (e.g., poll every 25ms up to a 2–5s max) that waits until the DB condition is met.
Keep the expiry durations modest, but remove tight coupling between “sleep duration” and “expiry duration”.
src/tests/backend/specs/SessionStore.ts[47-168]

PR 7645 (2026-05-01)

[reliability] Concurrency blocks PR builds

Concurrency blocks PR builds

With the new pull_request trigger, PR docs builds share the same concurrency group ("pages") as real deployments, so only one run can execute at a time. This can delay PR feedback or delay production docs deployments whenever both are active.

Issue description

Docs builds on PRs now contend with production docs deployments because the workflow uses a single concurrency.group: "pages" for all events. This serializes PR builds and push deployments.

Issue Context

The PR adds a pull_request trigger but keeps the existing global concurrency group. Concurrency is useful for deployments but typically undesirable for PR-only builds.

Fix Focus Areas

.github/workflows/build-and-deploy-docs.yml[4-17]
.github/workflows/build-and-deploy-docs.yml[28-33]

Suggested direction

Either:

Use distinct concurrency groups per event (e.g., pages-${{ github.event_name }}), or
Split into separate jobs: a PR build job without the Pages concurrency lock and a push-only deploy job with the lock.

[maintainability] Undocumented `engines.node` bump

Undocumented `engines.node` bump

The PR raises the minimum supported Node version via `engines.node`, but related documentation still states the old requirement. This can mislead contributors/users and makes a breaking compatibility change without updating the documented guidance.

Issue description

package.json now sets engines.node to >=22.12.0, but documentation still claims the project requires >=22.0.0.

Issue Context

This is a user-visible compatibility/requirements change and should be reflected anywhere the Node requirement is documented.

Fix Focus Areas

doc/npm-trusted-publishing.md[88-91]
package.json[45-45]

PR 7644 (2026-05-01)

[security] Unvalidated plugin names

Unvalidated plugin names

`update()` installs every name from `var/installed_plugins.json` (except `ep_etherpad-lite`) without enforcing the `ep_` prefix. If that file is corrupted or modified, running `plugins update` can install arbitrary packages, unlike `checkForMigration()` which explicitly restricts to `ep_` plugins.

Issue description

bin/plugins.ts updates plugins by trusting var/installed_plugins.json and installing every entry by name. This should be restricted to actual Etherpad plugins (ep_ prefix) to prevent accidental or malicious installation of arbitrary packages.

Issue Context

checkForMigration() already enforces plugin.name.startsWith(plugins.prefix) before installing from installed_plugins.json, but update() does not.

Fix Focus Areas

Add an explicit ep_/plugins.prefix validation filter before invoking installPlugin().
Consider de-duplicating names (e.g., via new Set(names)) to avoid repeated installs if the file contains duplicates.

Fix Focus Areas (code locations)

bin/plugins.ts[81-112]
src/static/js/pluginfw/installer.ts[81-134]
src/static/js/pluginfw/plugins.ts[36-154]

General

Resources

For Developers

How to's

Set up

Advanced steps

Integrating Etherpad in your web app

Embedding a pad using the jQuery plugin
Using Embed Parameters
Integrating Etherpad in a third party app (Drupal, MediaWiki, WordPress, Atlassian, PmWiki)
HTTP API client libraries

Uh oh!

.pr_agent_accepted_suggestions

Issue description

Issue Context

Fix Focus Areas

Suggested implementation direction

Issue description

Issue Context

Fix Focus Areas

Issue description

Issue Context

Fix Focus Areas

Issue description

Issue Context

Fix Focus Areas

Implementation notes

Issue description

Issue Context

Fix Focus Areas

Recommended fix

Alternative fix

Issue description

Issue Context

Fix Focus Areas

Suggested fix

Issue description

Issue Context

Fix Focus Areas

Suggested approach

Issue description

Issue Context

Fix Focus Areas

Suggested approach

Issue description

Issue Context

Fix Focus Areas

Issue description

Issue Context

Fix Focus Areas

Implementation notes

Issue description

Issue Context

Fix Focus Areas

Implementation notes

Issue description

Issue Context

Fix Focus Areas

Issue description

Issue Context

Fix Focus Areas

Suggested fix

Issue description

Issue Context

Fix Focus Areas

Suggested implementation direction

Issue description

Issue Context

Fix Focus Areas

Suggested implementation direction

Issue description

Issue Context

Fix Focus Areas

Suggested implementation direction

Issue description

Issue Context

Fix Focus Areas

Issue description

Issue Context

Fix Focus Areas

What to change

Issue description

Issue Context

Fix Focus Areas

Suggested fix options

Issue description

Issue Context

Fix Focus Areas

Suggested fix

Issue description

Issue Context