-
-
Notifications
You must be signed in to change notification settings - Fork 3k
.pr_agent_accepted_suggestions
| PR 7794 (2026-05-17) |
[reliability] Brittle path stripping
Brittle path stripping
`seen` is derived by stripping `${srcRoot}${sep}` via `startsWith()` and then replacing only the current platform separator, so if Mocha prints paths with different separators or drive-letter casing on Windows, the prefix will not be removed and the test will fail with absolute paths in `seen`. This undermines the PR’s goal of making the assertions stable across Linux/Windows runners.Issue description
The test normalizes Mocha’s --list-files output by doing a string startsWith(prefix) check and then replacing only path.sep. This can fail on Windows if the output uses different separators (e.g., /) or if drive-letter casing differs, leaving absolute paths that break the toContain() assertions.
Issue Context
The logic currently does:
- build
prefix =${srcRoot}${sep}` - strip prefix with
startsWith(prefix) - convert separators with
split(sep).join('/')
Fix Focus Areas
- src/tests/backend-new/specs/backend-tests-glob.test.ts[51-56]
Suggested implementation direction
- Normalize each line with
path.normalize(l). - Convert to repo-relative using
path.relative(srcRoot, normalized)when the path is absolute. - Finally convert to POSIX separators (e.g.,
rel.split(path.sep).join('/')). This avoids depending on exact string prefix/casing/separator matches.
| PR 7793 (2026-05-17) |
[maintainability] `listAuthorsOfPad` filtering undocumented
`listAuthorsOfPad` filtering undocumented
`listAuthorsOfPad` now filters out the synthetic `Pad.SYSTEM_AUTHOR_ID` from the returned `authorIDs`, changing public HTTP API behavior. The HTTP API documentation under `doc/api/http_api.*` is not updated to reflect this, risking client confusion and incorrect integrations.listAuthorsOfPad behavior changed to exclude the synthetic Pad.SYSTEM_AUTHOR_ID (a.etherpad-system), but the public HTTP API documentation still states it returns authors who contributed to the pad without noting this exclusion.
This PR intentionally hides the system author from the public API surface by filtering it out in src/node/db/API.ts. Documentation in doc/ should be updated in the same PR to match the new behavior.
- doc/api/http_api.md[696-704]
- doc/api/http_api.adoc[652-661]
| PR 7792 (2026-05-17) |
[reliability] Non-`apierror` export errors unhandled
Non-`apierror` export errors unhandled
The export route only converts `apierror` exceptions into a deterministic plain-text response; all other export errors are passed to `next()` and can still fall back to Express's default HTML error page. This can reintroduce non-deterministic HTML error bodies for other export failures.The export route only special-cases err.name === 'apierror' and calls next(err) for everything else, which can still trigger Express's default HTML error renderer.
Compliance requires explicit error handling for export-route failures so clients/tests get deterministic, non-HTML error bodies (at minimum containing err.message) rather than Express's default HTML page.
- src/node/hooks/express/importexport.ts[74-82]
[correctness] Error downloads as attachment
Error downloads as attachment
If `checkValidRev()` throws, `ExportHandler.doExport()` has already set `Content-Disposition: attachment`, so the new plain-text 500 body can still be delivered as a downloadable file rather than a visible error. This makes the surfaced error message harder to notice and can confuse callers about whether an export succeeded.ExportHandler.doExport() calls res.attachment(...) before validating req.params.rev. If checkValidRev() throws, the new apierror catch returns a plain-text 500 but keeps Content-Disposition: attachment, so clients (especially browsers) may download the error body as an export file.
-
res.attachment()is invoked beforecheckValidRev(). - The route-level catch sends the error message but does not clear
Content-Disposition.
- src/node/hooks/express/importexport.ts[74-82]
- src/node/handler/ExportHandler.ts[58-65]
Prefer one of:
- Move rev validation (
checkValidRev) beforeres.attachment(...)inExportHandler.doExport()so invalid:revnever sets download headers. - Or, in the
apierrorcatch branch, callres.removeHeader('Content-Disposition')(and optionally clear any previously-set exportContent-Type) before sending the plain-text 500.
| PR 7777 (2026-05-16) |
[correctness] History toolbar i18n regression
History toolbar i18n regression
#history-controls now uses aria-labelledby, which overrides the aria-label that pad_mode.ts sets from the already-translated pad.historyMode.controlsLabel key. Because the new pad.editor.toolbar.history key currently exists only in en.json, non-English locales will likely announce the English fallback instead of the existing localized label.#history-controls previously got a localized accessible name via pad_mode.ts setting aria-label from pad.historyMode.controlsLabel. This PR adds aria-labelledby="editbar-history-label", which takes precedence over aria-label and points to a new data-l10n-id="pad.editor.toolbar.history" span.
In the current repo state, pad.editor.toolbar.history exists only in en.json, while pad.historyMode.controlsLabel is already translated in many locales. This means non-English locales will regress from localized naming to an English fallback.
-
aria-labelledbyoverridesaria-labelin accessible name computation. - The repo already contains translations for
pad.historyMode.controlsLabelin multiple locale files. - The new history-toolbar key is currently only present in English.
- src/templates/pad.html[87-94]
- src/templates/pad.html[116-117]
- src/static/js/pad_mode.ts[97-120]
- src/locales/en.json[365-368]
Prefer reusing the already-translated history controls label key:
- Change
#editbar-history-labelto usedata-l10n-id="pad.historyMode.controlsLabel"(and update its fallback text accordingly, e.g. "Pad history controls"). - Keep
aria-labelledby="editbar-history-label"on#history-controls. - Update the Playwright expectation for the history toolbar label to match the new (reused) string.
If you need the specific new phrasing, add pad.editor.toolbar.history to all locale JSON files (or ensure your translation import pipeline will populate them at the same time as this change).
| PR 7773 (2026-05-16) |
[correctness] Wrong author in roSocket
Wrong author in roSocket
The USER_CHANGES test helper always builds the apool using `authorId` from the first socket’s CLIENT_VARS, but `roSocket` can have a different author identity because `common.connect()` only forwards cookies from the *current* HTTP response and `/p/:pad` does not re-issue the author-token cookie if it already exists. With the new server validation requiring an `author` attribute on '+' ops, `sendUserChanges(roSocket, ...)` can now be rejected as “changes as another author”, breaking the `permitOnce` test.src/tests/backend/specs/messages.ts defaults the USER_CHANGES apool to the main socket’s authorId for all sockets. But roSocket can end up with a different authorId (no token cookie forwarded on its socket.io connection), so sendUserChanges(roSocket, '...*0+...') may be rejected by the new server-side validation.
-
authorIdis captured only from the first socket’s CLIENT_VARS; the roSocket CLIENT_VARS is ignored. -
common.connect(res)forwards only cookies present in the provided HTTP response’sset-cookieheaders. -
/p/:padusesensureAuthorTokenCookie(), which does not set a Set-Cookie header if the request already has a valid token cookie—so the second GET (to the read-only URL) often won’t provide the token cookie tocommon.connect().
- src/tests/backend/specs/messages.ts[26-46]
- src/tests/backend/specs/messages.ts[171-179]
- src/tests/backend/specs/messages.ts[255-270]
- Capture
roAuthorIdfromcommon.handshake(roSocket, roPadId)(read CLIENT_VARS). - Change
authorPool()to accept an explicit authorId (or create two pools: one forsocket, one forroSocket). - Ensure
sendUserChanges(roSocket, ...)usesroAuthorId’s pool (or explicitly pass anapoolargument in the roSocket test).
| PR 7771 (2026-05-16) |
[reliability] Restart triggered on failure
Restart triggered on failure
installer.install() now calls its wrapped callback with an error, which decrements the global task counter and triggers onAllTasksFinished() even for failures, causing hooks.aCallAll('restartServer') to run. This can restart Etherpad (disconnecting users) on any failed install attempt, including the intentional EngineIncompatibleError short-circuit where nothing was installed.wrapTaskCb() triggers onAllTasksFinished() whenever the task counter reaches zero, regardless of whether the task completed successfully. With this PR, install() now invokes cb(err) on failures, so failed installs (including preflight incompatibility) can cause a full server restart even though nothing changed.
-
wrapTaskCb()decrementstasksafter invoking the callback and callsonAllTasksFinished()whentasks === 0. -
onAllTasksFinished()callshooks.aCallAll('restartServer'), which closes and recreates the HTTP(S) server. - After this PR,
install()callscb(err)in the catch block, so failures now complete the task and can trigger the restart path.
- src/static/js/pluginfw/installer.ts[34-58]
- src/static/js/pluginfw/installer.ts[182-199]
- Track whether a task actually made changes (e.g.,
needsRestart/didMutateflag set only after successful install/uninstall steps), and only callonAllTasksFinished()whentasks === 0 && needsRestart. - For
install(), setneedsRestart = trueonly afterlinkInstaller.installPlugin()andhooks.aCallAll('pluginInstall', ...)succeed. - For the engine-incompatibility short-circuit (and other errors), ensure the task counter still decrements, but do not restart.
- Alternatively, change the wrapper to pass through error state and skip
onAllTasksFinished()if the first callback argument is truthy, while still decrementingtasksin afinally-like manner.
[reliability] Registry fetch can hang
Registry fetch can hang
fetchPluginEnginesNode() awaits a registry.npmjs.org fetch with no timeout/abort, and install() awaits it before proceeding. A stalled DNS/network connection can therefore hang the entire install path and prevent the finished:install socket event from ever being emitted.fetchPluginEnginesNode() performs a network fetch without any timeout or abort mechanism. If the request stalls (DNS, proxy, captive portal, registry hang), install() will await forever and the admin UI will never receive finished:install.
The comment says the lookup is best-effort and should fall through on failure, but an unbounded await is not a failure and can block indefinitely.
- src/static/js/pluginfw/installer.ts[164-190]
- Use an
AbortControllerwith a short timeout (e.g., 3–10 seconds): - Create controller,
setTimeout(() => controller.abort(), timeoutMs) - Pass
{headers, signal: controller.signal}tofetch() - In
finally, clear the timeout - On abort (or any error), return
undefinedso install continues down the existing path. - Optionally log at debug level when the preflight times out to aid diagnosis without spamming logs.
| PR 7762 (2026-05-15) |
[maintainability] New Prometheus metrics lack flag
New Prometheus metrics lack flag
The PR introduces new Prometheus metrics (`etherpad_pad_users`, `etherpad_changeset_apply_duration_seconds`, `etherpad_socket_emits_total`) that are registered and emitted unconditionally. This violates the requirement that new features be gated behind a feature flag and disabled by default.New Prometheus metrics are always enabled, but compliance requires new features to be behind a feature flag and disabled by default.
This PR registers three new metrics and emits them on the hot path and in the Prometheus monitor loop without any enable/disable mechanism.
- src/node/prometheus.ts[26-48]
- src/node/handler/PadMessageHandler.ts[50-50]
- src/node/handler/PadMessageHandler.ts[609-609]
- src/node/handler/PadMessageHandler.ts[630-630]
- src/node/handler/PadMessageHandler.ts[671-671]
- src/node/handler/PadMessageHandler.ts[791-791]
- src/node/handler/PadMessageHandler.ts[904-904]
- src/node/handler/PadMessageHandler.ts[957-957]
[correctness] Histogram includes fan-out
Histogram includes fan-out
handleUserChanges() starts etherpad_changeset_apply_duration_seconds before any processing and only stops it in finally, after updatePadClients() completes, so the histogram includes broadcast/fan-out time rather than isolating the apply path. This prevents the scaling-dive analysis from distinguishing “apply is slow” vs “fan-out is slow”.etherpad_changeset_apply_duration_seconds is intended to time the apply path, but the timer currently spans the entire handleUserChanges() including await exports.updatePadClients(pad) (fan-out). This makes the metric misleading and defeats the PR’s stated purpose.
- The timer is started near the beginning of
handleUserChanges()and stopped infinally. -
updatePadClients()is awaited before the timer is stopped, so its work is included in the duration.
- src/node/handler/PadMessageHandler.ts[788-905]
- Move
recordChangesetApply()start to immediately before the “apply” work you want to measure (e.g., just beforepad.appendRevision(...)). - Call the returned
stopHistogram()immediately after the apply work completes and before any socket emits /updatePadClients()fan-out. - If you also want total end-to-end latency, keep using the existing
stats.timer('edits')(or add a second histogram explicitly for fan-out/total).
[reliability] Unbounded metrics label
Unbounded metrics label
handleCustomMessage() increments etherpad_socket_emits_total with msgString (an HTTP API parameter) as the label value, allowing an API caller to generate unbounded distinct label values and potentially exhaust memory/CPU via high-cardinality time series. Although the endpoint is API-key protected, this still creates a sharp footgun if the key is leaked or a client misbehaves.etherpad_socket_emits_total{type=...} uses the message type as a Prometheus label. In handleCustomMessage(), that type comes directly from the HTTP API msg parameter, which means label cardinality is unbounded.
-
sendClientsMessage(padID, msg)passesmsgdirectly intoPadMessageHandler.handleCustomMessage(padID, msg). -
handleCustomMessage()setsdata.type = msgStringand then callsrecordSocketEmit(msg.data.type). -
recordSocketEmit()usessocketEmitsTotal.labels(type).inc(), creating a new time series per distincttype.
- src/node/handler/PadMessageHandler.ts[620-631]
- src/node/db/API.ts[863-866]
- src/node/prom-instruments.ts[17-21]
- src/node/prom-instruments.ts[34-36]
Pick one of:
-
Normalize/bucket: Map any unknown/untrusted type to a small bounded set (e.g.,
API_CUSTOM_MESSAGE,CUSTOM,other,unknown). -
Allowlist: Only allow a fixed set of label values (e.g.,
NEW_CHANGES,CHAT_MESSAGE, etc.); everything else becomesother. -
Remove label at this site: For
handleCustomMessage(), increment with a constant label regardless ofmsgString. Also consider enforcing a max length/character set if you keep any user-provided label value.
| PR 7757 (2026-05-15) |
[correctness] Promise sent to admin
Promise sent to admin
In the admin plugin socket handler, `checkUpdates` emits the unresolved Promise returned by `checkPluginForUpdates()` because it is not awaited, so the admin UI will throw when it calls `.includes()` on `data.updatable`. Additionally, the error path emits `{}` for `updatable`, which also violates the UI's `string[]` expectation and can throw on transient errors.The checkUpdates socket event emits an unresolved Promise (missing await) and emits the wrong type ({}) on error. This breaks the admin plugins page because the client expects updatable to be a string[] and calls .includes().
-
checkPluginForUpdatesisasync. - Admin UI (
HomePage.tsx) assumesupdatable: string[].
- src/node/hooks/express/adminplugins.ts[74-82]
- admin/src/pages/HomePage.tsx[66-70]
| PR 7755 (2026-05-15) |
[security] Unsafe cookie decoding
Unsafe cookie decoding
PadMessageHandler.handleMessage() decodes untrusted handshake Cookie values with decodeURIComponent() without handling URIError, so a malformed cookie (e.g. sessionID=%ZZ) will throw and abort CLIENT_READY processing for that socket. This enables repeated connection attempts to spam server error logs and prevent clients from joining pads.readCookie() uses decodeURIComponent() on raw Cookie header values without error handling. If the cookie value contains malformed percent-encoding, decodeURIComponent() throws a URIError, causing CLIENT_READY to fail and generating server error logs.
Cookie headers are client-controlled; malicious or buggy clients can supply invalid encodings. The error propagates up to the socket message handler, which logs the exception.
- src/node/handler/PadMessageHandler.ts[406-434]
- Wrap
decodeURIComponent(...)in atry/catchand treat decode failures asnull(or fall back to the raw string). - Consider rate-limiting or lowering log severity for repeated decode failures to reduce log amplification.
- Add a backend regression test that connects with a malformed cookie value (for
sessionIDand/ortoken) and assertsCLIENT_READYdoes not throw andauth.sessionIDresolves tonull(or expected fallback).
| PR 7753 (2026-05-15) |
[correctness] Misleading deferred subtitle
Misleading deferred subtitle
`UpdatePage` renders the "Outside maintenance window" subtitle for autonomous tier whenever `scheduledFor` is more than 60 seconds away, which is also true for a normal in-window grace period (e.g., 15 minutes), so the UI can incorrectly claim the delay is due to being outside the window.UpdatePage currently shows the maintenance-window deferral subtitle based on a fixed scheduledFor > now + 60s heuristic. This misfires when the update is scheduled inside the maintenance window but has a grace period longer than 60s (common per docs), leading to incorrect admin messaging.
Backend Tier 4 only snaps scheduledFor to the next window opening if now + grace is outside the window; otherwise it keeps scheduledFor = now + grace. The status endpoint also always provides nextWindowOpensAt for autonomous tier (when the window parses), so the UI needs a stronger signal than “scheduledFor is far away”.
- admin/src/pages/UpdatePage.tsx[186-207]
- src/node/updater/Scheduler.ts[79-88]
- src/node/hooks/express/updateStatus.ts[115-119]
- doc/admin/updates.md[202-208]
Change the subtitle condition to reflect actual deferral, e.g. only show it when scheduled.scheduledFor is effectively the next window opening (compare equality to us.nextWindowOpensAt, possibly with a small tolerance), rather than using a fixed > now + 60s threshold.
[correctness] Preflight detail dropped
Preflight detail dropped
`applyUpdate()` saves a detailed preflight failure reason (`reasonStr`) to state and logs but returns only `pf.reason`, causing `/admin/update/apply` responses and failure-notification emails to lose the detail that was just computed.On preflight failure, applyUpdate() computes reasonStr (including optional pf.detail) and persists it, but returns {reason: pf.reason}. Callers (HTTP route + notify path) use the returned result.reason, so they miss the detail string.
- State/logs store the enriched string, so the admin UI eventually shows it.
- Immediate HTTP 409 responses and
notifyApplyFailure()emails use the returned value and therefore lose important diagnostics (e.g., Node engine mismatch details).
- src/node/updater/applyPipeline.ts[89-103]
- src/node/hooks/express/updateActions.ts[274-301]
- src/node/updater/Notifier.ts[156-162]
- Change
applyUpdate()to returnreason: reasonStrforpreflight-failed. - Ensure any downstream callers that surface
result.reason(HTTP + email) now get the enriched detail.
[reliability] SMTP cache misses config
SMTP cache misses config
`sendEmailViaSmtp()` only rebuilds the cached nodemailer transport when `settings.mail.host` changes, so runtime changes to `port`/`secure`/`auth` after `reloadSettings()` will keep using a stale transport configuration.The cached nodemailer transport is invalidated only on host changes. If an operator updates SMTP credentials or port (without changing host) and triggers reloadSettings(), future emails will continue using the old transport settings.
The admin settings flow can call reloadSettings() at runtime, so mail settings are expected to be mutable without a process restart.
- src/node/updater/index.ts[46-78]
- src/node/hooks/express/adminsettings.ts[395-401]
- Expand the cache key to include
host,port,secure, and a stable representation ofauth(and optionallyfrom). - Alternatively, clear
transportCachewheneverreloadSettings()runs (if there is a suitable hook point), forcing a rebuild on next send.
| PR 7750 (2026-05-15) |
[reliability] No test for plugin_packages migration
No test for plugin_packages migration
This PR changes Debian postinstall/systemd behavior for `src/plugin_packages` but does not add/update an automated regression test to validate the new directory layout and upgrade migration. Without a test, future packaging changes could reintroduce the `MODULE_NOT_FOUND` failure or break upgrades unnoticed.The PR fixes a Debian packaging bug by keeping src/plugin_packages in-tree and adjusting permissions/systemd sandboxing, but there is no updated regression test to enforce the new behavior.
packaging/test-local.sh currently asserts the old symlink-based layout for /opt/etherpad/src/plugin_packages, so it will not validate (and will likely fail under) the new intended behavior.
- packaging/test-local.sh[133-150]
- packaging/scripts/postinstall.sh[69-103]
- packaging/systemd/etherpad.service[43-51]
[correctness] CI symlink assertions stale
CI symlink assertions stale
packaging/scripts/postinstall.sh no longer creates /opt/etherpad/src/plugin_packages as a symlink, but the Debian CI workflow and packaging/test-local.sh still assert it is a symlink to /var/lib/etherpad/plugin_packages, causing CI/test failures.postinstall.sh now migrates away from a symlinked src/plugin_packages and ensures it is a real directory. CI and local packaging tests still check for the old symlink layout, so they will fail.
The deb packaging validation in .github/workflows/deb-package.yml and packaging/test-local.sh must be updated to match the new in-tree directory layout and permissions.
- .github/workflows/deb-package.yml[170-179]
- packaging/test-local.sh[133-146]
- Replace
test -L /opt/etherpad/src/plugin_packages+readlinkassertions withtest -d /opt/etherpad/src/plugin_packages. - Replace
/var/lib/etherpad/plugin_packagesexistence/ownership checks with permission/group checks on/opt/etherpad/src/plugin_packages(and optionally/opt/etherpad/src/plugin_packages/.versionsif you decide to pre-create it), consistent withpostinstall.sh(group=etherpad, mode 2775).
| PR 7748 (2026-05-15) |
[reliability] Mocha --exit masks handle leaks
Mocha --exit masks handle leaks
Adding `--exit` forces the test process to terminate even if timers/sockets/handles are still open, so backend tests will no longer hang or otherwise surface leaked handles during the post-suite event-loop drain. This weakens regression detection for known classes of leaks (for example, socket.io-client reconnect timers) and can let new leaks slip by silently.mocha --exit improves flake mitigation, but it also prevents the test runner from naturally exposing leaked handles (timers/sockets) via a non-draining event loop.
At least one backend spec explicitly calls out that leaked socket.io-client reconnect timers keep the event loop alive past Mocha’s “passing” output; --exit would make similar future leaks much harder to detect.
- src/package.json[149-152]
- src/tests/backend/specs/lowerCasePadIds.ts[20-26]
- Make
--exitconditional (e.g., only in CI, or only for Windows+Node24) via an env var in the workflow and a small script wrapper. - If
--exitmust remain on, add an explicit leak-detection phase (e.g., fail the run if known handle types remain, or run an open-handle dump right before forced exit) so regressions are still caught.
[observability] Diagnostics exit matrix misleading
Diagnostics exit matrix misleading
Backend tests now run Mocha with `--exit`, so successful runs will typically skip `beforeExit` and always emit only `exit`, but `tests/backend/diagnostics.ts` documents “only exit” as meaning `process.exit()` was called unexpectedly. This makes the diagnostics output/documentation inaccurate and can mislead future debugging of backend-test failures.Mocha is now invoked with --exit, which changes the meaning of the beforeExit/exit event pattern that src/tests/backend/diagnostics.ts documents and uses for interpretation.
With --exit, Mocha calls process.exit(...) at the end of the run. That means beforeExit usually won’t run on success, and “only exit” becomes the expected behavior, contradicting diagnostics.ts’s current matrix.
- src/tests/backend/diagnostics.ts[23-27]
- src/tests/backend/diagnostics.ts[66-73]
- src/package.json[149-152]
- Update the comment matrix (and optionally the log output) to explicitly account for
--exit. - Optionally, have diagnostics.ts detect
--exit(e.g., viaprocess.argv.includes('--exit')) and log an extra line likediag('mocha --exit enabled; beforeExit will not fire on success')so the runner log remains self-explanatory.
| PR 7727 (2026-05-11) |
[reliability] `CMD` change lacks regression test
`CMD` change lacks regression test
The PR changes the container startup path to bypass `pnpm run prod`, but it does not add a regression test that would fail if the old `pnpm`-based `CMD` were restored. Without an automated reproduction (e.g., the `docker-compose.yml` named-volume layout on `src/plugin_packages`), this bug can silently regress.This PR fixes a Docker runtime crash by bypassing pnpm run prod, but it does not include a regression test to ensure the container still boots under the problematic volume layout (notably the named volume mounted to /opt/etherpad-lite/src/plugin_packages).
The changed CMD in Dockerfile is intended to avoid pnpm 11's runDepsStatusCheck behavior at runtime. A regression test should exercise the same startup conditions (ideally including the docker-compose.yml named-volume layout) so reverting the fix would cause CI to fail.
- Dockerfile[217-228]
- .github/workflows/docker.yml[66-82]
- docker-compose.yml[2-30]
| PR 7726 (2026-05-11) |
[observability] Missing nginx logs on timeout
Missing nginx logs on timeout
On timeout the step only prints `etherpad-docker` logs, but the polled endpoint is the nginx container on 8081 and that container is started without a name. If nginx is the problem, the workflow won’t capture its logs, slowing triage.The timeout handler dumps only etherpad logs, but failures at http://127.0.0.1:8081/ can be caused by nginx itself (crash, bad config, network). Because nginx is started without --name, it’s awkward to reliably capture its logs during failure.
Nginx is the container bound to host port 8081 and proxies to etherpad.
- .github/workflows/rate-limit.yml[58-84]
- Start nginx with a stable name, e.g.
docker run --name nginx-docker -p 8081:80 ... -d nginx-latest. - On readiness timeout, print both:
docker logs nginx-docker || true-
docker logs etherpad-docker || trueOptionally adddocker ps -afor quick container state visibility.
[performance] Wait can reach 4m
Wait can reach 4m
The readiness poll can delay job failure up to ~240s because each of the 60 iterations includes a curl with `--max-time 2` plus `sleep 2`. This unnecessarily prolongs CI runs when etherpad never becomes ready.The readiness loop’s wall-clock upper bound is ~4 minutes (60 * (2s curl timeout + 2s sleep)), which can slow down failure feedback.
This is the new “Wait for etherpad behind nginx to be ready” step.
- .github/workflows/rate-limit.yml[66-84]
Use a single wall-clock timeout (e.g., timeout 120s bash -c 'until curl ...; do sleep 2; done') or reduce either the per-try curl timeout or the sleep interval so the maximum wait matches your intended bound (for example, 120s total). Also consider printing the final HTTP status/connection error to make timeouts easier to interpret.
| PR 7720 (2026-05-11) |
[reliability] Timer skips state recheck
Timer skips state recheck
The scheduler timer’s fire path calls applyUpdate() without re-checking persisted execution state or current policy, so an update can start even if the admin cancels at the timer boundary or the policy/tier changes during the grace window. This contradicts the SchedulerRunnerDeps contract comment that triggerApply should re-check state before acting.The scheduler timer fire path (triggerApply) does not verify that:
- the persisted state is still
scheduledfor the same tag, and - policy still allows auto-apply. This can cause updates to start despite last-moment cancellation or a tier/policy flip during the grace window.
-
SchedulerRunnerDeps.triggerApplyis documented to re-check persisted state. - The production wiring in
index.tscurrently callsapplyUpdate()directly. -
applyUpdate()validates state shape/tag/lock, but it does not evaluate update policy.
- src/node/updater/Scheduler.ts[96-101]
- src/node/updater/index.ts[320-331]
- src/node/updater/applyPipeline.ts[48-77]
In triggerApply (index.ts):
- Load the latest persisted state.
- Abort unless
state.execution.status === 'scheduled'ANDstate.execution.targetTag === targetTag. - Re-evaluate policy (using current settings + install method + current version) and abort unless
policy.canAuto. - Optionally, if aborting due to policy denial, clear scheduled state to
idleand persist it so the UI doesn’t show a stale scheduled state. This keeps the timer callback safe under races and matches the documented contract.
[reliability] scheduledFor format unchecked
scheduledFor format unchecked
State validation for the new scheduled execution status only enforces that scheduledFor is a non-empty string, not a valid timestamp, so a hand-edited state file can pass validation but result in NaN delay calculations and effectively immediate timer firing. This is low-likelihood but easy to harden given scheduled status is newly introduced.execution.status === 'scheduled' requires scheduledFor and startedAt, but state validation only checks they are non-empty strings, not valid timestamps.
- Scheduler delay uses
new Date(scheduledFor).getTime(). - Invalid date strings yield
NaN, which can lead to unintended immediate scheduling behavior.
- src/node/updater/state.ts[15-39]
- Enhance execution validation for statuses that store timestamps (including
scheduled) to verify fields are parseable dates (e.g.,!Number.isNaN(Date.parse(value))). - Apply the stricter check only to known timestamp fields (scheduledFor/startedAt/drainEndsAt/etc.) to avoid changing semantics for non-timestamp strings like
reason. This is defense-in-depth against corrupted/hand-edited state.
| PR 7716 (2026-05-10) |
[security] Hardcoded `https://www.npmjs.com`
Hardcoded `https://www.npmjs.com`
The new admin UI link uses a protocol-specific external URL (`https://...`) instead of a protocol-independent URL. This violates the checklist requirement to use protocol-independent URLs where applicable to avoid protocol-coupling issues.A new external link hard-codes https:// (https://www.npmjs.com/...) instead of using a protocol-independent URL as required by the compliance checklist.
This URL is used for the npm search link in the admin plugins page.
- admin/src/pages/HomePage.tsx[138-145]
[reliability] Plugin listeners not cleaned
Plugin listeners not cleaned
HomePage registers multiple Socket.IO handlers inside effects that re-run (e.g., when `searchParams` or the socket instance changes) but only clears an interval on unmount, so listeners are never detached. This allows handlers to fire after unmount and to accumulate over time, causing duplicate state updates and unbounded listener growth.HomePage attaches Socket.IO listeners (e.g., results:installed, results:updatable, results:search, results:searcherror, finished:*, connect) inside React effects, including effects that re-run when searchParams changes, but it does not unregister those listeners in cleanup. Update the component so every registered .on() handler is removed appropriately to prevent duplicate event handling, unbounded listener growth, and handlers firing after unmount.
The current effect cleanup only clears a setInterval, leaving all Socket.IO .on() handlers attached. Because Socket.IO listeners persist until explicitly removed, any effect that re-runs (such as one dependent on searchParams, or one that might re-run if the socket instance changes) will accumulate additional listeners unless off(event, handler) is called, leading to multiple state updates per single server response.
- admin/src/pages/HomePage.tsx[58-89]
- admin/src/pages/HomePage.tsx[91-100]
| PR 7714 (2026-05-10) |
[correctness] REST checkToken path changed
REST checkToken path changed
`checkToken` was moved into the new `server` resource, which changes its REST-style runtime route from `/rest//pad/checkToken` to `/rest//server/checkToken`. This breaks REST-style backward compatibility (clients will get `code: 3` “no such function”) and contradicts the design note that REST URLs for existing operations should remain unchanged via per-op tag overrides.Issue description
checkToken now routes under /rest/<ver>/server/checkToken, breaking existing REST-style callers that previously used /rest/<ver>/pad/checkToken.
Issue Context
REST-style paths are computed from the resource/action keys (_restPath = /${resource}/${action}) and used by the runtime OpenAPIBackend router for /rest/`.
Fix Focus Areas
- src/node/hooks/express/openapi.ts[156-319]
Suggested fix
- Move
checkTokenback under the existingpadresource (restoring/pad/checkTokenrouting), and settags: ['server']on that operation to achieve the desired grouping without changing the REST URL. - Keep the
serverresource for truly new server-only operations likegetStats. - (Optional safety) Add a regression test that
/rest/<ver>/pad/checkTokenstill resolves (similar to the existing/api/<ver>/checkTokenassertions).
| PR 7710 (2026-05-09) |
[security] Hardcoded https Google Fonts
Hardcoded https Google Fonts
External font resources are added using hardcoded `https://` URLs instead of protocol-independent `//` URLs. This can cause mixed-content/compatibility issues in non-HTTPS deployments and violates the project convention.pad.html adds external Google Fonts resources using https:// instead of protocol-independent // URLs.
The compliance checklist requires protocol-independent URLs where appropriate to avoid mixed content issues and improve deployment compatibility.
- src/templates/pad.html[24-26]
[maintainability] Timeslider URL change undocumented
Timeslider URL change undocumented
The behavior of `/p/:pad/timeslider` has changed (302 redirect unless `?embed=1`), but no documentation updates are included. This makes existing documentation about timeslider routes inaccurate and can break adopters who rely on the documented URL.The timeslider endpoint behavior changed: /p/:pad/timeslider now redirects to the pad unless ?embed=1 is provided, but docs still imply the timeslider is directly served at /p/:padid/timeslider.
This is a public, user- and integrator-facing URL change. Documentation should mention the redirect and provide the correct URL for embedding/direct HTML (?embed=1) with an example.
- doc/skins.md[5-10]
- src/node/hooks/express/specialpages.ts[402-408]
- src/node/hooks/express/specialpages.ts[420-420]
[correctness] Latest hash opens rev0
Latest hash opens rev0
`PadModeController.onOuterHashChange()` maps `#rev/latest` to revision 0 via `rev < 0 ? 0 : rev`, which forces the embedded timeslider to jump to the first revision instead of the latest. Users manually navigating to `#rev/latest` (or any future link that sets it) will see the wrong historical content.#rev/latest is parsed as -1 but then converted to 0 when syncing the outer hash into the embedded timeslider, which incorrectly jumps to revision 0.
The embedded timeslider interprets window.location.hash = "#N" as the desired revision number.
- src/static/js/pad_mode.ts[214-226]
- src/static/js/pad_mode.ts[196-201]
- When
rev < 0, clear the inner hash (win.location.hash = "") instead of setting#0, or compute the latest revision number from the embedded frame (if available) and set that. - Ensure the outer URL remains canonical (
#rev/latestis fine), but don’t mis-target the embedded slider.
[security] External Google Fonts added
External Google Fonts added
`pad.html` now unconditionally loads multiple Google Fonts via `fonts.googleapis.com`/`fonts.gstatic.com`, adding third‑party network requests on every pad view. This can break offline/self-hosted deployments and leaks user IP/metadata to a third party by default.The core pad HTML template now hard-depends on Google Fonts, introducing third-party requests and potential breakage for offline/CSP-restricted installations.
Etherpad is commonly self-hosted; loading external fonts by default is a privacy and reliability regression.
- src/templates/pad.html[16-27]
- Remove the Google Fonts
<link>tags entirely, or gate them behind an explicit setting/skin. - If these fonts are required for a theme, bundle/self-host them under
src/static/and reference local URLs instead.
[reliability] Redirect embed bypass too loose
Redirect embed bypass too loose
The server treats any truthy `?embed=` value as an iframe request and skips the redirect, so direct visits can still access `/p/:pad/timeslider` by adding `?embed=0` or similar. Additionally, the redirect target is built as a relative path containing the padId, which can normalize unexpectedly for edge pad IDs like `..`.Issue description
/p/:pad/timeslider bypasses the redirect for any truthy embed query value and uses a relative redirect target that can normalize oddly for edge pad IDs.
Issue Context
The embed path is intended only for the in-pad iframe (?embed=1).
Fix Focus Areas
- src/node/hooks/express/specialpages.ts[230-237]
- src/node/hooks/express/specialpages.ts[401-408]
Implementation notes
- Change the condition to something strict like
if (req.query.embed !== '1') redirect.... - Prefer a simple redirect target of
..(from/p/:pad/timesliderthis resolves to/p/:pad) or an absolute/p/${encodeURIComponent(req.params.pad)}to avoid composing../${pad}. - Keep behavior consistent in both dev live-reload and production handlers.
| PR 7709 (2026-05-09) |
[correctness] Settings still textarea blob
Settings still textarea blob
The updated `/admin/settings` UI still renders the settings as a single `` value rather than a parsed, structured representation with inline comments alongside relevant keys. This fails the requirement to show settings.json in a structured, readable format with inline comments derived from JSON comments./admin/settings still uses a plain <textarea> and does not render settings as parsed/structured JSON with inline comments.
Compliance requires a structured representation that preserves formatting and surfaces inline comments derived from JSON comments.
- admin/src/pages/SettingsPage.tsx[66-76]
[maintainability] CSS rules conflict order
CSS rules conflict order
Both App.css and index.css define `.settings` styling, but `main.tsx` imports `App.tsx` (and thus App.css) before importing `index.css`, so the later `index.css` rules will override App.css for equal-specificity selectors. This can prevent the intended raw editor styling from applying consistently.App.css and index.css both style .settings, but import order makes index.css override App.css.
This can silently nullify the new styling without any runtime errors.
- admin/src/main.tsx[1-5]
- admin/src/App.tsx[1-4]
- admin/src/App.css[1-16]
- admin/src/index.css[293-306]
- Move the
App.cssimport tomain.tsxafterindex.css(and remove it fromApp.tsx), OR - Remove/rename the
.settingsrules from one file, OR - Increase selector specificity in
App.css(less preferred) so it reliably overridesindex.css.
[security] Protocol-relative wiki links
Protocol-relative wiki links
SettingsPage uses protocol-relative GitHub URLs (`//github.com/...`), which can downgrade to `http://` when the admin UI is served over HTTP and may break or redirect unexpectedly. External links should be explicit HTTPS.External links are protocol-relative and can resolve to HTTP.
- admin/src/pages/SettingsPage.tsx[112-118]
Change both hrefs to explicit https://github.com/....
[reliability] Parse error before load
Parse error before load
FormView treats `settings === undefined` as an empty string and renders ParseErrorBanner because `parseTree('')` yields no tree, even though settings simply haven’t arrived yet. App hides the loading overlay on socket connect (before the `settings` event), so users can briefly see a false parse error on page load.FormView renders a parse error when settings hasn’t loaded yet (undefined → ''), which is misleading. Because the app hides the loading overlay on socket connect (before receiving settings), the parse error can flash on initial load.
Empty string is not the same as “not yet loaded”. parseTree('') returns no tree and triggers the error banner.
- admin/src/components/settings/FormView.tsx[34-46]
- admin/src/App.tsx[48-77]
Choose one:
- In
FormView, readsettingswithout?? ''and ifsettings === undefinedreturn a loading placeholder (ornull) instead of ParseErrorBanner. - Alternatively (or additionally), in
App.tsxkeepshowLoadingtrue until thesettingsevent is received (move/duplicatesetShowLoading(false)fromconnectto thesettingshandler).
[reliability] Save toast ignores ack
Save toast ignores ack
SettingsPage.handleSave() shows a success toast immediately after emitting `saveSettings`, without waiting for any server-confirmed outcome. Because the backend emits `saveprogress: 'saved'` even if `writeFile()` throws, the UI can report a successful save even when nothing was written.The settings UI shows a "saved" toast immediately after emitting the saveSettings socket event. The server currently emits saveprogress: 'saved' even if the underlying file write fails, so the UI can incorrectly report success.
- Frontend currently does not wait for any ack to decide success/failure.
- Backend always emits
saveprogress: 'saved'regardless ofwriteFile()outcome.
- admin/src/pages/SettingsPage.tsx[53-64]
- admin/src/App.tsx[80-82]
- src/node/hooks/express/adminsettings.ts[42-50]
- Backend: on
saveSettings, emitsaveprogress: 'saved'only on success; emit a distinct failure status (and optionally an error message) on catch. - Frontend: move the success toast to be driven by the
saveprogressevent; show a failure toast onsaveprogress: 'error'(or equivalent).
[reliability] Toast wait can false-pass
Toast wait can false-pass
The Playwright `saveSettings()` helper waits for `.ToastRootSuccess` to exist, but the Toast component assigns the success/failure class based on `toastState.success` even when the toast is closed. A previously-closed success toast can therefore satisfy the selector without a new save completing, making tests flaky and potentially masking failures.The Playwright helper waits for .ToastRootSuccess, which can exist even when the toast is closed, so the helper can proceed without confirming the new save finished.
ToastDialog applies .ToastRootSuccess based on toastState.success even when toastState.open is false.
- src/tests/frontend-new/helper/adminhelper.ts[14-17]
- admin/src/utils/Toast.tsx[5-23]
Option A (tests only): Update the helper to wait for an open toast, e.g. await page.waitForSelector('.ToastRootSuccess[data-state="open"]') (or assert visibility via expect(...).toBeVisible()), and optionally wait for the previous toast to close first.
Option B (app behavior): When closing the toast (onOpenChange), also reset success (and/or use a monotonically increasing toast id) so old classes don't persist when closed.
[correctness] IconButton missing type
IconButton missing type
IconButton renders a `` without a default `type`, which defaults to `submit` inside a ``. Reusing IconButton within a form can therefore trigger unintended submissions/navigation.IconButton does not set a safe default type, so it can submit forms unexpectedly.
IconButton is a shared component used across admin pages. Adding a default type="button" prevents accidental submits while still allowing callers to override to submit when desired.
- admin/src/components/IconButton.tsx[9-13]
Set a default type while still allowing override, for example:
- Destructure
typewith default:({type = 'button', ...rest})and render<button type={type} ...> - Or render
<button type={rest.type ?? 'button'} ...>and ensure spread order does not overwrite it unintentionally.
| PR 7708 (2026-05-09) |
[reliability] Staleness check race window
Staleness check race window
runCompactStale selects stale pads in a first pass and compacts them in a later pass without re-checking getLastEdited, so a pad edited between passes can still be compacted despite becoming “fresh”. This can disrupt active users because compaction can kick sessions from the pad.runCompactStale() determines staleness in a first loop, stores IDs, and later compacts from that stored list. If a pad is edited after selection but before compaction, it can still be compacted, which undermines the “skip hot pads” goal and may kick active sessions.
Compaction can kick sessions (Cleanup.deleteRevisions()), so compacting a pad that becomes active during the run is disruptive.
- bin/compactStalePads.ts[110-180]
Before compacting each pad (ideally right before calling compactPad, after pre-flight getRevisionsCount), call getLastEdited(padId) again and skip compaction if lastEdited > cutoff (increment skippedFresh and log). This keeps the staleness guarantee closer to real-time and minimizes the race window.
[reliability] Staleness check race window
Staleness check race window
runCompactStale selects stale pads in a first pass and compacts them in a later pass without re-checking getLastEdited, so a pad edited between passes can still be compacted despite becoming “fresh”. This can disrupt active users because compaction can kick sessions from the pad.runCompactStale() determines staleness in a first loop, stores IDs, and later compacts from that stored list. If a pad is edited after selection but before compaction, it can still be compacted, which undermines the “skip hot pads” goal and may kick active sessions.
Compaction can kick sessions (Cleanup.deleteRevisions()), so compacting a pad that becomes active during the run is disruptive.
- bin/compactStalePads.ts[110-180]
Before compacting each pad (ideally right before calling compactPad, after pre-flight getRevisionsCount), call getLastEdited(padId) again and skip compaction if lastEdited > cutoff (increment skippedFresh and log). This keeps the staleness guarantee closer to real-time and minimizes the race window.
| PR 7705 (2026-05-08) |
[reliability] `/admin/openapi.json` lacks feature flag
`/admin/openapi.json` lacks feature flag
The PR enables a new `/admin/openapi.json` endpoint by default via `src/ep.json` and mounts it unconditionally in `expressPreSession`, without any feature flag or default-off setting. This violates the requirement that new features be behind a feature flag and disabled by default.A new endpoint (/admin/openapi.json) is added and enabled by default, but the compliance checklist requires new features to be behind a feature flag and disabled by default.
openapi-admin is registered in src/ep.json and mounts /admin/openapi.json in expressPreSession with no conditional check.
- src/ep.json[143-147]
- src/node/hooks/express/openapi-admin.ts[158-165]
[correctness] Wrong baseUrl for admin
Wrong baseUrl for admin
`admin/src/api/client.ts` exports a single `fetchClient` whose `baseUrl` is `/api/`, but the merged OpenAPI schema now includes admin paths like `/admin-auth/` and `/admin/update/status` that are not under `/api/`, so using the generated client for those admin endpoints will hit the wrong URL.The generated paths type now includes both public and admin endpoints, but admin/src/api/client.ts exports a single OpenAPI client configured with API_BASE_URL ("/api/"). Admin endpoints live at root (e.g. /admin-auth/), so a single fixed base URL cannot correctly target both surfaces.
admin/scripts/dump-spec.ts merges the public and admin OpenAPI docs into one before running openapi-typescript. The runtime client should therefore either:
- expose two clients/hooks (one for
/api/<version>public calls, one for root-level admin calls), or - implement a routing layer that selects the appropriate base URL per path.
- admin/src/api/client.ts[6-12]
- admin/scripts/gen-api.mjs[56-72]
- admin/scripts/dump-spec.ts[49-56]
| PR 7704 (2026-05-08) |
[security] Git tag option injection
Git tag option injection
`executeUpdate()` and `verifyReleaseTag()` pass the release tag (ultimately sourced from GitHub `tag_name`) as the first positional argument to `git checkout` / `git verify-tag` without validation or option-termination. A tag that starts with `-` can be parsed as a git option (bypassing signature verification and/or changing checkout behavior).Updater uses an externally sourced tag string as the first positional argument to git checkout and git verify-tag. If the tag begins with -, git can treat it as an option, potentially bypassing signature verification or altering checkout behavior.
The tag originates from GitHub’s releases/latest API (tag_name) and is persisted into updater state.
- src/node/updater/UpdateExecutor.ts[154-165]
- src/node/updater/trustedKeys.ts[37-66]
- src/node/hooks/express/updateActions.ts[164-276]
- src/node/updater/VersionChecker.ts[46-63]
- Reject/normalize unsafe tags before use (at minimum: disallow leading
-, whitespace, and other invalid refname characters). - Prefer passing a ref that cannot be parsed as an option, e.g.
refs/tags/<tag>. - For
git verify-tag, also add an explicit end-of-options marker:git verify-tag -- <tag-or-ref>. - For checkout, use a safe ref form (e.g.
git checkout refs/tags/<tag>) and/or validate the tag strictly (e.g. semver tags only) before invoking git.
[reliability] `/admin/update/apply` exposed without flag
`/admin/update/apply` exposed without flag
The new Tier 2 action endpoints are registered whenever `updates.tier !== "off"`, so they exist even when Tier 2 is not enabled (e.g., tier `notify`). This violates the requirement that new features be gated behind a flag and disabled by default such that the disabled path matches prior behavior (these endpoints would previously 404).Tier 2 manual update endpoints (/admin/update/apply, /cancel, /acknowledge, /log) are registered for any tier except off, which exposes new behavior even when Tier 2 is not enabled.
Compliance requires new features to be behind a feature flag and disabled by default such that the disabled path matches prior behavior (ideally 404 / route not mounted).
- src/node/hooks/express/updateActions.ts[111-120]
[reliability] Cancel races preflight
Cancel races preflight
/admin/update/cancel can set execution back to idle and remove update.lock while /admin/update/apply is still running preflight, but the apply handler does not re-check for cancellation before proceeding to drain/execute. This allows an update to proceed after cancellation and can permit a second apply to start concurrently, risking inconsistent state and repository corruption.POST /admin/update/cancel can run while /admin/update/apply is still in preflight, but cancel immediately writes execution: idle and deletes the lock file. The original apply request continues and may still start draining/executing, and the early lock release can allow concurrent apply attempts.
- Apply holds the lock across preflight + drain + executor.
- Cancel is allowed in
preflightanddraining. - Apply only reloads state right before executing, not immediately after preflight.
- src/node/hooks/express/updateActions.ts[145-214]
- src/node/hooks/express/updateActions.ts[266-289]
- src/node/updater/lock.ts[69-72]
- Do not remove the lock in
/cancelwhile an apply request might still be running. Instead: - Record a cancellation request in state (e.g.,
execution: {status: 'idle'}plus acancelRequestedAt, or introduce a dedicatedcancel-requestedexecution status). - Let the apply handler be the only place that releases the lock when it observes cancellation.
- After
runPreflight()completes (before creating the drainer), reload state and abort if execution is no longerpreflightfor the sametargetTag. - Optionally, make
releaseLock()validate ownership (PID match) before unlinking, or provide areleaseIfOwned(lockPath, pid)helper.
[reliability] Drain can wedge joins
Drain can wedge joins
SessionDrainer turns off acceptingConnections and intentionally does not restore it after the drain completes, assuming the process will exit; but if the update path throws after drain completion and the process does not exit, the server will keep disconnecting all new CLIENT_READY handshakes. This can create an outage that persists until manual restart.acceptingConnections is a module-level gate used to refuse new CLIENT_READY connections during drain. The drainer intentionally leaves it false after drain completion, assuming the process will immediately exit. If anything throws after drain completion and the process does not exit, new joins remain blocked indefinitely.
- Drain completion does not restore
acceptingConnections. -
/admin/update/applycan catch exceptions after the 202 response is sent and continue running.
- src/node/updater/SessionDrainer.ts[44-66]
- src/node/handler/PadMessageHandler.ts[380-388]
- src/node/hooks/express/updateActions.ts[204-263]
- Add an explicit API in
SessionDrainerto restore the flag (e.g.,setAcceptingConnections(true)orrestoreAcceptingConnections()), and call it in the apply handler’scatch/finallypath when the process is not going to exit. - Alternatively, set
acceptingConnections=trueupon drain completion, and introduce a separate guard tied to persisted updateexecution.statusto refuse joins duringexecuting/rolling-backif needed (so failures can reliably re-enable joins).
[correctness] socket.json.send API misuse
socket.json.send API misuse
PadMessageHandler uses `socket.json.send(...)` when refusing joins during drain; this is inconsistent with other disconnect paths in the same file that use `socket.emit('message', ...)`. With socket.io v4 this API is at best nonstandard and may throw at runtime, breaking the join-refusal logic during updates.The drain join-refusal path uses socket.json.send({disconnect: ...}), which is inconsistent with other disconnect paths and risks not working under socket.io v4.
Other disconnect paths in PadMessageHandler use socket.emit('message', {disconnect: ...}).
- src/node/handler/PadMessageHandler.ts[380-387]
Replace:
-
socket.json.send({disconnect: 'updateInProgress'});with: -
socket.emit('message', {disconnect: 'updateInProgress'});(orsocket.send({disconnect: ...})if the server expects default "message" event), then keepsocket.disconnect(true).
[reliability] Unhandled spawn error events
Unhandled spawn error events
UpdateExecutor.runStep() and RollbackHandler.runStep() do not listen for the child's 'error' event, only 'close'. If `spawn()` fails (e.g., executable missing/permission errors), the update flow can hang or crash, leaving state in-flight and potentially keeping session drain active.runStep() helpers in UpdateExecutor/RollbackHandler (and verifyReleaseTag) do not handle child.on('error', ...). Spawn failures can hang promises or crash the process.
These helpers are used in critical paths that mutate state and may have already disabled new connections.
- src/node/updater/UpdateExecutor.ts[44-67]
- src/node/updater/RollbackHandler.ts[23-43]
- src/node/updater/trustedKeys.ts[41-52]
- Add an
'error'listener to each spawned child. - For
UpdateExecutor.runStep, resolve{code: 1, stderr: err.message}(or reject and let the caller catch and transition state). - For
RollbackHandler.runStep, resolve a non-zero code on error. - For
verifyReleaseTag, treat spawn error as verification failure with proper logging. - Ensure any error path results in a persisted terminal/in-flight state that RollbackHandler can handle.
[reliability] Executor can leave busy state
Executor can leave busy state
executeUpdate() writes `execution.status='executing'` and then performs operations (e.g., lockfile backup) that can throw, but it does not catch exceptions to transition state to rolling-back/failed. The apply route catch block logs and (if possible) responds, but it does not repair execution state, so the system can remain stuck in a busy status that blocks further updates.If readSha(), saveState(), or copyFile() throws in executeUpdate(), the persisted state can remain executing and the system becomes stuck (future apply attempts get 409 busy).
executeUpdate() currently only transitions to rolling-back for non-zero exit codes from the spawned commands. Exceptions bypass that.
- src/node/updater/UpdateExecutor.ts[78-127]
- src/node/hooks/express/updateActions.ts[119-124]
- Wrap the body of
executeUpdate()in a top-level try/catch. - On any exception:
- Persist
execution: {status: 'rolling-back', reason: <error message>, ...}(or a dedicatedfailed-*status) so the route layer can callperformRollback(). - Append an explicit log line.
- Optionally harden the apply handler to detect a thrown executor error and set a safe terminal state (instead of leaving it busy).
[reliability] Execution state under-validated
Execution state under-validated
`loadState()` considers `execution` valid if it only contains a recognized `status`, without validating required fields (e.g., `fromSha`, `targetTag`) for that status. A malformed/hand-edited state file can therefore reach RollbackHandler with missing fields and trigger invalid rollback/verification behavior.The on-disk update state validator accepts any execution object that merely has a known status. This can let malformed state reach rollback/verification code paths that assume required fields exist.
loadState() is intended to “safely reset on malformed input”, and Tier 2 introduces a discriminated union (ExecutionStatus) with required fields per status.
- src/node/updater/state.ts[11-103]
- src/node/updater/types.ts[56-86]
- src/node/updater/RollbackHandler.ts[67-78]
- Strengthen
isValidExecution()to validate required fields based onstatus(type-guard each union member). - Example: if
status==='pending-verification'requiretargetTag,fromSha,deadlineAtare strings. - Alternatively (or additionally), in
loadState()coerce invalid/missing execution subfields back to{status:'idle'}to preserve boot safety. - Consider also validating
lastResult.outcomeagainst the allowed set, not justtypeof === 'string'.
[security] Public status exposes diagnostics
Public status exposes diagnostics
When `updates.requireAdminForStatus` is false (default), `/admin/update/status` is unauthenticated but now returns `execution` and `lastResult`, including SHAs and human-readable failure reasons derived from git/pnpm stderr. This leaks operational diagnostics and update outcomes to unauthenticated callers./admin/update/status is intended to be open by default, but it now includes execution and lastResult fields that can contain internal SHAs and command stderr-derived reasons.
The endpoint has an opt-in updates.requireAdminForStatus gate, but the new fields are returned regardless.
- src/node/hooks/express/updateStatus.ts[72-103]
- src/node/updater/UpdateExecutor.ts[102-120]
- If
updates.requireAdminForStatusis false, omit or redact sensitive subfields: - Drop
executionandlastResultentirely, or - Return only
execution.status(noreason,fromSha) and a booleanlastResultSuccess. - Alternatively, add a separate admin-only endpoint for detailed execution/log info and keep
/admin/update/statusminimal.
[correctness] Wrong short-drain countdown
Wrong short-drain countdown
For drain windows shorter than 30s/10s, SessionDrainer schedules the T-30/T-10 broadcasts at time 0 but still reports `seconds: 30` and `seconds: 10`, which is incorrect. This can mislead users during shorter configured drain periods.T-30 and T-10 announcements currently fire immediately for short drain windows but advertise 30/10 seconds remaining.
- src/node/updater/SessionDrainer.ts[50-56]
- Only schedule T-30 if
drainSeconds >= 30and T-10 ifdrainSeconds >= 10. - Or, if you want “fire ASAP” behavior, pass
secondsRemaining = Math.min(threshold, drainSeconds)so the displayed value is accurate.
| PR 7703 (2026-05-07) |
[correctness] Dropdowns clipped in popups
Dropdowns clipped in popups
Setting `.popup-content` to `overflow-y:auto` makes it a scroll container, which clips absolutely positioned descendants such as Nice Select’s option list. As a result, font/language dropdown menus in Settings can be truncated when opened near the bottom of the popup on short viewports, making options hard/impossible to click.With overflow-y: auto on .popup-content, popups become scroll containers that clip absolutely-positioned dropdown lists (Nice Select). This can truncate the font/language dropdown list in Settings and other popups on short viewports.
Nice Select renders the options list as position: absolute; top: 100% under the .nice-select element. When .popup-content is a scroll container, overflow outside the popup is clipped.
- Make Nice Select lists escape the scroll container when used inside popups (e.g., switch to
position: fixedand set coordinates like the toolbar path does, or move/portal the list tobody). - Alternatively, toggle a CSS class on the nearest
.popup-contentwhile a Nice Select is open to temporarily set overflow to visible, then restore it on close. - src/static/css/pad/popup.css[25-44]
- src/static/css/pad/form.css[106-139]
- src/static/js/vendors/nice-select.ts[96-129]
[reliability] Brittle scrollHeight test assertion
Brittle scrollHeight test assertion
The new Playwright test hard-requires `scrollHeight > clientHeight`, so it will fail if Settings content ever fits within 900×500 (due to CSS tweaks, localization, or feature changes) even though the popup remains correctly scrollable. The test already checks reachability of the bottom button via `scrollIntoViewIfNeeded()` + `toBeInViewport()`, which is the behavior that matters.The test requires scrollHeight > clientHeight, which is fragile because it depends on Settings content height always overflowing at 900×500.
The key behavior to validate is: the popup uses scrollable overflow styling and the bottom-most action (Delete pad) is reachable (not permanently cropped). The test already scrolls to #delete-pad and asserts it is in the viewport.
- Remove/relax the
scrollHeight > clientHeightassertion and rely on reachability checks. - Or, if overflow must be asserted, first assert
#delete-padis NOT in viewport before scrolling (to prove scrolling was needed) and adjust viewport height accordingly. - src/tests/frontend-new/specs/pad_settings.spec.ts[170-191]
| PR 7698 (2026-05-07) |
[security] `ep_*` passthrough lacks flag
`ep_*` passthrough lacks flag
The new plugin padOptions passthrough is enabled unconditionally on both client and server, changing behavior by default. This violates the requirement that new features be guarded by a feature flag and disabled by default.A new behavior (passing through plugin ep_* padOptions keys) is enabled unconditionally, but compliance requires new features to be behind a feature flag and disabled by default.
Both server-side Pad.normalizePadSettings() and client-side pad.applyPadSettings() currently preserve ep_* keys without checking any feature flag, so the behavior change is always active.
- src/node/db/Pad.ts[95-122]
- src/static/js/pad.ts[862-886]
[maintainability] New plugin API undocumented
New plugin API undocumented
This PR introduces a new plugin-facing capability module and behavior change, but does not update repository documentation accordingly. This can lead to plugin authors missing the new supported interface and its constraints.A new plugin-facing API/behavior was added (padOptionsPluginPassthrough capability and ep_* padOptions passthrough), but docs were not updated.
Plugins need a documented way to discover and correctly use the new behavior (including the exact key pattern /^ep_[a-z0-9_]+$/ and any limitations).
- src/node/utils/PluginCapabilities.ts[1-11]
- src/node/db/Pad.ts[43-53]
- src/static/js/pad.ts[877-882]
- doc/plugins.md[1-200]
[reliability] Unbounded plugin option payload
Unbounded plugin option payload
Pad.normalizePadSettings copies every /^ep_[a-z0-9_]+$/ key/value verbatim into padSettings, and padSettings is persisted and sent to every client as initialOptions. This introduces an unvalidated path for arbitrary plugin-provided objects (size/type), which can lead to oversized broadcasts and persistence/serialization failures if a plugin stores unsupported values.normalizePadSettings() now persists and broadcasts ep_* pad option values verbatim. Without validation, server-side plugins can accidentally store very large or non-serializable values that will be persisted and sent to all clients.
-
Pad.saveToDatabase()persists the whole Pad object (includingpadSettings). -
clientVars.initialOptionsandpadoptionsmessages distribute pad settings to all connected clients.
- src/node/db/Pad.ts[95-122]
- src/node/handler/PadMessageHandler.ts[319-349]
- src/node/handler/PadMessageHandler.ts[1165-1185]
- In
normalizePadSettings(), forep_*keys: - Enforce that values are JSON-safe (or at least plain JSON-compatible types: null/boolean/number/string/array/plain object).
- Enforce a maximum serialized size per key and/or total size across all
ep_*keys. - If a value is rejected, drop it (and optionally log at warn-level with the key name and padId when available).
[maintainability] PadOption type misses ep_*
PadOption type misses ep_*
Runtime behavior now accepts `ep_*` keys in pad options on the client, but the shared `PadOption` TypeScript type does not allow any `ep_*` properties. This makes the socket message types inaccurate and can cause TS code to reject/strip plugin padOptions or require unsafe casts.PadOption (SocketIO message type) does not model the newly supported ep_* keys, making TypeScript types diverge from runtime behavior.
Plugins and core TS code that construct/handle pad options messages may rely on PadOption typing.
- src/static/js/types/SocketIOMessage.ts[251-264]
- Add an index signature for plugin keys, ideally constrained with a template-literal key type, e.g.:
-
[k:ep_${string}]: unknown; - Alternatively, add a broader index signature if template literal types aren’t desired in this file.
- Consider also typing
viewmore narrowly if needed (optional).
| PR 7697 (2026-05-07) |
[reliability] English-only aria-label assertion
English-only aria-label assertion
`a11y_dialogs.spec.ts` asserts `aria-label` starts with the English string "Export ", which will fail once these new localization keys are translated (or if test locale changes to a translated language). This makes the Playwright suite brittle even though the underlying UI behavior is correct.The export-links a11y test asserts an English-only aria-label prefix ("Export "). Once non-English translations for pad.importExport.export<format>a.title are added, the aria-label will be localized and the test will fail.
html10n.localize([cookieLang, navigator.language, 'en']) can select a non-English language, and Playwright configuration does not pin the browser locale.
- src/tests/frontend-new/specs/a11y_dialogs.spec.ts[68-95]
- src/playwright.config.ts[45-69]
- src/static/js/l10n.ts[4-13]
Pick one:
-
Locale-agnostic assertions: replace the
startsWith('Export ')check withexpect(ariaLabel.length).toBeGreaterThan(0)(and keeptitle === ariaLabel+aria-hidden=true), matching the existing pattern used for#chaticon. -
Pin test locale: set Playwright context
locale: 'en-US'globally or for this test file (test.use({locale: 'en-US'})) so the English prefix assertion remains stable.
[security] Missing noopener on blank links
Missing noopener on blank links
The export links in `pad.html` open in a new tab via `target="_blank"` but still omit `rel="noopener"`, leaving `window.opener` exposed to the opened page. Because this PR touched these anchors, it’s a good opportunity to fix the reverse-tabnabbing hardening gap.Export links use target="_blank" without rel="noopener", allowing the opened page to access window.opener.
These anchors are modified in this PR, so adding rel is low-risk and consistent with other target="_blank" usage in the same template.
- src/templates/pad.html[326-343]
Add rel="noopener" (or rel="noopener noreferrer") to each export <a ... target="_blank" ...> in the export column.
| PR 7695 (2026-05-07) |
[reliability] Admin build depends on backend
Admin build depends on backend
admin/package.json now runs gen:api during build/build-copy, and gen:api dynamically imports backend code (src/node/hooks/express/openapi.ts) that requires server-only dependencies. If the admin package is built in isolation (without the backend workspace deps installed), the build will fail with module resolution/runtime import errors.admin's build/build-copy scripts always run gen:api, but gen:api imports backend OpenAPI code that depends on backend-only node_modules. This makes isolated admin builds fragile.
The generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are checked in and CI already has a freshness check.
- admin/package.json[6-13]
- Remove
pnpm gen:api &&frombuildandbuild-copyscripts (keepgen:apias an explicit script). - Rely on the existing CI freshness step to enforce that generated outputs are committed.
- (Optional) Keep
pnpm gen:apiin a dedicated script likebuild:regenif you still want a one-shot regen+build command.
[reliability] Admin build depends on backend
Admin build depends on backend
admin/package.json now runs gen:api during build/build-copy, and gen:api dynamically imports backend code (src/node/hooks/express/openapi.ts) that requires server-only dependencies. If the admin package is built in isolation (without the backend workspace deps installed), the build will fail with module resolution/runtime import errors.admin's build/build-copy scripts always run gen:api, but gen:api imports backend OpenAPI code that depends on backend-only node_modules. This makes isolated admin builds fragile.
The generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are checked in and CI already has a freshness check.
- admin/package.json[6-13]
- Remove
pnpm gen:api &&frombuildandbuild-copyscripts (keepgen:apias an explicit script). - Rely on the existing CI freshness step to enforce that generated outputs are committed.
- (Optional) Keep
pnpm gen:apiin a dedicated script likebuild:regenif you still want a one-shot regen+build command.
[maintainability] Generated schema committed
Generated schema committed
This PR adds generated files (`admin/src/api/schema.d.ts`, `admin/src/api/version.ts`) and documents/enforces committing them via CI. This violates the policy against committing generated build/runtime artifacts.Generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are committed and CI enforces that workflow, which violates the repository compliance rule against committing generated artifacts.
The generator is already wired into the admin build (pnpm gen:api), so the repo can instead generate these files during build/CI without storing outputs in git.
- admin/src/api/schema.d.ts[1-7]
- admin/src/api/version.ts[1-5]
- admin/package.json[7-20]
- .github/workflows/frontend-admin-tests.yml[71-80]
- admin/README.md[34-40]
[maintainability] `schema.d.ts` not 2-space
`schema.d.ts` not 2-space
The newly added `admin/src/api/schema.d.ts` uses 4-space indentation (as generated), not the required 2-space indentation. This introduces formatting that violates the project’s indentation standard.The generated admin/src/api/schema.d.ts does not match the required 2-space indentation style.
This file is generated via admin/scripts/gen-api.mjs by calling openapi-typescript, which emits 4-space indentation by default.
- admin/scripts/gen-api.mjs[40-62]
- admin/src/api/schema.d.ts[9-27]
[reliability] Admin build depends on backend
Admin build depends on backend
admin/package.json now runs gen:api during build/build-copy, and gen:api dynamically imports backend code (src/node/hooks/express/openapi.ts) that requires server-only dependencies. If the admin package is built in isolation (without the backend workspace deps installed), the build will fail with module resolution/runtime import errors.admin's build/build-copy scripts always run gen:api, but gen:api imports backend OpenAPI code that depends on backend-only node_modules. This makes isolated admin builds fragile.
The generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are checked in and CI already has a freshness check.
- admin/package.json[6-13]
- Remove
pnpm gen:api &&frombuildandbuild-copyscripts (keepgen:apias an explicit script). - Rely on the existing CI freshness step to enforce that generated outputs are committed.
- (Optional) Keep
pnpm gen:apiin a dedicated script likebuild:regenif you still want a one-shot regen+build command.
[maintainability] Generated API files committed
Generated API files committed
This PR adds/updates generated artifacts (`admin/src/api/schema.d.ts` and `admin/src/api/version.ts`) and documents that they are checked in. Committing generated outputs violates the requirement to avoid committing build/runtime-generated artifacts.Generated OpenAPI outputs (admin/src/api/schema.d.ts and admin/src/api/version.ts) are committed and CI enforces that they remain committed and up to date.
The compliance checklist requires that generated build/runtime artifacts not be committed.
- admin/src/api/schema.d.ts[1-7]
- admin/src/api/version.ts[1-5]
- admin/README.md[34-53]
- .github/workflows/frontend-admin-tests.yml[71-80]
- admin/package.json[7-13]
[maintainability] Generated schema committed
Generated schema committed
This PR adds generated files (`admin/src/api/schema.d.ts`, `admin/src/api/version.ts`) and documents/enforces committing them via CI. This violates the policy against committing generated build/runtime artifacts.Generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are committed and CI enforces that workflow, which violates the repository compliance rule against committing generated artifacts.
The generator is already wired into the admin build (pnpm gen:api), so the repo can instead generate these files during build/CI without storing outputs in git.
- admin/src/api/schema.d.ts[1-7]
- admin/src/api/version.ts[1-5]
- admin/package.json[7-20]
- .github/workflows/frontend-admin-tests.yml[71-80]
- admin/README.md[34-40]
[maintainability] Generated schema committed
Generated schema committed
This PR adds generated files (`admin/src/api/schema.d.ts`, `admin/src/api/version.ts`) and documents/enforces committing them via CI. This violates the policy against committing generated build/runtime artifacts.Generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are committed and CI enforces that workflow, which violates the repository compliance rule against committing generated artifacts.
The generator is already wired into the admin build (pnpm gen:api), so the repo can instead generate these files during build/CI without storing outputs in git.
- admin/src/api/schema.d.ts[1-7]
- admin/src/api/version.ts[1-5]
- admin/package.json[7-20]
- .github/workflows/frontend-admin-tests.yml[71-80]
- admin/README.md[34-40]
[maintainability] Generated schema committed
Generated schema committed
This PR adds generated files (`admin/src/api/schema.d.ts`, `admin/src/api/version.ts`) and documents/enforces committing them via CI. This violates the policy against committing generated build/runtime artifacts.Generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are committed and CI enforces that workflow, which violates the repository compliance rule against committing generated artifacts.
The generator is already wired into the admin build (pnpm gen:api), so the repo can instead generate these files during build/CI without storing outputs in git.
- admin/src/api/schema.d.ts[1-7]
- admin/src/api/version.ts[1-5]
- admin/package.json[7-20]
- .github/workflows/frontend-admin-tests.yml[71-80]
- admin/README.md[34-40]
[maintainability] `schema.d.ts` not 2-space
`schema.d.ts` not 2-space
The newly added `admin/src/api/schema.d.ts` uses 4-space indentation (as generated), not the required 2-space indentation. This introduces formatting that violates the project’s indentation standard.The generated admin/src/api/schema.d.ts does not match the required 2-space indentation style.
This file is generated via admin/scripts/gen-api.mjs by calling openapi-typescript, which emits 4-space indentation by default.
- admin/scripts/gen-api.mjs[40-62]
- admin/src/api/schema.d.ts[9-27]
[reliability] Admin build depends on backend
Admin build depends on backend
admin/package.json now runs gen:api during build/build-copy, and gen:api dynamically imports backend code (src/node/hooks/express/openapi.ts) that requires server-only dependencies. If the admin package is built in isolation (without the backend workspace deps installed), the build will fail with module resolution/runtime import errors.admin's build/build-copy scripts always run gen:api, but gen:api imports backend OpenAPI code that depends on backend-only node_modules. This makes isolated admin builds fragile.
The generated files (admin/src/api/schema.d.ts, admin/src/api/version.ts) are checked in and CI already has a freshness check.
- admin/package.json[6-13]
- Remove
pnpm gen:api &&frombuildandbuild-copyscripts (keepgen:apias an explicit script). - Rely on the existing CI freshness step to enforce that generated outputs are committed.
- (Optional) Keep
pnpm gen:apiin a dedicated script likebuild:regenif you still want a one-shot regen+build command.
[maintainability] `schema.d.ts` not 2-space
`schema.d.ts` not 2-space
The newly added `admin/src/api/schema.d.ts` uses 4-space indentation (as generated), not the required 2-space indentation. This introduces formatting that violates the project’s indentation standard.The generated admin/src/api/schema.d.ts does not match the required 2-space indentation style.
This file is generated via admin/scripts/gen-api.mjs by calling openapi-typescript, which emits 4-space indentation by default.
- admin/scripts/gen-api.mjs[40-62]
- admin/src/api/schema.d.ts[9-27]
[maintainability] `schema.d.ts` not 2-space
`schema.d.ts` not 2-space
The newly added `admin/src/api/schema.d.ts` uses 4-space indentation (as generated), not the required 2-space indentation. This introduces formatting that violates the project’s indentation standard.The generated admin/src/api/schema.d.ts does not match the required 2-space indentation style.
This file is generated via admin/scripts/gen-api.mjs by calling openapi-typescript, which emits 4-space indentation by default.
- admin/scripts/gen-api.mjs[40-62]
- admin/src/api/schema.d.ts[9-27]
| PR 7692 (2026-05-07) |
[maintainability] Settings type mismatch
Settings type mismatch
`Settings.coerceValue()` can coerce env-var expansions (including `socialMeta.description`) into `number | boolean`, but both `SettingsType` and `SocialMetaSettings` still declare `description` as `string | null`, forcing unsafe casts and hiding the real contract from future callers. This increases the chance of future runtime errors if code assumes a string and calls string methods without guards.Settings.coerceValue() can produce number | boolean values for env-var-expanded settings, but the TypeScript types still claim socialMeta.description is string | null. This mismatch forces unsafe casts and makes it easy for future code to accidentally assume a string.
The PR mitigates the runtime issue by accepting unknown in resolveDescriptionWithOverride() and stringifying string|number|boolean. The type system should reflect this reality so callers and tests don’t need as unknown as string.
- src/node/utils/Settings.ts[160-170]
- src/node/utils/Settings.ts[862-897]
- src/node/utils/socialMeta.ts[98-107]
- src/node/utils/socialMeta.ts[164-182]
- src/tests/backend/specs/socialMeta-unit.ts[253-285]
- Update the settings type(s) to reflect runtime:
description: string | number | boolean | null(optionally alsoundefinedif applicable). - Update
SocialMetaSettingsinsocialMeta.tssimilarly. - Replace
override: unknownwithoverride: string | number | boolean | null | undefined. - Remove the
as unknown as stringcasts from the new unit tests (they should compile with the widened types).
| PR 7690 (2026-05-07) |
[maintainability] 4-space indentation in test
4-space indentation in test
New test file uses 4+ space continuation indentation (not 2 spaces), which violates the repository coding style requirement. This can cause formatting inconsistency and potential lint failures.The new Playwright spec uses 4+ space continuation indentation, but the codebase requires 2-space indentation.
The themeColor helper is split across lines with 4 spaces, and the multiline test(...) call uses 6 spaces before async.
- src/tests/frontend-new/specs/theme_color_dark_mode.spec.ts[4-5]
- src/tests/frontend-new/specs/theme_color_dark_mode.spec.ts[33-34]
[maintainability] Duplicated color mapping
Duplicated color mapping
The toolbar variant → hex color table is duplicated in both client code (skin_variants.ts) and server code (SkinColors.ts), so future updates can easily change one side and silently desync the server-rendered baseline theme-color from client-side updates. This can regress the exact “wrong address bar color” behavior this PR is fixing.The toolbar variant → theme color mapping is duplicated in both server and client code. This duplication can drift over time (CSS changes / palette changes), causing the server-rendered theme-color meta to disagree with the client’s dynamic updates.
- Server baseline meta is computed in
configuredToolbarColor(). - Client updates are computed in
updateThemeColorMeta(). - Both hard-code the same table.
- src/static/js/skin_variants.ts[7-33]
- src/node/utils/SkinColors.ts[3-34]
Create a small shared module (usable by both Node and browser builds) that exports the ordered mapping and default color, and import it from both places. If cross-bundle imports aren’t feasible, add a single-source generation step or a dedicated test that asserts the two tables are identical to prevent accidental drift.
| PR 7689 (2026-05-07) |
[maintainability] COREPACK_HOME path duplication
COREPACK_HOME path duplication
The Dockerfile sets `COREPACK_HOME` but later hard-codes `/opt/corepack` for `mkdir`/`chown`, so a future change to `COREPACK_HOME` can silently break cache sharing and permissions. This can reintroduce the original “pnpm falls back to latest” failure mode or cause permission errors.Issue description
COREPACK_HOME is set, but /opt/corepack is repeated literally in the same stage. If COREPACK_HOME is changed later, the directory creation/ownership logic will not follow, risking a permission mismatch and reintroducing the pnpm/corepack cache issue.
Issue Context
The PR’s intent is to pin the corepack cache directory and ensure it is readable/writable after switching to USER etherpad.
Fix Focus Areas
- Dockerfile[108-116]
Suggested change
Replace hard-coded /opt/corepack usages with ${COREPACK_HOME} (and quote it), e.g.:
mkdir -p /usr/share/man/man1 "${COREPACK_HOME}"-
chown -R etherpad:etherpad "${COREPACK_HOME}"This keeps the filesystem operations aligned with the configured cache path.
| PR 7688 (2026-05-07) |
[reliability] padOptions shim can crash
padOptions shim can crash
Settings.reloadSettings() unconditionally indexes and assigns settings.padOptions[key]; if settings.padOptions is null or a primitive (possible via settings.json), reloadSettings() will throw a TypeError during startup/reload.reloadSettings() assumes settings.padOptions is a non-null object when normalizing legacy userName/userColor values. If settings.json sets padOptions to null or another non-object, the shim can throw (TypeError) during startup or settings reload.
storeSettings() can overwrite settings.padOptions with any non-object value from settings.json, so a bad config can reach this new code path.
- src/node/utils/Settings.ts[1098-1105]
- src/node/utils/Settings.ts[811-833]
Add a type/shape guard before indexing/assigning:
- Only run the shim if
settings.padOptions != null && typeof settings.padOptions === 'object' && !Array.isArray(settings.padOptions). - Otherwise, skip the shim (and optionally log a clear warning/error about invalid
padOptionstype).
| PR 7678 (2026-05-05) |
[correctness] Duplicate leave removes active user
Duplicate leave removes active user
With the new logic, authenticated duplicates can stay connected with the same authorId, but `handleDisconnect` still broadcasts `USER_LEAVE` for every socket that disconnects. Because the client tracks presence keyed by `userId` (authorId) and deletes that entry on `USER_LEAVE`, disconnecting one of multiple same-author sockets makes the author appear offline even though another socket is still connected.Authenticated duplicate sessions are now allowed (same authorId, same pad), but the server still emits USER_LEAVE on any socket disconnect. This makes clients drop the author from presence even if another socket with the same authorId is still connected.
Client presence is keyed by userId (authorId). If multiple sockets share an authorId, the server must only emit USER_LEAVE when the last socket for that author leaves the pad.
- src/node/handler/PadMessageHandler.ts[227-257]
- src/node/handler/PadMessageHandler.ts[1023-1044]
- In
handleDisconnect, before broadcastingUSER_LEAVE, check_getRoomSockets(session.padId)for any remaining socket whosesessioninfos[roomSocket.id]?.author === session.author. - Only broadcast
USER_LEAVE(and runuserLeavehook if appropriate) when no remaining sockets exist for that author in that pad.
| PR 7674 (2026-05-05) |
[reliability] Corepack pnpm prepare failure
Corepack pnpm prepare failure
The Dockerfile now runs `corepack prepare pnpm@${PnpmVersion} --activate` without first updating corepack; this repo already documents that Node 22’s bundled corepack can reject newer pnpm versions due to a stale signing-key list, which would fail the Docker build at that step. This occurs in both the `adminbuild` and `build` stages, so a rejection would block image builds entirely.Issue description
The Dockerfile provisions pnpm via corepack prepare pnpm@${PnpmVersion} --activate but does not refresh corepack first. The repo’s snap build already documents that Node 22’s bundled corepack can reject newer pnpm versions due to stale signing keys; if that happens here, the image build fails at corepack prepare.
Issue Context
- This is done in two stages (
adminbuildandbuild). - The snap packaging flow addresses the exact failure mode by upgrading corepack before running
corepack prepare.
Fix Focus Areas
- Dockerfile[12-20]
- Dockerfile[98-109]
- snap/snapcraft.yaml[111-118]
Implementation direction
- In both Dockerfile stages, install/upgrade corepack (using the bundled npm before deleting npm), then run
corepack enable+corepack prepare pnpm@${PnpmVersion} --activate, then remove npm/npx as you do today. - Keep/extend the comment to explain why corepack is upgraded first (mirror the rationale in
snap/snapcraft.yaml).
[reliability] Forced uuid major upgrade
Forced uuid major upgrade
The new pnpm override forces `uuid` from `=14` across the workspace, changing the resolved `uuid` version used by `@azure/msal-node`. Because this bypasses downstream semver constraints, it can break any runtime path that loads `@azure/msal-node` if it is incompatible with `uuid@14`.The pnpm override forces a major uuid upgrade broadly (>=14.0.0), which can unintentionally pull newer uuid versions later and may violate transitive dependencies’ expectations.
Lockfile currently resolves uuid@14.0.0 for @azure/msal-node, but the override allows future automatic upgrades beyond 14.0.0 without review.
- package.json[54-80]
- pnpm-lock.yaml[18-26]
- Change the override value from
>=14.0.0to an exact pin (e.g.14.0.0) or a tightly-scoped range you’re comfortable supporting. - Add/extend CI coverage to exercise/import the relevant Azure/MSAL path if it is a supported runtime feature.
| PR 7667 (2026-05-04) |
[correctness] lastSeen lost or stale
lastSeen lost or stale
`lastSeen` is returned by `searchAuthors()` but it is not updated when an existing author is “seen” via token/mapper resolution (only `timestamp` is updated), and `anonymizeAuthor()` overwrites `globalAuthor` without preserving `lastSeen`, making the Admin UI’s “Last seen” column stale or blank (especially after erasure).The new lastSeen field is used by the admin author listing (searchAuthors()), but it is (a) not updated when an author is resolved via token/mapper mapping and (b) dropped when anonymizeAuthor() overwrites the globalAuthor record. This causes the /admin/authors “Last seen” column and sorting to be misleading.
-
mapAuthorWithDBKey()is a high-frequency “author seen” path and currently only updatestimestamp. -
anonymizeAuthor()overwrites the author object twice and currently omitslastSeen. -
searchAuthors()only readsrec.lastSeenand outputsnullif missing.
- src/node/db/AuthorManager.ts[117-138]
- src/node/db/AuthorManager.ts[377-384]
- src/node/db/AuthorManager.ts[412-421]
- src/node/db/AuthorManager.ts[511-518]
- When updating
timestampfor an existing author mapping, also updatelastSeen(likely to the sameDate.now()value). - When overwriting
globalAuthorinanonymizeAuthor(), includelastSeen(either preserveexisting.lastSeenor set it toDate.now()at erasure time). - (Optional hardening) In
searchAuthors(), consider falling back torec.timestampifrec.lastSeenis missing to avoid blank values for older records.
[security] `/authors` route not flagged
`/authors` route not flagged
The PR introduces a new admin authors UI and related read-only socket endpoints that are available regardless of any feature flag state. This conflicts with the requirement that new features be gated behind a feature flag and disabled by default.New admin functionality (the /admin/authors page and its read-only socket endpoints) is available even when gdprAuthorErasure.enabled is false, which violates the requirement that new features be behind a feature flag and disabled by default.
Currently, only the destructive anonymizeAuthor socket handler is gated; the new listing/preview capabilities are not.
- admin/src/main.tsx[26-26]
- admin/src/App.tsx[109-109]
- src/node/hooks/express/adminsettings.ts[310-347]
[reliability] Socket destructure can throw
Socket destructure can throw
The `/settings` socket handlers destructure `{authorID}` in the parameter list for `anonymizeAuthorPreview` and `anonymizeAuthor`; if the client emits the event with `undefined`/`null` payload, it throws before the try/catch and can result in an unhandled rejection and a broken admin-socket flow.anonymizeAuthorPreview / anonymizeAuthor socket handlers destructure authorID in the argument list. If a client emits the event without a payload (or with null), destructuring throws before entering the handler body, bypassing the handler’s try/catch.
Even if the shipped UI always sends {authorID: ...}, defensive server code should treat socket payloads as untrusted and avoid pre-body destructuring.
- src/node/hooks/express/adminsettings.ts[330-368]
- Change handler signatures to accept
payload: any(optionally defaulting to{}), then readconst authorID = payload?.authorID;inside the try block. - Keep existing validation that emits
{error: 'authorID is required'}when missing. Example pattern:
[correctness] Preview error not handled
Preview error not handled
The authors UI treats every `results:anonymizeAuthorPreview` as a successful preview; if the server responds with `{error}` (as the backend does on exceptions), the modal still renders counter placeholders from missing fields and Continue remains enabled.AnonymizePreview includes an optional error field, and the backend emits {authorID, error: ...} on failures. The UI currently moves to the preview phase regardless and renders counters that may be undefined, and it still allows clicking Continue.
This is a robustness issue for exceptional cases (DB read errors, etc.) that results in broken modal text and potentially misleading flows.
- admin/src/pages/AuthorPage.tsx[58-75]
- admin/src/pages/AuthorPage.tsx[151-179]
- admin/src/utils/AuthorSearch.ts[28-36]
- In
onPreview, branch ondata.error: - either keep the dialog in a dedicated error phase that renders the message and disables Continue, or
- close the dialog and show a toast/error banner.
- Ensure the modal’s counter rendering is only executed when all expected numeric fields are present.
| PR 7665 (2026-05-03) |
[reliability] uncaughtException doesn’t exit
uncaughtException doesn’t exit
tests/backend/diagnostics.ts registers an uncaughtException handler that only logs and returns, which can prevent backend-tests from exiting non-zero on fatal errors when tests/backend/common.ts is not imported. This can turn a crash into a hang/continued execution, and it also breaks the intended “convert unhandledRejection to uncaught exception” fail-fast behavior for specs that don’t load common.ts.src/tests/backend/diagnostics.ts installs an uncaughtException handler that only logs. If tests/backend/common.ts is not imported in a spec run, this handler can prevent a fatal error from forcing a non-zero exit (common.ts explicitly calls process.exit(1) to preserve default behavior).
-
src/tests/backend/common.tshas anuncaughtExceptionhandler that logs and thenprocess.exit(1)specifically to preserve default behavior when a handler exists. -
src/package.jsonnow requires./tests/backend/diagnostics.tsglobally, but does not requirecommon.ts. - Some specs don’t import
common.ts, sodiagnostics.tscan be the only handler.
- src/tests/backend/diagnostics.ts[55-61]
After logging, ensure the process still fails fast when no other handler will do it. Options:
- Mimic
common.ts: callprocess.exit(1)after logging. - If you want to defer to other handlers when present: only force-exit if this is the only
uncaughtExceptionlistener (e.g.,if (process.listeners('uncaughtException').length === 1) process.exit(1);), otherwise return. - Alternatively, set
process.exitCode = 1and schedule asetImmediate(() => process.exit(1))so later-registered handlers still get a chance to run/log first.
| PR 7660 (2026-05-02) |
[correctness] Click steals rename focus
Click steals rename focus
Clicking the editable name for unnamed users will also trigger the new row click handler, which then focuses #chatinput and interrupts the rename workflow. This makes it difficult or impossible to name unnamed users from the user list.Issue description
The new delegated click handler on #otheruserstable tr[data-authorId] triggers even when the click target is the existing rename <input> in the name cell, and it later focuses #chatinput. This breaks the unnamed-user rename interaction.
Issue Context
Unnamed users are rendered with an <input> in the .usertdname cell and are wired up via #otheruserstable input.newinput.
Fix Focus Areas
- src/static/js/pad_userlist.ts[373-410]
- src/static/js/pad_userlist.ts[183-196]
Suggested change
Add early-return guards before doing any prefill/show work, for example:
- Return if
$(event.target).closest('input, textarea, select, button, a, [contenteditable=true]').length. - Or at minimum return if
$(event.target).closest('.usertdname input').length. This keeps the row-click behavior while preserving rename semantics.
| PR 7647 (2026-05-01) |
[observability] Hardcoded 5s socket wait
Hardcoded 5s socket wait
waitForSocketEvent now hardcodes a 5s timeout for all socket events, which can cause suites that rely on Mocha’s default per-test timeout to fail with a generic Mocha timeout instead of the helper’s descriptive error. It also makes failing runs wait longer before surfacing the root cause.waitForSocketEvent() uses a hardcoded 5000ms timeout for all socket waits. In suites that don’t increase Mocha’s default per-test timeout, failures may surface as a generic Mocha timeout instead of waitForSocketEvent’s explicit timed out waiting for <event> error.
Some callers (e.g., connect/handshake paths) legitimately need a longer timeout on slow CI runners, but other call sites benefit from failing fast and producing a clear error.
- Add an optional
timeoutMsparameter towaitForSocketEvent(socket, event, timeoutMs?), and use it insetTimeout(..., timeoutMs). - Update slow paths (
connect(),handshake(), and any other known-slow call sites) to pass5000explicitly. - Keep a shorter default (or ensure suites that rely on defaults set
this.timeout(...)high enough) to avoid Mocha masking the helper’s error. - src/tests/backend/common.ts[114-219]
- src/tests/backend/specs/messages.ts[12-60]
[reliability] SessionStore waits still tight
SessionStore waits still tight
SessionStore expiry tests still rely on fixed sleeps with only ~30ms headroom (e.g., expires in 300ms, then sleep 330ms), so timer jitter or event-loop delays can still cause intermittent failures under heavy CI load. This PR improves the margin vs. before, but it doesn’t eliminate the underlying flake pattern.Several SessionStore tests use fixed setTimeout sleeps and then assert the DB record is gone/present. Even with the increased windows, the assertions can still race the actual cleanup work if the event loop is delayed.
SessionStore schedules expiration cleanup with setTimeout(...) and documents races on slow systems. Tests that assume cleanup has run at an exact time remain inherently timing-fragile.
- Replace fixed sleeps like
await new Promise(r => setTimeout(r, 330))+ assert with a small polling helper (e.g., poll every 25ms up to a 2–5s max) that waits until the DB condition is met. - Keep the expiry durations modest, but remove tight coupling between “sleep duration” and “expiry duration”.
- src/tests/backend/specs/SessionStore.ts[47-168]
| PR 7645 (2026-05-01) |
[reliability] Concurrency blocks PR builds
Concurrency blocks PR builds
With the new pull_request trigger, PR docs builds share the same concurrency group ("pages") as real deployments, so only one run can execute at a time. This can delay PR feedback or delay production docs deployments whenever both are active.Docs builds on PRs now contend with production docs deployments because the workflow uses a single concurrency.group: "pages" for all events. This serializes PR builds and push deployments.
The PR adds a pull_request trigger but keeps the existing global concurrency group. Concurrency is useful for deployments but typically undesirable for PR-only builds.
- .github/workflows/build-and-deploy-docs.yml[4-17]
- .github/workflows/build-and-deploy-docs.yml[28-33]
Either:
- Use distinct concurrency groups per event (e.g.,
pages-${{ github.event_name }}), or - Split into separate jobs: a PR
buildjob without the Pages concurrency lock and a push-onlydeployjob with the lock.
[maintainability] Undocumented `engines.node` bump
Undocumented `engines.node` bump
The PR raises the minimum supported Node version via `engines.node`, but related documentation still states the old requirement. This can mislead contributors/users and makes a breaking compatibility change without updating the documented guidance.package.json now sets engines.node to >=22.12.0, but documentation still claims the project requires >=22.0.0.
This is a user-visible compatibility/requirements change and should be reflected anywhere the Node requirement is documented.
- doc/npm-trusted-publishing.md[88-91]
- package.json[45-45]
| PR 7644 (2026-05-01) |
[security] Unvalidated plugin names
Unvalidated plugin names
`update()` installs every name from `var/installed_plugins.json` (except `ep_etherpad-lite`) without enforcing the `ep_` prefix. If that file is corrupted or modified, running `plugins update` can install arbitrary packages, unlike `checkForMigration()` which explicitly restricts to `ep_` plugins.bin/plugins.ts updates plugins by trusting var/installed_plugins.json and installing every entry by name. This should be restricted to actual Etherpad plugins (ep_ prefix) to prevent accidental or malicious installation of arbitrary packages.
checkForMigration() already enforces plugin.name.startsWith(plugins.prefix) before installing from installed_plugins.json, but update() does not.
- Add an explicit
ep_/plugins.prefixvalidation filter before invokinginstallPlugin(). - Consider de-duplicating names (e.g., via
new Set(names)) to avoid repeated installs if the file contains duplicates.
- bin/plugins.ts[81-112]
- src/static/js/pluginfw/installer.ts[81-134]
- src/static/js/pluginfw/plugins.ts[36-154]
- Docs
- Translating
- HTTP API
- Plugin framework (API hooks)
- Plugins (available)
- Plugins (list)
- Plugins (wishlist)
- Etherpad URIs / URLs to specific resources IE export
- Etherpad Full data export
- Introduction to the source
- Release Procedure
- Etherpad Developer guidelines
- Project to-do list
- Changeset Library documentation
- Alternative Etherpad-Clients
- Contribution guidelines
- Installing Etherpad
- Deploying Etherpad as a service
- Deploying Etherpad on CloudFoundry
- Deploying Etherpad on Heroku
- Running Etherpad on Phusion Passenger
- Putting Etherpad behind a reverse Proxy (HTTPS/SSL)
- How to setup Etherpad on Ubuntu 12.04 using Ansible
- Migrating from old Etherpad to Etherpad
- Using Etherpad with MySQL
- Customizing the Etherpad web interface
- Enable import/export functionality with AbiWord
- Getting a list of all pads
- Providing encrypted web access to Etherpad using SSL certificates
- Optimizing Etherpad performance including faster page loads
- Getting to know the tools and scripts in the Etherpad /bin/ folder
- Embedding a pad using the jQuery plugin
- Using Embed Parameters
- Integrating Etherpad in a third party app (Drupal, MediaWiki, WordPress, Atlassian, PmWiki)
- HTTP API client libraries