Commit cbe551b

JohnMcLear and claude authored
feat(updater): tier 3 — auto update with grace window (#7607) (#7720)
* feat(updater): scheduled execution state + graceStartTag dedupe field (#7607)

  Preparation for Tier 3 of the auto-update subsystem:
  - ExecutionStatus gains `scheduled` (targetTag, scheduledFor, startedAt).
  - EmailSendLog gains `graceStartTag` for one-shot grace-start email dedupe.
  - state validator accepts the new shape, requires per-status fields, and backfills graceStartTag=null on a Tier 1/2 state file.

  Plus the implementation plan at docs/superpowers/plans/2026-05-11-auto-update-pr3-tier3-auto.md.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(updater): decideSchedule pure decision function (#7607)

  Adds src/node/updater/Scheduler.ts with the Tier 3 pure decision logic:
  - schedules when canAuto + idle/verified/terminal-cleared
  - reschedules when a newer tag appears mid-grace
  - emits a grace-start email (once per tag) when adminEmail is set
  - cancels a stale schedule when policy flips canAuto off
  - no-ops during in-flight / terminal states
  - clamps preApplyGraceMinutes to [0, 7 days]

  Also extends Notifier's EmailKind union with 'grace-start' so the decision result types correctly.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(updater): scheduler timer runner with arm/cancel (#7607)

  Adds createSchedulerRunner to Scheduler.ts:
  - arm(): clears any prior timer, sets a fresh one for scheduledFor
  - cancel(): clears the pending timer, idempotent
  - past scheduledFor → fires with delay=0 (rehydrate after restart-in-grace)
  - single-fire-per-arm semantics; armedFor cleared on fire

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(updater): extract apply pipeline shared by HTTP + scheduler (#7607)

  Lifts the preflight → drain → execute orchestration out of the /admin/update/apply HTTP handler into src/node/updater/applyPipeline.ts. The HTTP handler keeps its 4xx status mapping; the pipeline owns the state transitions, lock release, drain coordination, and rollback hand-off. The new ApplyPipelineDeps interface accepts an onAccepted callback so the HTTP path can still 202 mid-flow while the Tier 3 scheduler path (next commit) can no-op.

  Adds `scheduled` to the apply allowed-entry list so an admin can "Apply now" during the Tier 3 grace window.

  13 vitest cases cover happy / preflight-failed / cancelled / busy / lock-held / scheduled-entry / rollback / lock-release. Existing 12 mocha integration tests still pass without change.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(updater): wire Tier 3 scheduler into boot + performCheck (#7607)

  - expressCreateServer instantiates the scheduler runner and rehydrates the timer when a prior boot left state.execution = scheduled
  - performCheck evaluates decideSchedule after the notifier pass: schedule transitions state + sends grace-start email + arms timer; cancel-schedule resets to idle + cancels timer
  - shutdown cancels the timer
  - exposes cancelScheduler() so the cancel endpoint (next commit) can drop the pending schedule
  - buildSchedulerApplyDeps() supplies the full production-wired pipeline deps (preflight, executor, rollback) for the scheduler-triggered apply

  Adds tests/backend/specs/updater-scheduler-integration.ts covering boot-rehydrate fire-on-past and the decision-to-state round-trip.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(updater): cancel handler supports Tier 3 scheduled state (#7607)

  POST /admin/update/cancel now accepts execution.status === 'scheduled' in addition to preflight/draining. The handler calls cancelScheduler() to drop the pending in-process timer, then transitions state to idle with lastResult.outcome = 'cancelled' (mirroring the existing pattern).

  Adds a Tier 3 integration test that seeds a scheduled state, calls /admin/update/cancel, and asserts the state machine landed correctly.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(admin): countdown + cancel UI for Tier 3 scheduled updates (#7607)

  - store.ts: extend Execution union with the scheduled variant
  - UpdatePage.tsx: render countdown panel during scheduled; Apply button is relabelled "Apply now" so the admin can skip the remaining grace; Cancel button accepts scheduled state
  - UpdateBanner.tsx: dedicated scheduled banner with live remaining time
  - en.json: new i18n keys (execution.scheduled, banner.scheduled, page.scheduled.{title,countdown,apply_now})

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(updater): playwright spec for Tier 3 scheduled UI (#7607)

  Three cases against a mocked /admin/update/status:
  - countdown panel + Apply now + Cancel render when execution is scheduled
  - Cancel button posts /admin/update/cancel and triggers re-fetch
  - /admin (banner) shows "Auto-update to <tag> scheduled" copy

  Mirrors the existing update-page-actions.spec.ts mock pattern (page.route).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(updater): document Tier 3 auto with grace window (#7607)

  - doc/admin/updates.md: flip Tier 3 from "designed, not yet implemented" to current; expand preApplyGraceMinutes table row; add a Tier 3 section explaining schedule / cancel / Apply now / restart-in-grace and the grace-start email
  - settings.json.template: clarify the preApplyGraceMinutes comment
  - CHANGELOG.md: Unreleased entry for Tier 3
  - runbook §11: full Tier 3 smoke (happy, cancel, apply-now, restart-in-grace, email) plus the additional sign-off checkboxes

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(admin): UpdatePage handles missing execution field; scope spec locator (#7607)

  Two CI fixes for PR #7720:

  1. UpdatePage.tsx — optional-chain us.execution.status. Integration test stubs (update-banner.spec.ts) ship payloads without the Tier 2/3 execution / lastResult / lockHeld fields; without optional chaining on the new scheduled-derivation line the whole page crashed before the h1 rendered, breaking the unrelated "renders current version" test.
  2. update-scheduled.spec.ts — scope the v2.7.2 assertion to the .update-scheduled section. The regex was matching three elements (banner, countdown panel, changelog link) and tripped Playwright's strict-mode locator check.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(updater): address Qodo review (Tier 3 race conditions + tier-off bypass) (#7607)

  Four fixes for bugs flagged by Qodo's review of PR #7720:

  1. **Tier=off bypasses scheduler** (correctness). expressCreateServer used to instantiate the scheduler and rehydrate any persisted `scheduled` state regardless of `updates.tier`. A user who set `tier: "off"` after a schedule had been persisted would still see the timer fire after restart. The boot path now skips scheduler creation when tier is off and explicitly clears a stale scheduled state to idle (logged so the admin sees what happened).
  2. **Timer fire skips state recheck** (reliability). The scheduler's timer callback called applyUpdate() directly. Race: admin clicks Cancel at the same instant the timer fires, or the tier flips during the grace window. Now schedulerTriggerApply re-loads state and re-evaluates policy via a new pure decideTriggerApply() helper in Scheduler.ts. If state is no longer scheduled (or scheduled for a different tag), aborts. If policy now denies auto, persists state back to idle and aborts.
  3. **Apply-now leaves scheduler timer armed** (correctness). The apply endpoint accepts `scheduled` as an entry status but didn't cancel the in-process scheduler timer. After the admin clicks Apply now, the still-armed timer could later fire and attempt another apply (especially if the manual one finishes in preflight-failed, which is also an allowed-entry status). Apply handler now calls cancelScheduler() when entering from `scheduled`.
  4. **scheduledFor not validated as timestamp** (reliability). State validator only required scheduledFor / startedAt etc. to be non-empty strings; a hand-edited "scheduledFor": "garbage" would pass validation and yield NaN delay → immediate fire. The validator now requires known timestamp fields to be parseable via Date.parse().

  Tests: 6 new decideTriggerApply cases + 3 new state.ts validation cases. 189 vitest pass / 29 mocha integration pass / ts-check clean.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
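The decision rules listed for `decideSchedule` in the commit message can be pictured with a small pure function. This is only a sketch: `Exec`, `ScheduleInput`, `ScheduleDecision`, `clampGrace` and `decideScheduleSketch` are hypothetical names, the state union is trimmed to three variants, and `latestTag` is assumed to already be newer than the running version. Only the rules themselves (clamp to one week, schedule when allowed, re-schedule on a different tag, cancel when `canAuto` flips off, one grace-start email per tag) are taken from the message; the actual code in src/node/updater/Scheduler.ts is not shown on this page and may differ.

```typescript
// Hypothetical shapes; the real types in Scheduler.ts may differ.
type Exec =
  | {status: 'idle'}
  | {status: 'scheduled'; targetTag: string; scheduledFor: string}
  | {status: 'executing'; targetTag: string};

interface ScheduleInput {
  canAuto: boolean;
  latestTag: string | null;      // assumed to be newer than the running version
  execution: Exec;
  graceMinutes: number;
  adminEmail: string | null;
  graceStartTag: string | null;  // last tag a grace-start email was sent for
  nowMs: number;
}

type ScheduleDecision =
  | {action: 'none'}
  | {action: 'cancel-schedule'}
  | {action: 'schedule'; targetTag: string; scheduledFor: string; sendGraceStartEmail: boolean};

const MAX_GRACE_MINUTES = 7 * 24 * 60; // one week, per the commit message

// Clamp the configured grace to [0, one week]; non-finite input falls back to 0.
const clampGrace = (minutes: number): number =>
  Number.isFinite(minutes) ? Math.min(MAX_GRACE_MINUTES, Math.max(0, minutes)) : 0;

const decideScheduleSketch = (i: ScheduleInput): ScheduleDecision => {
  const exec = i.execution;
  // In-flight work is never interrupted by the scheduler.
  if (exec.status === 'executing') return {action: 'none'};
  // Policy no longer allows auto: drop a stale schedule, otherwise do nothing.
  if (!i.canAuto) {
    return exec.status === 'scheduled' ? {action: 'cancel-schedule'} : {action: 'none'};
  }
  if (i.latestTag === null) return {action: 'none'};
  // Already scheduled for this tag: leave the existing timer alone.
  if (exec.status === 'scheduled' && exec.targetTag === i.latestTag) return {action: 'none'};
  const scheduledFor = new Date(i.nowMs + clampGrace(i.graceMinutes) * 60_000).toISOString();
  return {
    action: 'schedule',
    targetTag: i.latestTag,
    scheduledFor,
    // One grace-start email per tag, and only when an admin address is configured.
    sendGraceStartEmail: i.adminEmail !== null && i.graceStartTag !== i.latestTag,
  };
};
```

Keeping the decision free of I/O is what lets the commit cover it with plain unit tests; the caller is then responsible for persisting state, sending the email and arming the timer.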
1 parent 197f007 commit cbe551b

22 files changed

Lines changed: 3469 additions & 151 deletions

CHANGELOG.md

Lines changed: 4 additions & 1 deletion
@@ -8,7 +8,10 @@
 - Terminal `rollback-failed` state surfaces a strong banner; the admin clicks Acknowledge once they've manually recovered to clear the lock and re-allow Tier 2 attempts.
 - New settings under `updates.*`: `preApplyGraceMinutes`, `drainSeconds`, `rollbackHealthCheckSeconds`, `diskSpaceMinMB`, `requireSignature`, `trustedKeysPath`. Tag signature verification is opt-in (default `false`) — see `doc/admin/updates.md` for the keyring setup.
 - **A process supervisor (systemd / pm2 / docker `--restart=unless-stopped`) is required to apply updates.** Without one, exit 75 leaves the instance down.
-- Tiers 3 (auto with grace window) and 4 (autonomous in maintenance window) remain designed but unimplemented and will land in subsequent releases.
+- **Self-update subsystem — Tier 3 (auto with grace window).**
+- On a git install, set `updates.tier: "auto"` to have new releases applied automatically after `preApplyGraceMinutes`. During the grace window, `/admin/update` shows a live countdown plus Cancel and Apply now buttons. Schedules are persisted to `var/update-state.json`, so an Etherpad restart during the grace window rehydrates the timer instead of losing the schedule. A new release tag detected mid-grace re-arms the timer; if `adminEmail` is set, a one-shot `grace-start` notification fires per scheduled tag (issue #7607).
+- The terminal `rollback-failed` state continues to disable auto/autonomous attempts globally until acknowledged; manual click stays available because an admin click *is* the intervention the terminal state requires.
+- Tier 4 (autonomous in a maintenance window) remains designed but unimplemented and will land in a subsequent release.

 # 2.7.3

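The CHANGELOG entry above says a restart inside the grace window rehydrates the timer, and the commit message adds that a past `scheduledFor` fires with delay 0 while an unparseable one used to produce a NaN delay. A minimal sketch of that delay computation follows; `rehydrateDelayMs` is a hypothetical helper name, not a function from the commit, and in the real code the parse check sits in the state validator rather than next to the timer.

```typescript
// Hypothetical helper, not part of the commit: computes the timer delay for a
// persisted schedule, or null when the timestamp cannot be parsed.
const rehydrateDelayMs = (scheduledFor: string, nowMs: number): number | null => {
  const target = Date.parse(scheduledFor);
  if (Number.isNaN(target)) return null;  // reject instead of arming with a NaN delay
  return Math.max(0, target - nowMs);     // already in the past: fire immediately
};

const nowMs = Date.parse('2026-05-11T12:00:00.000Z');
console.log(rehydrateDelayMs('2026-05-11T12:10:00.000Z', nowMs)); // 600000
console.log(rehydrateDelayMs('2026-05-11T11:00:00.000Z', nowMs)); // 0
console.log(rehydrateDelayMs('garbage', nowMs));                  // null
```

Returning `null` rather than a number forces the caller to decide what a corrupt schedule means, which is the same intent as the validator fix described in the commit.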
admin/src/components/UpdateBanner.tsx

Lines changed: 39 additions & 1 deletion
@@ -1,8 +1,16 @@
-import {useEffect} from 'react';
+import {useEffect, useState} from 'react';
 import {Link} from 'react-router-dom';
 import {Trans, useTranslation} from 'react-i18next';
 import {useStore} from '../store/store';

+const fmtRemaining = (ms: number): string => {
+  if (ms <= 0) return '0s';
+  const s = Math.floor(ms / 1000);
+  const m = Math.floor(s / 60);
+  const sec = s % 60;
+  return m > 0 ? `${m}m ${sec}s` : `${sec}s`;
+};
+
 export const UpdateBanner = () => {
   const {t} = useTranslation();
   const updateStatus = useStore((s) => s.updateStatus);
@@ -17,6 +25,19 @@ export const UpdateBanner = () => {
     return () => { cancelled = true; };
   }, [setUpdateStatus]);

+  const scheduledFor = updateStatus?.execution?.status === 'scheduled'
+    ? (updateStatus.execution as {scheduledFor: string}).scheduledFor
+    : null;
+  const [remainingMs, setRemainingMs] = useState<number>(() =>
+    scheduledFor ? Math.max(0, new Date(scheduledFor).getTime() - Date.now()) : 0);
+  useEffect(() => {
+    if (!scheduledFor) return;
+    const target = new Date(scheduledFor).getTime();
+    setRemainingMs(Math.max(0, target - Date.now()));
+    const id = setInterval(() => setRemainingMs(Math.max(0, target - Date.now())), 1000);
+    return () => clearInterval(id);
+  }, [scheduledFor]);
+
   if (!updateStatus) return null;

   // Terminal rollback-failed wins over the regular "update available" banner —
@@ -31,6 +52,23 @@ export const UpdateBanner = () => {
     );
   }

+  // Tier 3: scheduled update — show countdown banner instead of the plain
+  // "update available" one.
+  if (updateStatus.execution?.status === 'scheduled') {
+    const exec = updateStatus.execution as {targetTag: string; scheduledFor: string};
+    return (
+      <div className="update-banner update-banner-scheduled" role="status">
+        <strong>
+          <Trans
+            i18nKey="update.banner.scheduled"
+            values={{tag: exec.targetTag, remaining: fmtRemaining(remainingMs)}}
+          />
+        </strong>{' '}
+        <Link to="/update">{t('update.banner.cta')}</Link>
+      </div>
+    );
+  }
+
   if (!updateStatus.latest) return null;
   if (updateStatus.currentVersion === updateStatus.latest.version) return null;

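To see what the banner text looks like at different points in the grace window, the `fmtRemaining` helper from the diff above can be run on a few inputs. The first function is copied from the commit. `fmtRemainingLong` is a hypothetical variant that is not part of the commit; it is included only to illustrate one way to handle the fact that the original never rolls minutes up into hours, which becomes noticeable once `preApplyGraceMinutes` approaches its one-week ceiling.

```typescript
// Copied from the diff above.
const fmtRemaining = (ms: number): string => {
  if (ms <= 0) return '0s';
  const s = Math.floor(ms / 1000);
  const m = Math.floor(s / 60);
  const sec = s % 60;
  return m > 0 ? `${m}m ${sec}s` : `${sec}s`;
};

// Hypothetical variant (not in the commit) that rolls minutes up into hours and days.
const fmtRemainingLong = (ms: number): string => {
  if (ms <= 0) return '0s';
  const total = Math.floor(ms / 1000);
  const d = Math.floor(total / 86_400);
  const h = Math.floor((total % 86_400) / 3_600);
  const m = Math.floor((total % 3_600) / 60);
  const s = total % 60;
  const parts: string[] = [];
  if (d > 0) parts.push(`${d}d`);
  if (h > 0) parts.push(`${h}h`);
  if (m > 0) parts.push(`${m}m`);
  parts.push(`${s}s`);
  return parts.join(' ');
};

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;
console.log(fmtRemaining(61_000));      // "1m 1s"
console.log(fmtRemaining(999));         // "0s"
console.log(fmtRemaining(WEEK_MS));     // "10080m 0s"
console.log(fmtRemainingLong(WEEK_MS)); // "7d 0s"
```

The same helper is declared a second time in UpdatePage.tsx, so any change to the format would need to be made in both places or moved to a shared module.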
admin/src/pages/UpdatePage.tsx

Lines changed: 48 additions & 3 deletions
@@ -11,6 +11,14 @@ type FetchState =

 const IN_FLIGHT_STATUSES = ['preflight', 'draining', 'executing', 'rolling-back'];

+const fmtRemaining = (ms: number): string => {
+  if (ms <= 0) return '0s';
+  const s = Math.floor(ms / 1000);
+  const m = Math.floor(s / 60);
+  const sec = s % 60;
+  return m > 0 ? `${m}m ${sec}s` : `${sec}s`;
+};
+
 export const UpdatePage = () => {
   const {t} = useTranslation();
   const us = useStore((s) => s.updateStatus);
@@ -79,6 +87,21 @@ export const UpdatePage = () => {
     }
   };

+  // Tier 3 countdown — derive scheduledFor outside the conditional returns so
+  // the hook order is stable on every render.
+  const scheduledFor = us?.execution?.status === 'scheduled'
+    ? (us.execution as {scheduledFor: string}).scheduledFor
+    : null;
+  const [remainingMs, setRemainingMs] = useState<number>(() =>
+    scheduledFor ? Math.max(0, new Date(scheduledFor).getTime() - Date.now()) : 0);
+  useEffect(() => {
+    if (!scheduledFor) return;
+    const target = new Date(scheduledFor).getTime();
+    setRemainingMs(Math.max(0, target - Date.now()));
+    const id = setInterval(() => setRemainingMs(Math.max(0, target - Date.now())), 1000);
+    return () => clearInterval(id);
+  }, [scheduledFor]);
+
   if (fetchState.kind === 'loading') {
     return <div>{t('admin.loading', {defaultValue: 'Loading...'})}</div>;
   }
@@ -110,12 +133,20 @@ export const UpdatePage = () => {

   const upToDate = !us.latest || us.currentVersion === us.latest.version;
   const showApply = !!us.policy?.canManual
-    && (status === 'idle' || status === 'verified')
+    && (status === 'idle' || status === 'verified' || status === 'scheduled')
     && !us.lockHeld
     && !upToDate;
-  const showCancel = status === 'preflight' || status === 'draining';
+  const showCancel = status === 'preflight' || status === 'draining' || status === 'scheduled';
   const showAcknowledge = status === 'preflight-failed' || status === 'rolled-back' || status === 'rollback-failed';

+  // Optional-chain the execution lookup: some integration-test stubs of
+  // /admin/update/status omit Tier 2/3 fields entirely (see
+  // update-banner.spec.ts), and accessing `.status` on an undefined
+  // execution would crash the whole page before the h1 renders.
+  const scheduled = us.execution?.status === 'scheduled'
+    ? us.execution as {targetTag: string; scheduledFor: string}
+    : null;
+
   return (
     <div className="update-page">
       <h1><Trans i18nKey="update.page.title"/></h1>
@@ -152,10 +183,24 @@ export const UpdatePage = () => {
         </p>
       )}

+      {scheduled && (
+        <section className="update-scheduled" aria-live="polite">
+          <h2><Trans i18nKey="update.page.scheduled.title"/></h2>
+          <p>
+            <Trans
+              i18nKey="update.page.scheduled.countdown"
+              values={{tag: scheduled.targetTag, remaining: fmtRemaining(remainingMs)}}
+            />
+          </p>
+        </section>
+      )}
+
       <div className="update-actions">
         {showApply && (
           <button onClick={() => post('/admin/update/apply')} disabled={actionInFlight}>
-            {t('update.page.apply')}
+            {status === 'scheduled'
+              ? t('update.page.scheduled.apply_now')
+              : t('update.page.apply')}
           </button>
         )}
         {showCancel && (

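The button-visibility rules changed in UpdatePage.tsx can be read more easily when pulled out of the component. The sketch below restates them as a pure function. `Status`, `ActionInput` and `deriveActions` are hypothetical names, and the `Status` union is assembled from the status strings visible in this diff rather than from the real type; the boolean conditions and i18n keys are the ones in the hunk above.

```typescript
// Status strings collected from this diff; the authoritative union lives in store.ts.
type Status =
  | 'idle' | 'scheduled' | 'preflight' | 'preflight-failed' | 'draining'
  | 'executing' | 'verified' | 'rolling-back' | 'rolled-back' | 'rollback-failed';

interface ActionInput {
  status: Status;
  canManual: boolean;
  lockHeld: boolean;
  upToDate: boolean;
}

// Hypothetical pure restatement of the showApply / showCancel / showAcknowledge logic.
const deriveActions = (i: ActionInput) => ({
  // "Apply" is also offered while scheduled so the admin can skip the remaining grace.
  showApply: i.canManual
    && (i.status === 'idle' || i.status === 'verified' || i.status === 'scheduled')
    && !i.lockHeld
    && !i.upToDate,
  // A pending schedule can be cancelled just like preflight or draining.
  showCancel: i.status === 'preflight' || i.status === 'draining' || i.status === 'scheduled',
  showAcknowledge: i.status === 'preflight-failed' || i.status === 'rolled-back'
    || i.status === 'rollback-failed',
  applyLabelKey: i.status === 'scheduled'
    ? 'update.page.scheduled.apply_now'
    : 'update.page.apply',
});

console.log(deriveActions({status: 'scheduled', canManual: true, lockHeld: false, upToDate: false}));
```

In the `scheduled` state both Apply and Cancel are shown at once, which is the only status in this diff where that happens.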
admin/src/store/store.ts

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@ import {InstalledPlugin} from "../pages/Plugin.ts";

 export type Execution =
   | {status: 'idle'}
+  | {status: 'scheduled'; targetTag: string; scheduledFor: string; startedAt: string}
   | {status: 'preflight'; targetTag: string; startedAt: string}
   | {status: 'preflight-failed'; targetTag: string; reason: string; at: string}
   | {status: 'draining'; targetTag: string; drainEndsAt: string; startedAt: string}

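Because `Execution` is a union discriminated on `status`, TypeScript can narrow to the new `scheduled` variant without the `as {...}` casts used in the two components. The sketch below shows the idea on a trimmed copy of the union. `Scheduled` and `scheduledOf` are hypothetical names, and whether the store's `updateStatus.execution` field is actually typed as `Execution` is an assumption, since that part of store.ts is not in this diff.

```typescript
// Trimmed copy of the union from store.ts (only some of the variants in the hunk).
type Execution =
  | {status: 'idle'}
  | {status: 'scheduled'; targetTag: string; scheduledFor: string; startedAt: string}
  | {status: 'preflight'; targetTag: string; startedAt: string};

// Pick the scheduled variant out of the union by its discriminant.
type Scheduled = Extract<Execution, {status: 'scheduled'}>;

// Hypothetical helper: returns the scheduled variant, or null for any other
// status and for payloads that omit `execution` entirely.
const scheduledOf = (exec: Execution | undefined): Scheduled | null =>
  exec !== undefined && exec.status === 'scheduled' ? exec : null;

const s = scheduledOf({
  status: 'scheduled',
  targetTag: 'v2.7.2',
  scheduledFor: '2026-05-11T12:10:00.000Z',
  startedAt: '2026-05-11T12:00:00.000Z',
});
console.log(s?.targetTag);           // "v2.7.2"
console.log(scheduledOf(undefined)); // null
```

With narrowing, adding or renaming a field on the `scheduled` variant is checked by the compiler at every use site, which a structural cast would silently bypass.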
doc/admin/updates.md

Lines changed: 32 additions & 2 deletions
@@ -4,7 +4,7 @@ Etherpad ships with a built-in update subsystem.

 - **Tier 1 (notify)** — default. A banner appears in the admin UI when a new release is available, and pad users see a discreet badge if the running version is severely outdated or flagged as vulnerable. No execution.
 - **Tier 2 (manual click)** — admins on a git install can click "Apply update" at `/admin/update`. Etherpad drains active sessions, runs `git fetch / checkout / pnpm install / pnpm run build:ui`, and exits with code 75 so a process supervisor restarts it on the new version. Auto-rolls back on failure.
-- **Tier 3 (auto with grace window)** — designed, not yet implemented.
+- **Tier 3 (auto with grace window)** — opt-in. On a git install, a newly detected release transitions execution state to `scheduled` and is applied after `preApplyGraceMinutes`. During the grace window, `/admin/update` shows a live countdown plus Cancel and Apply now buttons; an admin email (if `adminEmail` is set) fires once per scheduled tag.
 - **Tier 4 (autonomous in maintenance window)** — designed, not yet implemented.

 ## Settings
@@ -42,7 +42,7 @@ In `settings.json`:
 | `updates.checkIntervalHours` | `6` | How often to poll GitHub Releases. |
 | `updates.githubRepo` | `"ether/etherpad"` | Override for forks. |
 | `updates.requireAdminForStatus` | `false` | Lock the `/admin/update/status` endpoint to authenticated admin sessions. Default `false` matches existing Etherpad behavior — `/health` already exposes `releaseId` publicly, and changelog data comes from a public GitHub release. Set `true` to hide the full update payload from non-admins without disabling the updater (`tier: "off"` is the heavier opt-out that removes the endpoints entirely). |
-| `updates.preApplyGraceMinutes` | `0` | **Tier 3 only.** Wait this many minutes between detecting a new release and starting the drain so the admin can cancel. Has no effect at tier `"manual"`. |
+| `updates.preApplyGraceMinutes` | `0` | **Tier 3 only.** Wait this many minutes between detecting a new release and starting the drain so the admin can cancel via `/admin/update`. `0` applies immediately when allowed. Clamped to `[0, 7*24*60]` (one week). Has no effect at tier `"manual"`. |
 | `updates.drainSeconds` | `60` | How long to broadcast "restart imminent" announcements to active pads before exiting. T-60 / T-30 / T-10 broadcasts fire automatically at the matching offsets within this window. |
 | `updates.rollbackHealthCheckSeconds` | `60` | After a fresh boot post-update, give `/health` this long to come up. If it doesn't, RollbackHandler restores the previous SHA. |
 | `updates.diskSpaceMinMB` | `500` | Pre-flight refuses to start an update unless the install volume has at least this many MB free. |
@@ -156,6 +156,36 @@ The check shells out to `git verify-tag <tag>`. The keyring at `trustedKeysPath`

 Tier 2 deliberately refuses to apply on `installMethod: "docker"` because in-container `git fetch / pnpm install / build:ui` doesn't survive a container restart — the orchestrator brings the container back up on the same image tag and the work is lost. Docker installs stay on Tier 1 (banner + version status) for now.

+## Tier 3 — auto with grace window
+
+Tier 3 builds on Tier 2 by scheduling the apply automatically when a new release is detected. The same `git fetch / checkout / pnpm install / build:ui / exit 75` pipeline runs — only the trigger changes.
+
+To enable, on a git install: set `updates.tier: "auto"` and (optionally) `updates.preApplyGraceMinutes` to the grace duration you want.
+
+### What happens when a new release lands
+
+1. The periodic version checker (`updates.checkIntervalHours`) hits GitHub Releases.
+2. If `policy.canAuto` is true (install is git, no terminal `rollback-failed` state, tier is `"auto"` or `"autonomous"`), the scheduler transitions `execution.status` to `scheduled` with `scheduledFor = now + preApplyGraceMinutes`.
+3. The schedule is persisted to `var/update-state.json`, so an Etherpad restart inside the grace window rehydrates the timer rather than losing the schedule.
+4. `/admin/update` shows a live countdown panel plus two buttons:
+   - **Cancel** — `POST /admin/update/cancel` returns the state to `idle` and drops the in-process timer.
+   - **Apply now** — `POST /admin/update/apply` skips the remaining grace; the regular Tier 2 pipeline runs immediately.
+5. When the timer fires, the scheduler runs the exact same pipeline as a manual Tier 2 click: pre-flight → drain → execute → exit 75.
+
+### Re-scheduling and stale state
+
+- If a newer release tag appears while a schedule is pending, the scheduler re-arms the timer for the new tag. The `email.graceStartTag` dedupe field guards against duplicate `grace-start` notifications.
+- If `updates.tier` is flipped back to `"manual"` or `"notify"` while a schedule is pending, the next periodic check cancels the schedule (state back to `idle`).
+- `rollback-failed` disables Tier 3 globally. The admin must `POST /admin/update/acknowledge` (or visit `/admin/update` and click Acknowledge) before any further auto-schedules are armed. Tier 2 manual click stays available because the admin click *is* the intervention the terminal state requires.
+
+### Email (`adminEmail` set)
+
+A single `grace-start` notification fires per scheduled tag:
+
+> [Etherpad] Auto-update scheduled for 2.7.2
+
+with the `scheduledFor` timestamp. Etherpad core does not yet wire SMTP; the message logs as `(would send email)` until a future PR adds a transport. Cadence and dedupe still update correctly.
+
 The right way to give docker admins an in-product Apply button is to delegate to the orchestrator rather than mutate the container. Two patterns to consider in a follow-up PR:

 - **Instructions-only.** When the page detects `installMethod: docker` *and* a newer release exists, swap the policy-denial copy for actionable instructions (`docker pull etherpad/etherpad:<tag>` for plain docker; `docker compose pull && docker compose up -d` for compose). Cheap, no new attack surface.

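The timer behaviour that the Tier 3 documentation above relies on (arm on schedule, re-arm on a newer tag, drop on Cancel or Apply now, fire once) can be sketched as a small closure. This is an illustration under assumptions: `createRunnerSketch`, `RunnerDeps` and the `armedFor()` accessor are hypothetical, and the signature of the real `createSchedulerRunner` in Scheduler.ts is not shown on this page. The semantics follow the commit message: `arm()` clears any prior timer, `cancel()` is idempotent, a past `scheduledFor` fires with delay 0, and the armed tag is cleared on fire.

```typescript
type TimerHandle = ReturnType<typeof setTimeout>;

interface RunnerDeps {
  now: () => number;                    // injectable clock, for tests
  onFire: (targetTag: string) => void;  // invoked at most once per arm()
}

// Hypothetical sketch of a single-fire scheduler timer.
const createRunnerSketch = (deps: RunnerDeps) => {
  let timer: TimerHandle | null = null;
  let armedFor: string | null = null;

  // Safe to call with nothing armed.
  const cancel = (): void => {
    if (timer !== null) clearTimeout(timer);
    timer = null;
    armedFor = null;
  };

  // Replaces any pending timer and returns the delay that was used.
  const arm = (targetTag: string, scheduledFor: string): number => {
    cancel();
    const target = Date.parse(scheduledFor);
    if (Number.isNaN(target)) throw new Error(`unparseable scheduledFor: ${scheduledFor}`);
    const delay = Math.max(0, target - deps.now()); // past timestamps fire immediately
    armedFor = targetTag;
    timer = setTimeout(() => {
      timer = null;
      armedFor = null; // cleared on fire, so a stale tag is never reported
      deps.onFire(targetTag);
    }, delay);
    return delay;
  };

  return {arm, cancel, armedFor: () => armedFor};
};

const runner = createRunnerSketch({now: () => Date.now(), onFire: (tag) => console.log(`apply ${tag}`)});
runner.arm('v2.7.2', new Date(Date.now() + 60_000).toISOString());
runner.cancel(); // e.g. the admin clicked Cancel or Apply now
```

Per the Qodo-review fixes in the commit message, the real fire callback additionally re-loads state and re-evaluates policy before applying; that recheck is left out of this sketch.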