- Status: Stable
- Canonical type: the `RunEvent` discriminated-union Zod schema lives in `@relavium/shared` (`run-event.ts`) and is consumed / re-exported by `@relavium/core`. This document is the authoritative contract; the schema is its runtime-validated implementation. If the two ever diverge, the doc wins and the schema is corrected to it (see the note under Selected definitions).
- Transport: Phase 1 is the in-process `RunEventBus` (the engine runs in-process on every surface, including the desktop WebView). Phase 2 is HTTP `text/event-stream` (SSE) from the cloud API.
- Related: ipc-contract.md, workflow-yaml-spec.md, ../shared-core/store-shapes.md, ../shared-core/llm-provider-seam.md (canonical `costMicrocents` unit + the `cost:updated` figures), ../../architecture/execution-model.md, ../../architecture/state-management.md, ADR-0018
Every workflow run produces a single ordered stream of RunEvent objects. This stream is the one contract that all surfaces consume to render live progress — streaming tokens on a node face, per-node status rings, cost waterfalls, and human-gate prompts. The events are emitted by @relavium/core and are identical regardless of where the engine runs.
The transport differs by surface and phase, but the event shape does not:
```mermaid
flowchart LR
  E["@relavium/core\nRunEventBus\n(runs in-process on every surface)"] -->|in-process, WebView-side| D[Desktop WebView stores]
  E -->|in-process bus| C[CLI ink renderer]
  E -->|in-process bus| V[VS Code extension host]
  E -. Phase 2 .->|HTTP SSE| P[Cloud Portal]
```
On the desktop the engine runs in the WebView's JS runtime (ADR-0018), so its RunEventBus and the consuming stores share one runtime — most run events never cross IPC. The only Rust→WebView channel on the LLM hot path is the delegated egress's Channel<StreamChunk> (the WebView adapter folds those chunks into agent:token run events locally); see ipc-contract.md. The cross-surface RunEvent union below is the same one HTTP SSE carries in Phase 2.
Every event extends a common base:
```typescript
interface BaseEvent {
  type: string;           // discriminator (see table below)
  runId?: string;         // correlation key on a workflow RUN (omitted on a session)
  sessionId?: string;     // correlation key on an agent SESSION (omitted on a run)
  timestamp: string;      // ISO 8601
  sequenceNumber: number; // monotonic per run OR per session
}
```

Correlation key. Exactly one of `runId` / `sessionId` is present: `runId` on a workflow run, `sessionId` on an agent session. The reused `agent:token` / `agent:tool_call` / `agent:tool_result` / `cost:updated` events carry `runId` on a run and `sessionId` on a session; consumers route on whichever is present.
sequenceNumber is monotonic per run and is the basis for gap detection: if a consumer sees a jump in sequenceNumber, it triggers a full state resync (re-read the durable run state) rather than trusting a partial view. This is what makes reconnection lossless. The envelope fields (sessionId / runId, sequenceNumber, timestamp) are stamped by the bus, not the producer: WorkflowEngine emits through the RunEventBus, and AgentSession (1.V) emits envelope-free payload drafts through an injected SessionEventSink — wiring that sink onto the bus, where the per-session sequenceNumber (and its same gap/resync rule) is assigned, is 1.W. So a session's monotonic numbering is the bus's responsibility, not the session core's.
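The gap-detection rule above can be sketched as a small per-stream tracker. This is a minimal illustration, not the engine's implementation: `SequenceTracker`, `GapResult`, and `reset` are hypothetical names, and the sketch assumes that consecutive events differ by exactly 1, that the first event seen is accepted whatever its number, and that an already-seen number is treated as a duplicate.

```typescript
// Minimal structural view of the envelope fields used for gap detection.
interface SequencedEvent {
  type: string;
  sequenceNumber: number;
}

type GapResult = "apply" | "duplicate" | "resync";

// Tracks the last applied sequenceNumber for ONE run (or ONE session).
class SequenceTracker {
  private last: number | undefined;

  // Classify an incoming event against the last one applied.
  check(event: SequencedEvent): GapResult {
    if (this.last === undefined || event.sequenceNumber === this.last + 1) {
      this.last = event.sequenceNumber;
      return "apply";
    }
    // Assumption: an old or repeated number is ignored rather than resynced.
    if (event.sequenceNumber <= this.last) return "duplicate";
    // A jump: the caller should re-read the durable run state, then call reset().
    return "resync";
  }

  // Call after a resync with the sequenceNumber the durable state reflects.
  reset(lastApplied: number): void {
    this.last = lastApplied;
  }
}
```

A consumer would keep one tracker per `runId` or `sessionId`, apply the event on `"apply"`, and on `"resync"` discard its partial view in favor of the durable state.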
```typescript
export type RunEvent =
  | RunStartedEvent
  | NodeStartedEvent
  | AgentTokenEvent
  | AgentToolCallEvent
  | AgentToolResultEvent
  | AgentFilePatchProposedEvent
  | CostUpdatedEvent
  | NodeCompletedEvent
  | NodeFailedEvent
  | NodeSkippedEvent
  | NodeRetryingEvent
  | MediaJobSubmittedEvent
  | HumanGatePausedEvent
  | HumanGateResumedEvent
  | RunCompletedEvent
  | RunFailedEvent
  | RunCancelledEvent
  | RunPausedEvent
  | RunTimeoutEvent
  | BudgetWarningEvent
  | BudgetPausedEvent;
```

`RunPausedEvent` is the multi-gate aggregate (below); `RunTimeoutEvent` / `BudgetWarningEvent` / `BudgetPausedEvent` are the resource-governance events defined in Workflow governance and reserved events.
| `type` | Meaning | Key payload fields |
|---|---|---|
| `run:started` | A run began. | `workflowId` (the `workflows.id` UUID FK, not the authored slug; ADR-0022), `inputs` (secret-typed inputs masked; see Security), `executionMode: 'local' \| 'cloud' \| 'managed'` |
| `node:started` | A node began executing. | `nodeId`, `nodeType`, `attemptNumber?` (1-based; absent ⇒ attempt 1, present + >1 ⇒ a node-retry re-dispatch; 1.S) |
| `agent:token` | A streaming LLM token from an agent node. | `nodeId`, `token`, `model` |
| `agent:tool_call` | An agent invoked a tool. | `nodeId`, `model` (the invoking model, so a tool call is attributable across a failover), `toolId`, `toolInput` (sanitized; no secrets), `attemptNumber?` (1-based, matches `cost:updated`) |
| `agent:tool_result` | A tool returned. | `nodeId`, `toolId`, `success`, `outputSummary` (truncated for UI), `attemptNumber?` |
| `agent:file_patch_proposed` | An agent proposed a file change (gated: no write until the user accepts; e.g. the VS Code inline-diff review). | `nodeId`, `patches: [{ uri, unifiedDiff }]` (≥1; an empty proposal is meaningless), `attemptNumber?` |
| `cost:updated` | A node's token cost was tallied (drives the cost waterfall). | `nodeId`, `model`, `inputTokens`, `outputTokens`, `costMicrocents`, `cumulativeCostMicrocents` (integer micro-cents; canonical unit in llm-provider-seam.md; includes realized media spend, folded as a disjoint addend per ADR-0044 §3; the per-unit `Usage.mediaUnits` axis is not yet a field on this event, deferred, see deferred-tasks.md), `attemptNumber?` (1-based within-chain `FallbackChain` attempt; resets per node-retry re-dispatch; distinct from `node:*.attemptNumber`, see the two `attemptNumber` families note). Generative-node variant (1.AG Section C, ADR-0045 §5): a `media_surface: 'generative'` agent node emits exactly one `cost:updated` with `inputTokens` / `outputTokens` = 0 (no token billing; the spend rides entirely in `costMicrocents` as the per-modality media addend) and no `attemptNumber` (no `FallbackChain` on the generative path: one provider, no failover). |
| `node:completed` | A node finished successfully. | `nodeId`, `output`, `tokensUsed: {input, output, model?}` (`model` only for LLM nodes), `durationMs`, `selected?` (a `condition`'s chosen target ids; the authoritative branch record checkpoint/resume restores from, 1.R; may be an empty array when the condition routes to no branch, dimming all downstream), `attemptNumber?` (1-based node-retry dispatch attempt; 1.S; absent ⇒ attempt 1) |
| `node:failed` | A node failed (TERMINAL: exactly one per node; emitted when the node-retry budget is exhausted, on a fatal / `retry_on`-excluded failure, or when a pending retry is abandoned by a cancel or a sibling abort; see 1.S). | `nodeId`, `error: {code, message, retryable, correlationId?}` (`code` is an `ErrorCode`; `correlationId` is a secret-free id joined to the internal log; ADR-0036), `attemptNumber?` (the last attempt, when a retry budget was spent; 1.S) |
| `node:retrying` | A retryable node attempt failed and the engine will re-dispatch the whole node (1.S, ADR-0040). Non-terminal (the node continues; `node:failed` is the terminal). | `nodeId`, `attemptNumber` (the attempt that just failed, 1-based), `error: {code, message, retryable}` (the `NodeFailure` shape; no `correlationId`, which anchors the terminal failure), `delayMs` (backoff before the next attempt) |
| `node:skipped` | A node was skip-propagated (never ran). | `nodeId`, `reason: 'branch_not_taken' \| 'upstream_unreachable'` (`branch_not_taken` = a condition routed away from it; `upstream_unreachable` = every in-edge is dead because an upstream was skipped/failed). Emitted so the event log is a complete, replayable record: checkpoint/resume reconstructs a skipped vertex from it (run-plan.md) and a surface can render the dimmed path instead of the node silently vanishing. |
| `media_job:submitted` | An async media-generation job was submitted; the engine owns its poll/checkpoint/resume/cancel loop (1.AG, ADR-0045). Non-terminal (the node parks until its `node:completed` / `node:failed`). Durable so a crash-resume re-attaches (re-polls the opaque `jobId`) instead of re-submitting; per-poll progress is transient (off this durable stream). | `nodeId`, `jobId` (Relavium-opaque; never the vendor op-name), `provider`, `model`, `modality: 'image' \| 'audio' \| 'video'`, `startedAt`, `deadlineAt` |
| `human_gate:paused` | Execution suspended at a human gate. | `nodeId`, `gateId`, `gateType: 'approval' \| 'input' \| 'review'`, `message`, `assignee?`, `timeoutMs?`, `timeoutAction?: 'approve' \| 'reject'` (on-timeout policy, present only with `timeoutMs`), `expiresAt?` |
| `human_gate:resumed` | A gate decision was applied; execution continues. | `nodeId`, `decision: 'approved' \| 'rejected' \| 'input_provided'`, `decidedBy`, `payload?` |
| `run:paused` | The run is suspended on ≥1 gate AND/OR ≥1 async media job: the multi-suspension aggregate (parallel branches may each reach a gate or a media job). `pendingGateCount` is the count of `gateIds[]` (they must agree) and both are 0/empty for a media-only park; `pendingMediaJobNodeIds` lists nodes parked on the engine-owned `pollMediaJob` loop (1.AG Section D, ADR-0045 §2). At least one suspension reason (a gate or a media job) always holds. A resume disambiguates by registry: a gate by `gateId` (a decision), a media job by `nodeId` (a re-attach). | `pendingGateCount`, `gateIds[]`, `pendingMediaJobNodeIds[]?` |
| `run:completed` | The run finished. | `outputs` (a record keyed by each terminal `output` vertex's node id, the value being that vertex's captured output; see run-plan.md §output capture), `totalTokensUsed`, `totalCostMicrocents` (integer micro-cents closing total for the whole run), `durationMs` |
| `run:failed` | The run failed. | `error: {code, message, retryable, nodeId?, correlationId?}` (`code` is an `ErrorCode`; `nodeId` is the root-cause node; `correlationId` joins to the internal log; ADR-0036), `partialOutputs` |
| `run:cancelled` | The run was cancelled. | (base only) |
attemptNumber appears on two independent counter families that must not be conflated (1.S, ADR-0040):
- Node-retry dispatch attempt: on `node:started` / `node:completed` / `node:failed` / `node:retrying`. The engine's above-chain whole-node re-dispatch index. Absent ⇒ attempt 1; present + >1 ⇒ a re-dispatch (distinguishes "attempt N starting" from a replay).
- Within-chain attempt: on `cost:updated` / `agent:tool_call` / `agent:tool_result` / `agent:file_patch_proposed`. The within-chain `FallbackChain` attempt index inside a single node dispatch; it resets to 1 on every node-retry re-dispatch (a fresh chain runs each time).
The two do not join: on a node the budget retried, node:completed.attemptNumber may be 2 while the accompanying cost:updated.attemptNumber is 1. To attribute cost to a node-retry attempt, partition the sequenceNumber-ordered stream at each node:started / node:retrying boundary — do not key by (nodeId, attemptNumber) across families. (Run totals are unaffected: cost:updated.cumulativeCostMicrocents is the engine's authoritative running total.)
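The partitioning rule above can be sketched as a single pass over the ordered stream. This is an illustrative sketch only: `costByDispatchAttempt` and the trimmed `StreamEvent` shapes are hypothetical, and it assumes that a failed attempt's `cost:updated` events precede its `node:retrying`.

```typescript
// Trimmed event shapes: only the fields this sketch reads.
type StreamEvent =
  | { type: "node:started"; nodeId: string; sequenceNumber: number; attemptNumber?: number }
  | { type: "node:retrying"; nodeId: string; sequenceNumber: number; attemptNumber: number }
  | { type: "cost:updated"; nodeId: string; sequenceNumber: number; costMicrocents: number; attemptNumber?: number };

// Returns nodeId -> (node-retry dispatch attempt -> micro-cents spent in that attempt).
function costByDispatchAttempt(events: StreamEvent[]): Map<string, Map<number, number>> {
  const current = new Map<string, number>(); // nodeId -> dispatch attempt in progress
  const totals = new Map<string, Map<number, number>>();
  const ordered = [...events].sort((a, b) => a.sequenceNumber - b.sequenceNumber);

  for (const e of ordered) {
    if (e.type === "node:started") {
      current.set(e.nodeId, e.attemptNumber ?? 1); // absent means attempt 1
    } else if (e.type === "node:retrying") {
      current.set(e.nodeId, e.attemptNumber + 1); // boundary: the next attempt begins
    } else {
      // cost:updated: its own attemptNumber is the within-chain index and is ignored here.
      const attempt = current.get(e.nodeId) ?? 1;
      const perNode = totals.get(e.nodeId) ?? new Map<number, number>();
      perNode.set(attempt, (perNode.get(attempt) ?? 0) + e.costMicrocents);
      totals.set(e.nodeId, perNode);
    }
  }
  return totals;
}
```

Keying by stream position rather than by `(nodeId, attemptNumber)` is what keeps the two counter families from being conflated.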
These TypeScript shapes are illustrative. The enforced, runtime-validated implementation is the Zod schema set in `@relavium/shared` (`run-event.ts`), from which the TS types are inferred (ADR-0020). This document remains the canonical contract (the human-readable spec the schema implements); if the two ever diverge, this spec wins and the schema is corrected to it.
```typescript
export interface AgentTokenEvent extends BaseEvent {
  type: 'agent:token';
  nodeId: string;
  token: string; // streaming LLM token
  model: string;
}

export interface CostUpdatedEvent extends BaseEvent {
  type: 'cost:updated';
  nodeId: string;
  model: string; // canonical model id the cost was priced against
  inputTokens: number;
  outputTokens: number;
  costMicrocents: number; // integer micro-cents (canonical unit defined in llm-provider-seam.md); this attempt, from Relavium's pricing table (never the provider)
  cumulativeCostMicrocents: number; // integer micro-cents running total for the whole run; INCLUDES realized media spend, folded as a disjoint addend (ADR-0044 §3)
  // NOTE (1.AF): the per-unit `Usage.mediaUnits` axis (image per-count, audio/video per-second; a token-based
  // provider's audio rides as unit:'count') is NOT yet a field on this event. Realized media spend already
  // folds into `cumulativeCostMicrocents`; surfacing the disjoint per-unit counts here needs `MediaUnitsEntry`
  // relocated to `@relavium/shared` first (run-event.ts cannot import the `@relavium/llm` seam type). Deferred;
  // see deferred-tasks.md.
  attemptNumber?: number; // 1-based WITHIN-CHAIN attempt; resets per node-retry re-dispatch; distinct from node:*.attemptNumber (see "Two attemptNumber families")
}

export interface NodeCompletedEvent extends BaseEvent {
  type: 'node:completed';
  nodeId: string;
  output: unknown;
  // `model` is present only when an LLM produced the tokens. A non-agent node (condition,
  // transform, merge, parallel, input, output, human_gate) completes with input/output 0 and
  // no model, so `model` is optional.
  tokensUsed: { input: number; output: number; model?: string };
  durationMs: number;
  selected?: string[]; // a `condition` node only: the immediate target ids it routed to (the live branches); MAY be empty when it routes to no branch (all downstream skip-propagated). The authoritative record checkpoint/resume restores `selectedTargets` from (1.R).
  attemptNumber?: number; // 1-based NODE-RETRY dispatch attempt (1.S); absent ⇒ attempt 1; distinct from cost:updated.attemptNumber (see "Two attemptNumber families")
}

export interface NodeSkippedEvent extends BaseEvent {
  type: 'node:skipped';
  nodeId: string;
  reason: 'branch_not_taken' | 'upstream_unreachable';
}

export interface NodeRetryingEvent extends BaseEvent {
  type: 'node:retrying'; // 1.S: a retryable attempt failed; the engine will re-dispatch the whole node. NON-TERMINAL.
  nodeId: string;
  attemptNumber: number; // the attempt that just failed (1-based); the next attempt is attemptNumber + 1
  error: { code: ErrorCode; message: string; retryable: boolean }; // the NodeFailure shape; no correlationId (that anchors the terminal node:failed)
  delayMs: number; // backoff before the next attempt
}

export interface MediaJobSubmittedEvent extends BaseEvent {
  type: 'media_job:submitted'; // 1.AG/ADR-0045 §2: an async media job was submitted; the node PARKS (non-terminal suspension). DURABLE (resume re-attaches).
  nodeId: string;
  jobId: string; // the Relavium-opaque job id the engine re-polls; never the vendor operation-name (ADR-0011 I1)
  provider: 'anthropic' | 'openai' | 'gemini' | 'deepseek'; // the bound LlmProviderId (closed z.enum(LLM_PROVIDERS); failover does not apply to an in-flight job)
  model: string; // canonical model id
  modality: 'image' | 'audio' | 'video';
  startedAt: string; // ISO-8601 submit time
  deadlineAt: string; // ISO-8601 = startedAt + [defaults].media_job_deadline_ms; on resume now > deadlineAt short-circuits a doomed re-poll
}

export interface HumanGatePausedEvent extends BaseEvent {
  type: 'human_gate:paused';
  nodeId: string;
  gateId: string; // stable id of this gate instance; required by the resume path: engine.resume(runId, gateId, decision)
  gateType: 'approval' | 'input' | 'review';
  message: string;
  assignee?: string;
  timeoutMs?: number;
  timeoutAction?: 'approve' | 'reject'; // on-timeout policy (present only with timeoutMs); lets a surface show how the gate auto-resolves and a Phase-2 crash-resume re-arm the timer from the log
  expiresAt?: string;
}

export interface BudgetWarningEvent extends BaseEvent {
  type: 'budget:warning';
  spentMicrocents: number;
  limitMicrocents: number;
  thresholdPct: number; // 0–100, rounded from spent/limit at the pre-egress check point
}

export interface BudgetPausedEvent extends BaseEvent {
  type: 'budget:paused';
  nodeId: string; // the agent node whose next LLM call would exceed the cap
  spentMicrocents: number;
  limitMicrocents: number;
  gateId: string; // stable id of the budget gate; required by engine.resume(runId, gateId, decision)
}

export interface RunTimeoutEvent extends BaseEvent {
  type: 'run:timeout';
  elapsedMs: number;
  timeoutMs: number;
}
```

`agent:tool_call.toolInput` is sanitized (no secrets) and `agent:tool_result.outputSummary` is truncated. `run:started.inputs` carries workflow inputs, but any secret-typed input is masked: the value is replaced with `{ secret: true, ref }` (the keychain/env reference), never the raw value. API keys and other secrets never appear in any event payload; this holds across the in-process bus, HTTP SSE, and any persisted run log. (On the desktop the raw provider key never even reaches the WebView: egress is Rust-delegated, ADR-0018.)
The same { secret: true, ref } MaskedSecret marker can also appear in node:completed.output (for an input node, which emits the masked inputs) and therefore in run:completed.outputs / run:failed.partialOutputs wherever a secret-typed input would otherwise surface — the engine masks secret inputs at the ingress so a raw secret never reaches an output payload (see run-plan.md §output capture). Any surface rendering of node/run outputs must treat a MaskedSecret object as a redacted placeholder, not displayable data.
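A surface can honor the redaction rule above with a type guard applied before any value is displayed. This is a minimal sketch under stated assumptions: `ref` is taken to be a string, and `isMaskedSecret`, `renderOutputValue`, and the `[redacted]` placeholder text are hypothetical names rather than part of the contract.

```typescript
// The marker shape described above; `ref` being a string is an assumption.
interface MaskedSecret {
  secret: true;
  ref: string;
}

function isMaskedSecret(value: unknown): value is MaskedSecret {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { secret?: unknown }).secret === true &&
    typeof (value as { ref?: unknown }).ref === "string"
  );
}

// Renders an output value for display, replacing any MaskedSecret (at any depth)
// with a placeholder instead of printing its fields.
function renderOutputValue(value: unknown): string {
  if (isMaskedSecret(value)) return "[redacted]";
  if (Array.isArray(value)) return `[${value.map(renderOutputValue).join(", ")}]`;
  if (typeof value === "object" && value !== null) {
    const entries = Object.entries(value as Record<string, unknown>).map(
      ([key, inner]) => `${key}: ${renderOutputValue(inner)}`,
    );
    return `{ ${entries.join(", ")} }`;
  }
  return String(value);
}
```

The guard is checked before the generic object branch so that the marker's own `ref` is never rendered as ordinary data.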
The consumer pattern is identical for every surface, local or cloud:
```typescript
const handle = engine.start(workflowId, inputs);
for await (const event of handle.events) {
  switch (event.type) {
    case 'agent:token':       renderStreamingToken(event.nodeId, event.token); break;
    case 'node:completed':    markNodeDone(event.nodeId, event.tokensUsed); break;
    case 'human_gate:paused': showApprovalUI(event); break;
    case 'run:completed':     showResult(event.outputs); break;
    default:                  break; // ignore event types this surface does not render, including unknown ones
  }
}
```

On the desktop the same events are produced and consumed WebView-side over the engine's in-process `RunEventBus` (they do not cross IPC); see ipc-contract.md. On the cloud portal (Phase 2) they arrive over HTTP SSE. In all cases the consumer routes by `nodeId` into the per-node status map in `runStore` (kept deliberately separate from the canvas store to avoid re-rendering ReactFlow on every token; see ../shared-core/store-shapes.md).
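Routing by `nodeId` into a per-node status map can be sketched as a small reducer. The status names and the `applyEvent` helper below are hypothetical; the real `runStore` shape is defined in store-shapes.md, so treat this only as an illustration of the routing idea.

```typescript
// Hypothetical status vocabulary for illustration only.
type NodeStatus = "running" | "retrying" | "paused" | "completed" | "failed" | "skipped";

interface NodeScopedEvent {
  type: string;
  nodeId?: string;
}

// Event types that change a node's status; everything else leaves the map untouched.
const STATUS_BY_TYPE = new Map<string, NodeStatus>([
  ["node:started", "running"],
  ["node:retrying", "retrying"],
  ["human_gate:paused", "paused"],
  ["human_gate:resumed", "running"],
  ["node:completed", "completed"],
  ["node:failed", "failed"],
  ["node:skipped", "skipped"],
]);

// Returns the SAME map when nothing changed, so a store can skip notifying subscribers.
function applyEvent(
  statuses: ReadonlyMap<string, NodeStatus>,
  event: NodeScopedEvent,
): ReadonlyMap<string, NodeStatus> {
  if (event.nodeId === undefined) return statuses;
  const next = STATUS_BY_TYPE.get(event.type);
  if (next === undefined || statuses.get(event.nodeId) === next) return statuses;
  const updated = new Map(statuses);
  updated.set(event.nodeId, next);
  return updated;
}
```

Because `agent:token` is not in the lookup table, a burst of tokens returns the same map reference and triggers no status re-render.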
A human gate threads two events through the stream around a suspension:
- Engine reaches a `human_gate` node, persists full run state, emits `human_gate:paused` carrying the `gateId`, and suspends; the process may even exit.
- A surface renders the approval UI and the user acts; the surface calls `engine.resume(runId, gateId, decision)`, passing back the `gateId` it received on the paused event (it identifies which gate is being resolved).
- The engine reloads state, emits `human_gate:resumed`, and the run continues.
The gate decision object:
```typescript
export interface GateDecision {
  decision: 'approved' | 'rejected' | 'input_provided';
  decidedBy: string; // user id, or 'timeout' when a gate auto-resolves on timeout
  payload?: unknown; // for gate_type = input
  comment?: string;
}
```

Timeout behavior (`timeout_action` on the node) maps to `decidedBy: 'timeout'` when a gate auto-resolves. The `timeout_action: escalate` value is reserved in v1.0 (a timeout resolves only as approve or reject); see workflow-yaml-spec.md.
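The surface side of the gate lifecycle can be sketched as follows. Only the `engine.resume(runId, gateId, decision)` call and the `GateDecision` fields come from this document; `ResumableEngine` (including its `void` return), `UserAnswer`, `toDecision`, and `resolveGate` are hypothetical, and mapping an accepted `input` gate to `input_provided` is an assumption.

```typescript
interface GateDecision {
  decision: "approved" | "rejected" | "input_provided";
  decidedBy: string;
  payload?: unknown;
  comment?: string;
}

// Trimmed view of human_gate:paused: only the fields this sketch reads.
interface HumanGatePaused {
  type: "human_gate:paused";
  runId: string;
  gateId: string;
  gateType: "approval" | "input" | "review";
}

// Assumed minimal engine surface; the return type is not specified by the contract.
interface ResumableEngine {
  resume(runId: string, gateId: string, decision: GateDecision): void;
}

// Hypothetical result of the approval UI.
interface UserAnswer {
  userId: string;
  approved: boolean;
  input?: unknown;
}

function toDecision(event: HumanGatePaused, answer: UserAnswer): GateDecision {
  if (event.gateType === "input" && answer.approved) {
    return { decision: "input_provided", decidedBy: answer.userId, payload: answer.input };
  }
  return { decision: answer.approved ? "approved" : "rejected", decidedBy: answer.userId };
}

// Passes back the gateId received on the paused event, as the lifecycle requires.
function resolveGate(engine: ResumableEngine, event: HumanGatePaused, answer: UserAnswer): GateDecision {
  const decision = toDecision(event, answer);
  engine.resume(event.runId, event.gateId, decision);
  return decision;
}
```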
An agent session (ADR-0024) is driven on the same RunEventBus, but emits a disjoint session:* namespace keyed by sessionId instead of runId. Consumers route purely on the type discriminant, so the two namespaces never collide.
```typescript
interface BaseSessionEvent {
  type: string;           // 'session:*' (see below)
  sessionId: string;
  timestamp: string;      // ISO 8601
  sequenceNumber: number; // monotonic per session; same gap-detection/resync rule as a run
}

export type SessionEvent =
  | SessionStartedEvent        // 'session:started': { agentRef, model, context }
  | SessionTurnStartedEvent    // 'session:turn_started': a user message began an assistant turn
  | SessionTurnCompletedEvent  // 'session:turn_completed': { stopReason, tokensUsed, error? }
  | SessionCancelledEvent      // 'session:cancelled': the in-flight turn was aborted
  | SessionExportedEvent;      // 'session:exported': { workflowPath } (chat-to-workflow export)
```

A turn that fails (a provider error, a rate limit, an exhausted budget cap) still emits `session:turn_completed` with an `error?: { code, message, retryable, correlationId? }`, using the same closed `ErrorCode` taxonomy and secret-free correlation id as run events (ADR-0036), so a surface can render the failure rather than a silent stall. A cancellation is distinct: it emits `session:cancelled` (not `turn_completed`) and the in-flight user message is rolled back from the transcript, so a cancelled turn leaves no partial assistant turn behind (see agent-session-spec.md).
Within a turn, the conversational work reuses the same agent:token / agent:tool_call / agent:tool_result / cost:updated event shapes the AgentRunner already emits — carried on the session envelope (sessionId). The per-turn append of user/assistant/tool messages is persisted as session_messages (see database-schema.md); the contract is owned by agent-session-spec.md. On every surface session events are produced and consumed in-process exactly like run events — only llm_stream crosses IPC on the desktop (ipc-contract.md). So the complete typed event stream for a session is the five session:* lifecycle events (the SessionEvent union above) plus agent:token / agent:tool_call / agent:tool_result / cost:updated carrying sessionId — this full set is exactly what relavium chat --json emits.
The session stream (SessionHandle, 1.W). A session is long-lived across turns, so — unlike a run's exactly-one-terminal RunHandle — the SessionHandle.events async-iterable stays open across turns: session:turn_completed is a per-turn boundary, not a stream terminal. The stream closes only on session:cancelled (the session's sole terminal); session:exported is a side event (1.Z), never a terminal. The bus assigns the per-session sequenceNumber — a monotonic counter keyed on sessionId, independent of any run's runId counter on the same shared bus (ADR-0036 "one bus, two namespaces") — with the same gap-detection / resync rule as a run. AgentSession (1.V) emits envelope-free drafts through its injected SessionEventSink; 1.W's createSessionEventSink attaches the sessionId and the bus stamps the sequenceNumber + timestamp at the one authoritative translation point. The bus's validation gate accepts both families via the combined RunOrSessionEventSchema (@relavium/shared). agent:file_patch_proposed is run-only (it carries runId, emitted by the AgentRunner workflow adapter — not the shared turn core), so it is not part of a session stream; createSessionEventSink drops it defensively at the seam.
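The "one bus, two namespaces" stamping described above can be illustrated with independent counters keyed by correlation id. This is a sketch, not the `RunEventBus` implementation: `EnvelopeStamper` and `CorrelationKey` are hypothetical names, and starting each counter at 1 is an assumption.

```typescript
// Exactly one correlation key accompanies each draft.
type CorrelationKey = { runId: string } | { sessionId: string };

class EnvelopeStamper {
  // One monotonic counter per run and per session; the two never share a counter.
  private readonly counters = new Map<string, number>();
  private readonly now: () => Date;

  constructor(now: () => Date = () => new Date()) {
    this.now = now;
  }

  // Takes an envelope-free draft and returns it with the envelope fields attached.
  stamp<T extends { type: string }>(draft: T, key: CorrelationKey) {
    const counterKey = "runId" in key ? `run:${key.runId}` : `session:${key.sessionId}`;
    const sequenceNumber = (this.counters.get(counterKey) ?? 0) + 1; // assumption: starts at 1
    this.counters.set(counterKey, sequenceNumber);
    return { ...draft, ...key, sequenceNumber, timestamp: this.now().toISOString() };
  }
}
```

Keeping the counter in the stamper rather than in the producer mirrors the division of responsibility in the text: the session core emits drafts, and numbering is assigned at one place.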
@relavium/core resource governance (ADR-0028) adds three run events:
| `type` | Meaning | Key payload fields |
|---|---|---|
| `budget:warning` | Pre-egress worst-case cost estimate would exceed the configured cap, and `on_exceed: warn` is set. Emitted once per run before the capped egress; execution continues. `thresholdPct` is `clamp(round(spent / limit * 100), 0, 100)` observed at the pre-egress check point. | `spentMicrocents`, `limitMicrocents`, `thresholdPct` |
| `budget:paused` | Pre-egress estimate would exceed the cap with `on_exceed: pause_for_approval`; the run suspends like a human gate and is resumed via `engine.resume(runId, gateId, decision)`. `decision: approved` continues; `rejected` closes the run with `run:failed{code: budget_exceeded}`. | `nodeId`, `spentMicrocents`, `limitMicrocents`, `gateId` |
| `run:timeout` | The run hit its `timeout_ms`. | `elapsedMs`, `timeoutMs` |
These three (and run:paused / human_gate:paused) are non-terminal — they signal a governance/suspension state, not the run's end. A run that cannot continue past a timeout or budget cap still closes with exactly one run:failed carrying code: run_timeout / budget_exceeded. The exactly-one-terminal-event invariant (run:completed | run:failed | run:cancelled) and its precedence are owned by ADR-0036.
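Two of the rules above are small enough to state as code: the `thresholdPct` formula and the terminal-event set. The helper names are hypothetical, and the handling of a non-positive limit is an assumption that the contract does not specify.

```typescript
// thresholdPct = clamp(round(spent / limit * 100), 0, 100)
function thresholdPct(spentMicrocents: number, limitMicrocents: number): number {
  // Assumption: a non-positive limit is reported as fully consumed.
  if (limitMicrocents <= 0) return 100;
  const pct = Math.round((spentMicrocents / limitMicrocents) * 100);
  return Math.min(100, Math.max(0, pct));
}

// The only event types that end a run; governance and suspension events are not in this set.
const TERMINAL_RUN_EVENT_TYPES: ReadonlySet<string> = new Set([
  "run:completed",
  "run:failed",
  "run:cancelled",
]);

function isTerminalRunEvent(type: string): boolean {
  return TERMINAL_RUN_EVENT_TYPES.has(type);
}
```

A consumer that closes its stream on `run:timeout` or `budget:paused` would miss the closing `run:failed`, which is why the terminal check is kept separate from the governance events.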
Reserved (declared, but emitted by no Phase-1 code):
- Loops (loops ADR, 0030+): `iteration:started` / `iteration:completed`, and an optional `iterationIndex?` / `iterationTotal?` on node-level events. Reserved so the schema is future-proof without Phase-1 bloat.
- Steering (agent-sessions.md): `agent:directive_injected` (`mode: 'non_blocking' | 'blocking'`, `directiveLength` rather than the content, so no secret/PII enters the stream), `agent:context_compacted`, `agent:context_cleared`. Security envelope: a directive applies only to a running or paused agent; completed nodes are immutable.
node:failed.error.code and run:failed.error.code are a closed ErrorCode enum (not a free string), so surfaces can branch on cause and retryable is unambiguous:
validation · content_filter · provider_auth · provider_rate_limit · provider_unavailable · tool_denied · tool_failed · budget_exceeded · run_timeout · turn_limit · cancelled · sandbox_error · internal
The retryable/fatal mapping is owned by error-handling.md (e.g. provider_rate_limit/provider_unavailable retryable; provider_auth/validation/content_filter/tool_denied/turn_limit/cancelled fatal). content_filter is a provider content-policy rejection (text or media generation) — a fatal cause distinct from validation (an authoring/shape error), so a surface shows the right reason; the content_filter LlmErrorKind maps here (1.AG, ADR-0045 §6). turn_limit is the limit-family code for a hard agent/session turn/round cap (the exact knob is settled with AgentSession, 1.V) — distinct from run_timeout/budget_exceeded so a capped conversation surfaces its own cause rather than a silent stop; continuing past it is an explicit user action, never a retry. It is not the [chat].max_messages knob, which is a session-history trim threshold (config-spec.md) — trimming continues the session and emits no error. Messages remain user-safe and secret-free.
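The closed enum above can be mirrored in TypeScript as a literal union with a runtime guard. This is only a sketch of the idea (the enforced version is the Zod schema in `@relavium/shared`); `ERROR_CODES`, `isErrorCode`, and `failureAction` are hypothetical names, and the sketch deliberately reads `retryable` from the event instead of restating the mapping owned by error-handling.md.

```typescript
const ERROR_CODES = [
  "validation",
  "content_filter",
  "provider_auth",
  "provider_rate_limit",
  "provider_unavailable",
  "tool_denied",
  "tool_failed",
  "budget_exceeded",
  "run_timeout",
  "turn_limit",
  "cancelled",
  "sandbox_error",
  "internal",
] as const;

type ErrorCode = (typeof ERROR_CODES)[number];

function isErrorCode(value: unknown): value is ErrorCode {
  return typeof value === "string" && (ERROR_CODES as readonly string[]).includes(value);
}

interface EventError {
  code: ErrorCode;
  message: string;
  retryable: boolean;
}

// Hypothetical surface policy: trust the event's own `retryable` flag.
function failureAction(error: EventError): "offer_retry" | "show_fatal" {
  return error.retryable ? "offer_retry" : "show_fatal";
}
```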
This schema is versioned by additive evolution, not a version field. The following are always v1.0-legal and never a breaking change, provided consumers ignore unknown types and unknown fields and treat an absent optional field as omitted (not null):
- adding a new optional field to an existing event;
- adding a new event `type` (including activating any reserved type above).
Removing or repurposing an existing field/type is a breaking change and is not done within the contract.
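A consumer that satisfies the tolerance rules above might look like the following sketch. `WireEvent`, `dispatch`, and `nodeAttempt` are hypothetical names; the only contract facts used are that unknown types and fields are ignored and that an absent `attemptNumber` on a node event means attempt 1.

```typescript
// A loosely typed event as it arrives, possibly from a newer producer.
interface WireEvent {
  type: string;
  [field: string]: unknown;
}

type Handler = (event: WireEvent) => void;

// Returns false for a type this consumer does not know, instead of throwing.
function dispatch(event: WireEvent, handlers: ReadonlyMap<string, Handler>): boolean {
  const handler = handlers.get(event.type);
  if (handler === undefined) return false; // unknown type: ignore
  handler(event); // the handler reads only the fields it knows
  return true;
}

// An absent optional field is treated as omitted, never as null.
function nodeAttempt(event: WireEvent): number {
  const raw = event.attemptNumber;
  return typeof raw === "number" ? raw : 1;
}
```

With this shape, activating a reserved type such as `iteration:started` reaches an older consumer as a no-op rather than an error.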
- Desktop: the engine runs in the WebView's JS runtime (ADR-0018), so run events are delivered WebView-side over the engine's in-process `RunEventBus`; they do not cross IPC as `RunEvent`s. The one Rust→WebView channel on the hot path is the delegated LLM egress's typed, backpressure-aware `Channel<StreamChunk>`: if the WebView consumer lags, the Rust sender awaits, throttling the egress without dropping chunks; the adapter folds those chunks into `agent:token` events on the WebView-side bus. See ipc-contract.md.
- CLI / VS Code: the engine runs in-process; events are delivered via the engine's `RunEventBus` (a platform-free, in-house typed event bus, not Node's `node:events`; ADR-0036) or the co-equal `RunHandle.events` async iterable.
The cloud API exposes the same stream as Server-Sent Events. Reconnection uses sequenceNumber (and SSE Last-Event-ID) for gap detection and resync against durable run state. A singleton SseManager owns the EventSource lifecycle with exponential-backoff reconnect (500ms → 1s → 2s → 4s, cap 30s) and a GET /runs/:id/state resync on reconnect.
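The reconnect schedule above (500ms → 1s → 2s → 4s, cap 30s) corresponds to a capped doubling. A minimal sketch, assuming a 0-based attempt counter; `reconnectDelayMs` is a hypothetical helper, not the `SseManager` API.

```typescript
// Delay before reconnect attempt `attempt` (0-based): base * 2^attempt, capped.
function reconnectDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// The first few delays of the schedule, in milliseconds.
const schedule = [0, 1, 2, 3, 4, 5, 6].map((attempt) => reconnectDelayMs(attempt));
```

The seventh delay would be 32s uncapped, so it is the first one clamped to the 30s cap.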
Legacy event-name note. Earlier design drafts used dotted event names (`node.started`, `node.token`, `node.completed`, `node.error`, `run.complete`, `human_gate.pending`, `cost.update`) with a `{ type, nodeId, payload, seqNo }` envelope. The canonical contract going forward is the colon-namespaced `RunEvent` union above with `sequenceNumber`. New code targets the union; the dotted names are recorded here only to disambiguate older references.