When @workflow/core runs against a @workflow/world* package from an older major line, a durable run executes its first step, the runtime replays the event log, and the world's event-schema Zod discriminated union hits an event-type discriminant it doesn't know about — throwing ZodError: invalid_union deep inside the world's storage layer. The error points nowhere useful: there is no version handshake or compatibility check between core and the installed world, so a routine version mismatch surfaces as an opaque schema crash at replay time rather than an actionable "your world is too old" message. The SDK already has the building blocks for a clean diagnosis (spec-version.ts, requiresNewerWorld() + RunNotSupportedError, and the getRunCapabilities() version table) — they just don't cover this direction of mismatch.
Repro
@workflow/core@5.0.0-beta.24 (as bundled by vercel/eve@0.13.3)
@workflow/world-postgres@4.2.0, which is what pnpm add @workflow/world-postgres installs today, because the npm latest dist-tag still points at the 4.x line.
Steps: start any durable workflow and let it run past its first step. The 5.x runtime writes a new-style event, the run replays its event log, and the 4.x world's EventSchema.parse(...) throws ZodError: invalid_union during replay.
Root cause
The world owns the event-schema discriminated union and validates every event it reads from storage through it:
The union is defined in packages/world/src/events.ts: EventTypeSchema (the z.enum of all event types, events.ts:57-80), AllEventsSchema (z.discriminatedUnion('eventType', [...]), events.ts:386-408), and the exported EventSchema = AllEventsSchema.and(...) (events.ts:412-420). The event-type vocabulary has grown over time: for example, attr_set was added in #2226 ("Add native v4 workflow attribute events"), and step_started carries lazy-start semantics added later in #2478 ("perf(core): lazy inline step start (save one world round-trip per step)"). A world pinned to an older @workflow/world has an older EventTypeSchema/AllEventsSchema whose union does not include these discriminants.
The world calls EventSchema.parse(...) unconditionally on every event read/return path. In world-postgres see packages/world-postgres/src/storage.ts:312, 630, 1454, 1675, 1691, 1726, 1757, 1791. When an event whose eventType isn't in the older union flows through any of these, z.discriminatedUnion rejects it with invalid_union — there is no eventType-specific branch to match, and the failure is a raw ZodError, not a workflow error.
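The failure mode described in the two points above can be reproduced in miniature without Zod. The sketch below is a dependency-free stand-in, not the real EventSchema: makeStrictParser, the two vocabularies, and the attr_set payload fields are all assumptions made for illustration. It only shows why a closed vocabulary rejects a newer discriminant with an error that says nothing about package versions.

```typescript
// Illustrative stand-in for the world's Zod-based EventSchema: a closed union
// keyed on eventType. The vocabularies below are assumptions for this example.
type StoredEvent = { eventType: string; [key: string]: unknown };

// Build a strict parser over a fixed vocabulary, as a pinned world would ship.
function makeStrictParser(
  vocabulary: readonly string[],
): (raw: StoredEvent) => StoredEvent {
  const known = new Set<string>(vocabulary);
  return (raw) => {
    if (!known.has(raw.eventType)) {
      // Mirrors the opaque failure: nothing here mentions package versions.
      throw new Error(
        `invalid_union: no branch matches eventType '${raw.eventType}'`,
      );
    }
    return raw;
  };
}

// Hypothetical older and newer vocabularies.
const parseWithOlderWorld = makeStrictParser(["step_started"]);
const parseWithNewerWorld = makeStrictParser(["step_started", "attr_set"]);

// A newer core writes an event type the older vocabulary does not contain.
// The payload fields are made up; only eventType matters here.
const written: StoredEvent = { eventType: "attr_set", key: "region", value: "eu" };
```

Calling parseWithNewerWorld(written) returns the event, while parseWithOlderWorld(written) throws on the same input. Every parse site listed above behaves like the second call when the world is older than core.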
The core↔world boundary itself carries no world-vocabulary version signal:
The World interface (packages/world/src/interfaces.ts:276-366) exposes specVersion?: number (interfaces.ts:286), but that is a forward marker: it's the spec version core writes new runs at (packages/core/src/runtime/start.ts:283-290). It is not a declaration of which event-type vocabulary / schema version the installed world can parse.
The only existing compatibility guard is requiresNewerWorld(run.specVersion) (packages/world/src/spec-version.ts:58-68), thrown as RunNotSupportedError (packages/errors/src/index.ts:826-846). But (a) it keys off the numeric run.specVersion, which is only reached after the event has already been parsed through the union, and (b) the guard itself only exists in 5.x worlds (packages/world-postgres/src/storage.ts:559-564, packages/world-local/src/storage/events-storage.ts:650-651). A 4.x world predates the guard and parses each event first, so it never gets the chance to report a clean version error — it dies on the Zod union.
A grep for coreApiVersion|worldVersion|schemaVersion|assertCompatible|EVENT_SCHEMA_VERSION across packages/**/src returns nothing: there is no handshake, capability negotiation, or schema-version marker exchanged between core and a world at setWorld/registration/start() time. Notably, the SDK already has a precedent for exactly this kind of negotiation on the core↔core (cross-deployment) boundary — getRunCapabilities() + the FORMAT_VERSION_TABLE/CAPABILITY_VERSION_TABLE keyed on @workflow/core version (packages/core/src/capabilities.ts:1-90, consumed in start.ts:267) — but nothing equivalent exists for the world's event vocabulary.
There is even prior art showing the maintainers already know unknown event types crash the runtime: world-vercel deliberately uses safeParse with an explicit "unknown/future event types" pass-through fallback (packages/world-vercel/src/events.ts:386-402, coerceEventDates), and the legacy postgres path throws an actionable Event type 'X' not supported ... Please upgrade @workflow packages. (packages/world-postgres/src/storage.ts:321-326). The hot replay path in world-postgres/world-local just doesn't get the same treatment.
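As a rough illustration of the treatment the hot replay path is missing, here is a dependency-free sketch of a tolerant read that separates "unknown event type" from "known type, bad payload". It is not the world-vercel or world-postgres code. KNOWN_EVENT_TYPES, payloadIsValid, the stepId field, and the exact message wording are assumptions; the message is only loosely modeled on the legacy postgres error quoted above.

```typescript
// Assumed, simplified vocabulary; the real one lives in EventTypeSchema.
const KNOWN_EVENT_TYPES: ReadonlySet<string> = new Set<string>(["step_started"]);

type RawEvent = { eventType?: unknown; [key: string]: unknown };

type ReadResult =
  | { kind: "valid"; event: RawEvent }
  | { kind: "unknown_type"; event: RawEvent; message: string }
  | { kind: "malformed"; message: string };

// Hypothetical payload check standing in for the schema's per-type branches.
function payloadIsValid(raw: RawEvent): boolean {
  return typeof raw.stepId === "string";
}

// Classify a stored event instead of letting a raw schema error escape.
function readEvent(raw: RawEvent): ReadResult {
  const type = raw.eventType;
  if (typeof type !== "string") {
    return { kind: "malformed", message: "event has no string eventType" };
  }
  if (!KNOWN_EVENT_TYPES.has(type)) {
    // A version mismatch, not data corruption: say so explicitly.
    return {
      kind: "unknown_type",
      event: raw,
      message:
        `Event type '${type}' is not supported by the installed world. ` +
        "Please upgrade @workflow packages.",
    };
  }
  if (!payloadIsValid(raw)) {
    // Known type with a bad payload is a real bug and should stay loud.
    return { kind: "malformed", message: `event '${type}' has an invalid payload` };
  }
  return { kind: "valid", event: raw };
}
```

A caller can then choose per result kind: pass unknown_type events through untouched (the pass-through behavior), or throw the message as a workflow-level error (the actionable-error behavior).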
Impact
Opaque failure with no version signal: a ZodError: invalid_union originating inside the world's storage layer gives no hint that the cause is a core/world version mismatch.
This is the default outcome of following install instructions, not an edge case: while @workflow/core ships on the 5.x beta line, the latest dist-tag for @workflow/world, @workflow/world-postgres, and @workflow/world-local is still 4.2.0. Anyone self-hosting who runs pnpm add @workflow/world-postgres against a 5.x core gets the broken combination automatically.
High debugging cost: the symptom is far from the cause, and it cost a self-hoster the majority of their debugging time before the mismatch was identified.
The dist-tag lag is itself worth fixing (so latest doesn't hand people an incompatible world), but core should fail safely regardless — dist-tag hygiene alone won't protect users who pin, use private registries, or otherwise end up with a mismatched world.
Proposed fix
Three grounded approaches, roughly in order of robustness. They are complementary, not mutually exclusive.
Declared world schema/vocabulary version, validated by core at registration/start() (preferred). Add a World-level declaration of the event-schema vocabulary the world understands, e.g. a worldSpecVersion / supportedEventTypes exported alongside SPEC_VERSION_CURRENT and surfaced on the World interface (packages/world/src/interfaces.ts:276, next to the existing specVersion). When core sets/starts a world it compares its own SPEC_VERSION_CURRENT / event vocabulary (packages/world/src/spec-version.ts:39, EventTypeSchema in events.ts:57) against the world's declared value and fails fast with an actionable error if the world is too old, before any run executes. This mirrors the existing getRunCapabilities() negotiation (packages/core/src/capabilities.ts) but for the world boundary instead of the cross-deployment one. Tradeoff: requires worlds to publish the field, so it only fully protects against worlds new enough to declare it; combined with the third approach below, it also covers the silent-old-world case.
Validate the installed world package range at registration time. Since worlds depend on @workflow/world (workspace:* in-repo; a real semver range when published) but declare no relationship to @workflow/core, core can't currently reason about compatibility. Add a peer/declared compatibility range between @workflow/core and @workflow/world (or have core read the resolved @workflow/world version) and assert it at boot, emitting an explicit "core X requires world >= Y, found Z" error. Tradeoff: package-version checks are coarser than wire-level checks and can be fooled by hoisting/duplicate installs, but they catch the common case at the earliest possible point (install/boot) with a clear message.
Make the event-schema union diagnose unknown discriminators instead of throwing invalid_union. At the world's parse sites (packages/world-postgres/src/storage.ts:312 et al., packages/world-local/src/storage/events-storage.ts), switch the hot read path to safeParse and, on an unknown-eventType failure, throw a version-aware error — e.g. "event type X requires @workflow/world >= Y; installed world is Z" — reusing/extending RunNotSupportedError (packages/errors/src/index.ts:826). This is the same shape world-vercel already implements (packages/world-vercel/src/events.ts:386-402) and the legacy-postgres path's actionable message (world-postgres/src/storage.ts:321-326); it's the only approach that helps when the world is the old one and predates any handshake field. Tradeoff: it's a per-world change rather than one central guard, and care is needed to distinguish "unknown future event type" (version mismatch) from "known type, malformed payload" (a real bug) — world-vercel's code already draws that line by re-checking EventTypeSchema.safeParse(raw.eventType).
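To make the first (preferred) approach more concrete, a registration-time check might look roughly like the following. This is a sketch under stated assumptions, not the SDK's API. WorldLike, checkWorld, assertWorldCompatible, and CORE_EVENT_TYPES are hypothetical names, and supportedEventTypes is the field proposed above, which the World interface does not have today.

```typescript
// Hypothetical shape: supportedEventTypes is the proposed field, not a real one.
interface WorldLike {
  name: string;
  supportedEventTypes?: readonly string[];
}

// Assumed list of event types this core build may write.
const CORE_EVENT_TYPES: readonly string[] = ["step_started", "attr_set"];

type Compatibility =
  | { status: "compatible" }
  | { status: "unverified" } // world predates the field; nothing to compare
  | { status: "incompatible"; missing: string[] };

// Compare core's vocabulary against what the world declares it can parse.
function checkWorld(
  world: WorldLike,
  coreEventTypes: readonly string[] = CORE_EVENT_TYPES,
): Compatibility {
  if (world.supportedEventTypes === undefined) {
    return { status: "unverified" };
  }
  const supported = new Set<string>(world.supportedEventTypes);
  const missing = coreEventTypes.filter((type) => !supported.has(type));
  return missing.length === 0
    ? { status: "compatible" }
    : { status: "incompatible", missing };
}

// Fail fast at registration, before any run executes.
function assertWorldCompatible(world: WorldLike): void {
  const result = checkWorld(world);
  if (result.status === "incompatible") {
    throw new Error(
      `World '${world.name}' cannot parse event types written by this core: ` +
        `${result.missing.join(", ")}. Install a world from the same release ` +
        "line as @workflow/core.",
    );
  }
}
```

The "unverified" status is deliberate: a world that predates the field cannot be checked here, which is why the read-path diagnosis in the third approach is still needed alongside it.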
Workarounds today
Pin the world to the same @workflow/* release line as core, e.g. for a 5.x beta core:
pnpm add @workflow/world-postgres@beta # or an explicit @5.0.0-beta.x
Note re: vercel/eve
vercel/eve is adding docs guidance plus a shallow boot-time guard (detecting the mismatch and emitting an actionable message) as a stopgap on the consumer side. That helps eve users, but the durable, framework-wide fix belongs in @workflow/core/@workflow/world: core should detect or clearly diagnose an incompatible world rather than letting a ZodError: invalid_union escape from replay.
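For reference, a shallow boot-time guard of the kind described could be as small as a major-version comparison. The sketch below is not eve's implementation. majorOf and describeMismatch are hypothetical helpers, the rule that "same major line means compatible" is an assumption taken from the workaround above, and how the installed versions are discovered is left out entirely.

```typescript
// Extract the major version from a semver string such as "5.0.0-beta.24".
function majorOf(version: string): number {
  const match = /^(\d+)\./.exec(version.trim());
  if (match === null) {
    throw new Error(`unparseable version: ${version}`);
  }
  return Number(match[1]);
}

// Return an actionable message when the world is on an older major line than
// core, or null otherwise. Only the older-world direction is flagged here.
function describeMismatch(
  coreVersion: string,
  worldPackage: string,
  worldVersion: string,
): string | null {
  const coreMajor = majorOf(coreVersion);
  const worldMajor = majorOf(worldVersion);
  if (worldMajor >= coreMajor) {
    return null;
  }
  return (
    `@workflow/core ${coreVersion} requires ${worldPackage} on the ` +
    `${coreMajor}.x line, found ${worldVersion}. ` +
    `Pin ${worldPackage} to the same release line as core.`
  );
}
```

With the versions from the repro, describeMismatch("5.0.0-beta.24", "@workflow/world-postgres", "4.2.0") returns a message, while a 5.x world version returns null. A package-level check like this is coarse, as noted in the second approach, so it complements a wire-level check rather than replacing it.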