Skip to content

Conversation

bergundy
Copy link
Member

@bergundy bergundy commented May 8, 2025

What changed

This adds support for Nexus in the Temporal TS SDK.

All Nexus-related APIs are experimental, and some changes are expected before this reaches GA.

Review notes

  • The team is still discussing expectations regarding HandlerError and OperationError serialization. In the mean time, this PR mostly adheres to the original spec (i.e. a Nexus should have either a message or a cause), but little effort has been put into compliance testing, as we know the present model is broken in certain aspects.
  • Tests use the HTTP API directly in lieu of a workflow caller or strongly typed client, we can refactor those later.
  • More tests will continue to be added in the near future, after this PR.

@bergundy bergundy requested a review from a team as a code owner May 8, 2025 01:26
test.beforeEach(async (t) => {
const taskQueue = t.title + randomUUID();
const { env } = t.context;
const response = await env.connection.operatorService.createNexusEndpoint({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this is the way for now, but what's the intent regarding this? Do we expect users to have to do the same in their own Temporal+Nexus tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. in Java we have a shortcut, we should probably make it easier in the test environment for Core based SDKs. @dandavison, wondering if you considered this in Python.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I have not considered APIs for user Temporal Nexus testing yet; I'll add a TODO to the code. In my tests I have a helper function like this for creating Nexus endpoints.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need an issue to track this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Created one for Python SDK-3837 "Python Nexus user testing utilities and docs"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have made this a features issue with checkboxes for each SDK but 🤷, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be added post-initial-release.

options: nexus.StartOperationOptions
): Promise<coresdk.nexus.INexusTaskCompletion> {
try {
const input = await decodeFromPayload(this.dataConverter, payload);
Copy link
Contributor

@dandavison dandavison May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the data converter fails here, I assume it should result in a 400 BAD_REQUEST Nexus HTTP response, right? If that's correct can you confirm that it does?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a test worth adding. Thanks!

bergundy added 8 commits May 14, 2025 20:02
- This PR is blocked on publishing an initial version of the `nexus-rpc` package.
- There is a TODO to figure out HandlerError and OperationError message rehydration, pending discussion.
- Interceptors not yet implemented.
- `WorkflowRunOperation` and `getClient()` not implemented for the `@temporalio/nexus` package.
- Tests use the HTTP API directly in lieu of a workflow caller or strongly typed client, we can refactor those later.
}
if (!token.wid) {
throw new TypeError('invalid workflow run token: missing workflow ID (wid)');
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that Go and TS don't validate the 'ns' field. Is that deliberate?

@mjameswh mjameswh changed the base branch from nexus-cp-3 to main August 21, 2025 18:41
@mjameswh mjameswh changed the title feat(nexus): Nexus support feat(nexus): Initial Nexus support Aug 21, 2025
Copy link
Contributor

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leaving this for now, but haven't finished my review

*
* @experimental Nexus support in Temporal SDK is experimental.
*/
export interface StartNexusOperationInput {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ever user-facing? Do the fields need docstrings?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's user facing, and yes this could benefit from some doc strings.

But we don't have any on other interceptors input/output types, so I'll just skip for the moment.

Comment on lines +174 to +181
const completed =
!activator.completions.nexusOperationStart.has(seq) &&
!activator.completions.nexusOperationComplete.has(seq);

if (!completed) {
activator.pushCommand({
requestCancelNexusOperation: { seq },
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just check nexusOperationComplete ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because cancellation could also be requested for an operation that is still in the "Start" state.

Copy link
Contributor

@THardy98 THardy98 Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duh 🤦
What is the result of cancelling a completed operation then? It's already completed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • nexusOperationStart => waiting for the operation to be started on the remote nexus handler service.
  • nexusOperationComplete => waiting for the operation to be completed asynchronously by the remote nexus handler.

So while the op is in nexusOperationComplete, it is not yet completed, it is pending completion, and so it can still be cancelled.

/**
* A promise that will resolve when the Nexus operation completes, either with the result of the
* operation if it completed successfully, or a failure if the Nexus Operation failed.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW - when it comes to cancellation types this comment will be technically incomplete/inaccurate: under WaitRequested the promise will resolve with a cancellation error if the cancel handler succeeds, and it will resolve with a non-cancellation error if the cancel handler fails, in both cases regardless of the operation outcome.

nit: decide on capitalization of [Oo]peration and [N]nexus and make docstrings consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: decide on capitalization of [Oo]peration and [N]nexus and make docstrings consistent.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW - when it comes to cancellation types this comment will be technically incomplete/inaccurate: under WaitRequested the promise will resolve with a cancellation error if the cancel handler succeeds, and it will resolve with a non-cancellation error if the cancel handler fails, in both cases regardless of the operation outcome.

Thanks for the heads-up.


/**
* A fixed, single-line summary for this Nexus Operation that may appear in the UI/CLI.
* This can be in single-line Temporal markdown format.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I supposed to know what "Temporal markdown format" is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the normal Markdown elements except we strip away images, hmtl, and scripts.

Maybe it's worth clarifying that in the docstring

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question.

Unfortunately, we don't yet have a doc page to link to, which is ok at this point as User Metadata is still considered prerelease.

And FWIW, that's the same typedoc text as we have in other SDKs.

service: options.service.name,
operation: opName,
options: operationOptions ?? {},
nexusHeader: {},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to make it possible for the user to pass headers now, or leave an open issue for it?

Copy link
Contributor

@dandavison dandavison Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I know Go uses the (weird) singular for "header"; does TS also? Python uses "headers".

Copy link
Contributor

@mjameswh mjameswh Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I know Go uses the (weird) singular for "header"; does TS also? Python uses "headers".

Good catch. Renamed to headers everywhere (no nexus prefix, plural form). That's coherent with both Python and other TS interceptors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to make it possible for the user to pass headers now, or leave an open issue for it?

Not sure what's the expectation here. For all other APIs, headers are not exposed directly in the "main" user facing call, only in interceptors. Are doing this differently for Nexus ops? If that's only in interceptors, then that should already be working now (though no test atm covering that).

In any case, I'm leaving that for later.

Comment on lines 17 to 18
* Start a Nexus Operation and wait for its completion taking a {@link nexus.operation}.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Start a Nexus Operation and wait for its completion taking a {@link nexus.operation}.
*/
* Start a Nexus Operation and wait for its completion taking a {@link nexus.operation}.
* Returns the operation's result.
*/

Copy link
Contributor

@dandavison dandavison Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth giving users some guidance in the docstrings regarding which of the two overloads (operation vs operation name) they should use?

Maybe dumb question: doesn't the string overload always provide better type safety? The structural typing (operation) overload allows a user to refer to an operation that has a name that does not match a name on the service for which the client was created, giving rise to a request-time HandlerError I believe e.g.

const service = nexus.service('test', {
  syncOp: nexus.operation<string, string>({ name: 'my-sync-op' }),
  asyncOp: nexus.operation<string, string>(),
});

const service2 = nexus.service('test', {
  syncOp2: nexus.operation<string, string>({ name: 'my-sync-op' }),
  asyncOp2: nexus.operation<string, string>(),
});

export async function caller(endpoint: string, op: keyof typeof service.operations, action: string): Promise<string> {
  const client = workflow.createNexusClient({
    endpoint,
    service,
  });
  return await workflow.CancellationScope.cancellable(async () => {
    let opRef = service2.operations['syncOp2'];
    const handle = await client.startOperation(opRef, action);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe dumb question: doesn't the string overload always provide better type safety? The structural typing (operation) overload allows a user to refer to an operation that has a name that does not match a name on the service for which the client was created

No, both provide the same level of type safety. I just added a test to demonstrate this.

Would it be worth giving users some guidance in the docstrings regarding which of the two overloads (operation vs operation name) they should use?

We have no opinion on that question. We intentionally chose to allow both.

I however expanded the typedoc to clarify that this expects the operation's property's name, rather than the operation's name. That may sound a bit confusing at first, but in practice, I'd expect most users will either not override operation names, or they will rely on their IDE's autocompletion, or TSC's compilation errors, to point out incorrect usage. Only pure-JS users (with no implicit TS IDE support) and Nexus framework implementers would really be exposed to this notion.

Copy link
Contributor

@dandavison dandavison Aug 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both provide the same level of type safety.

Using an operation method reference, users can do this without a type error:

await service1client.startOperation(service2.operations.syncOp2, action)

That would result in a request time error. I.e. it would send a request for service='service1 and operation='syncOp2', which is invalid.

But using string operation names they cannot represent that invalid nexus request. For example, service1client.startOperation('syncOp2', action) would be a type-check-time error.

My question is: given that operation method references are less type safe in this sense, what value do they add?

Copy link
Contributor

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have limited context on the nexus semantics, so I'll leave that to @dandavison and @bergundy . The integration lgtm to me though.

Comment on lines -223 to +291
err.failure = failure;
if (err instanceof TemporalFailure) {
err.failure = failure;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has the blind err.failure = failure caused issues before?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was simply not possible before. I mean, until now, err here would always be a TemporalFailure (look at the method signature). Now, it may be a Nexus HandlerError, which is part of the (Temporal agnostic) Nexus RPC.

I'm considering storing the failure on HandlerError anyway, possibly using a private symbol property. But we are revisiting nexus error serialization at the moment, so I'm deferring that decision until we've set our mind.

};

/**
* A function that takes a Nexus Context and returns an interceptor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand this comment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

@mjameswh mjameswh Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that comment should have been on NexusInterceptorsFactory, above. I improved that one, as well as that of the equivalent ActivityInterceptorsFactory type, near the top of this file.

However, I'd point out that typedocs on everything interceptors-related (Client/Workflow/Activity/Nexus) is very disappointing. I'm taking note to revisit all interceptors soon.

}

/**
* A function that takes a Nexus Context and returns an interceptor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above: I don't understand this comment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed description on that specific typedoc. It should have one, but so should all other interceptor-related types, but most don't. I'll revisit in a distinct PR.

protected async startOperation(
ctx: nexus.StartOperationContext,
payload: Payload | undefined
): Promise<coresdk.nexus.INexusTaskCompletion> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment: I note that in Python, all the classes that have a "Handler"-like interface are in the nexus SDK:

https://github.com/temporalio/sdk-python/blob/607641bf1e8250699b9c7ae0e87230d43b26c2ff/temporalio/worker/_nexus.py#L62

whereas here you have a class with a handler-like interface in the Tempral worker. Perhaps Python could/should be written more like yours; I hadn't considered it. Do you have any comments on that difference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what difference you are trying to point out.

That's an internal method of the worker to bridge between what Nexus Tasks provided by Core/the Temporal server, and the Nexus RPC's APIs. It really can't be moved to the Nexus RPC, as it is dealing with Core's NexusTaskCompletion and Temporal Payload.

It is equivalent to this method in Python.

The signature of the methods are slightly different, I think that's mostly copying the respective style of existing Workflow/Activity Task handling in various SDKs. I really don't think we need to worry about cross-SDK alignment at that level.

Copy link
Contributor

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, but should get approval from someone who has more familiarity with the nexus semantics before merging.

Specifically, approval on the nexus package and the corresponding tests.

The changes integrating this with the TS SDK lgtm

Copy link
Contributor

@mjameswh mjameswh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm comfortable with the PR as it stands at this point.

However, given that I myself authored a significant part of this PR, I think it's pertinent to wait for some complementary signoffs.

];
}

const { taskQueue: userSpeficiedTaskQueue, ...rest } = workflowOptions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in this variable name

Copy link
Contributor

@dandavison dandavison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice work @mjameswh and @bergundy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants