Skip to content

Advanced Request Behavior Option#1582

Merged
scottf merged 3 commits into
mainfrom
advanced-request-behavior
Jun 12, 2026
Merged

Advanced Request Behavior Option#1582
scottf merged 3 commits into
mainfrom
advanced-request-behavior

Conversation

@scottf

@scottf scottf commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Address #1581

This adds an opt-in connection option, advancedRequestBehavior, that surfaces why a request came back without a response instead of the generic timeout. The detail is a classified RequestFailureReason — one of TIMEOUT, CONNECTION_CLOSING, NO_RESPONDERS, PROTOCOL_ERROR, or CANCELLED — plus the connection status, the last protocol error, how long the request waited, and the original exception chained as the cause, so you can tell a real timeout apart from a reconnect, a permissions error, or no stream on the subject.

It applies to both request styles:

  • JetStream requests (publish, stream/consumer management) throw a RequestFailureException in place of the generic IOException("Timeout or no response waiting for NATS JetStream server").
  • Core Connection.request(...) returns a RequestFailureMessage (which is a Message) instead of null, which you can inspect for the same detail.

The option is off by default; existing behavior is unchanged unless you enable in the connection options.

try {
    js.publish("subject", data);
}
catch (RequestFailureException e) {
    // e.getReason(), e.getConnectionStatus(), e.getLastError(), e.getCause()
    log.warn("publish failed: {} (status={})", e.getReason(), e.getConnectionStatus());
}

For a core request, the message will never be null. It will either be the response message or will be a RequestFailureMessage. You must inspect the returned message:

Message m = nc.request("subject", data, Duration.ofSeconds(2));
if (m instanceof RequestFailureMessage) {
    RequestFailureMessage rfm = (RequestFailureMessage)m;
    log.warn("request failed: {} (status={})", rfm.getReason(), rfm.getConnectionStatus());
}

The two paths are distinct: a JetStream request throws RequestFailureException (which extends IOException, so existing catch (IOException) blocks keep working), while a core request(...) returns the RequestFailureMessage — it does not throw.

@scottf

scottf commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@claude review

@synadia-claude-reviewer

synadia-claude-reviewer Bot commented Jun 11, 2026

Copy link
Copy Markdown

Claude finished @scottf's task in 5m 19s —— View job


Code Review: Advanced Request Behavior Option

Tasks

  • Load tools
  • Read changed files
  • Analyze implementation
  • Post review feedback

The feature is well-conceived and the opt-in design is correct — off by default, backward-compatible, and the JetStream/core split (throw vs. return) is a reasonable design choice that keeps existing catch (IOException) blocks working.

That said, I found several issues ranging from a real bug to API design concerns.


Bug: ExecutionException is not unwrapped as documented

src/main/java/io/nats/client/impl/NatsConnection.java (~line 1329)

catch (ExecutionException e) {
    // unwrap: carry the underlying cause, not the ExecutionException wrapper...
    cause = e;   // ← stores the ExecutionException, not e.getCause()
}
return new RequestFailureMessage(incoming, cause, timeout, getStatus(), getLastError());

The comment says "carry the underlying cause, not the ExecutionException wrapper" but cause = e stores e itself. The RequestFailureMessage.getCause() Javadoc explicitly states: "for a failure delivered as an ExecutionException it is that exception's cause (the wrapper is unwrapped)" — but the implementation doesn't unwrap. RequestFailureException.getCause() (and the chained cause in the exception message) will expose the ExecutionException instead of the real underlying CancellationException.

Fix:

catch (ExecutionException e) {
    cause = e.getCause() != null ? e.getCause() : e;
}

Note: the classification logic in RequestFailureMessage is unaffected by this bug because it uses future.wasCancelledClosing() / future.wasCancelledTimedOut() boolean flags, not cause instanceof. But the cause exposed to the caller is wrong. Fix this →


Design: waited reports the configured timeout, not elapsed time

src/main/java/io/nats/client/impl/NatsConnection.java (~line 1334)

return new RequestFailureMessage(incoming, cause, timeout, getStatus(), getLastError());
//                                                  ↑ this is the configured Duration, not elapsed time

The PR description and RequestFailureException.getWaited() Javadoc both say "how long the request waited." But timeout is the configured timeout — for a CONNECTION_CLOSING that fires after 500 ms with a 10 s timeout, getWaited() returns 10 s, not 500 ms. The testConnectionClosingReasonWhenFlagOn test avoids asserting a specific value on waited, which hints at this inaccuracy.

Capturing actual elapsed time would require a long startNano = NatsSystemClock.nanoTime() before incoming.get(...), then computing Duration.ofNanos(NatsSystemClock.nanoTime() - startNano) before constructing the carrier. Fix this →


Design: RequestFailureMessage leaks from the impl package into the public API

The README instructs end users to write:

if (m instanceof RequestFailureMessage) {
    RequestFailureMessage rfm = (RequestFailureMessage) m;

But RequestFailureMessage lives in io.nats.client.impl, which is the internal implementation package. RequestFailureException and RequestFailureReason are correctly in io.nats.client; RequestFailureMessage should be too. The same dependency crosses a package boundary in Discovery.java (service layer importing from impl).

Moving RequestFailureMessage to io.nats.client would complete the public API surface consistently. Since the constructor is package-private this is a safe refactor.


Design: CANCELLED reason is never returned by classify()

src/main/java/io/nats/client/RequestFailureReason.java defines CANCELLED; src/main/java/io/nats/client/impl/RequestFailureMessage.java classifies to TIMEOUT as the final fallback. The test comments acknowledge CANCELLED is unreachable. If the intent is a "catch-all for unclassified cancellations," the fallback in classify() should return CANCELLED, not TIMEOUT — otherwise a future unclassified case silently masquerades as a real timeout, which is harder to debug than an honest CANCELLED. If CANCELLED is truly unused, removing it avoids confusion.


Design: stale lastError can mis-classify failures as PROTOCOL_ERROR

In classify():

if (lastError != null && !lastError.isEmpty()) {
    return RequestFailureReason.PROTOCOL_ERROR;
}

getLastError() returns the connection-level last error, which persists until clearLastError() is called. If a previous request had a server -ERR and the connection's error was never cleared, a subsequent timeout or no-responders failure would be mis-classified as PROTOCOL_ERROR. The classification is only correct if lastError is scoped to the current request's lifetime.


Docs: Connection interface Javadoc not updated

src/main/java/io/nats/client/Connection.java — all three request(...) overloads still say:

@return the reply message or null if the timeout is reached

With advancedRequestBehavior enabled the method never returns null on failure — it returns a RequestFailureMessage. The @Nullable annotation and the return-value description are both wrong in that mode. The Javadoc (and possibly the annotation) should be updated to describe both behaviors.


Minor: missing serialVersionUID in RequestFailureException

RequestFailureException extends IOException which is Serializable. Without an explicit serialVersionUID, serialization across different builds is fragile and most static-analysis tools will flag this.


Positive notes

  • The opt-in design is correct: off by default, zero impact on existing code paths.
  • The classify() logic is sound for the cases it handles — the ordering (CONNECTION_CLOSING first, then PROTOCOL_ERROR, then NO_RESPONDERS, then TIMEOUT) is correct.
  • All callers of requestInternal/conn.request() that could receive a RequestFailureMessage have been updated: responseRequired(), NatsJetStreamMessage, and Discovery. No missed call sites found.
  • The unrelated connectionTimeout null-guard changes in Options.java are reasonable defensive additions.
  • The test file is well-structured with clear comments explaining which scenarios are intentionally not covered (PROTOCOL_ERROR, CANCELLED) and why.
  • The ExecutionException vs CancellationException wrapping mechanics (via cancel(true) vs completeExceptionally()) are correctly leveraged for the NO_RESPONDERS classification.

@scottf

scottf commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

@claude review again

@synadia-claude-reviewer

synadia-claude-reviewer Bot commented Jun 12, 2026

Copy link
Copy Markdown

Claude finished @scottf's task in 2m 55s —— View job


Code Review (Re-review)

Tasks

  • Load tools
  • Read changed files
  • Compare against previous review findings
  • Analyze new/remaining issues
  • Post review feedback

All seven issues from the previous review have been addressed. The implementation is in good shape. I found only minor remaining concerns.

Previous issues — all resolved

# Issue Status
1 ExecutionException not unwrapped ✅ Fixed: cause = e.getCause()
2 waited reported configured timeout not elapsed time ✅ Resolved by removing the field entirely; README updated accordingly
3 RequestFailureMessage in impl package ✅ Moved to io.nats.client
4 CANCELLED reason unreachable ✅ Now returned for unclassified CancellationExceptions
5 Stale lastError could mis-classify failures ✅ Fixed with before/after capture + emptyAsNull normalisation
6 Connection.java Javadoc not updated ✅ All three request(...) overloads now describe both return modes
7 Missing serialVersionUID ✅ Added private static final long serialVersionUID = 1L

The rename from PROTOCOL_ERRORSERVER_ERROR is a good improvement in naming clarity.


Remaining minor concerns

1. cause = e.getCause() has no null guard (NatsConnection.java ~line 1335)

catch (ExecutionException e) {
    cause = e.getCause();   // getCause() can theoretically return null
}

The RequestFailureMessage Javadoc says getCause() "may be null", so passing null is documented as acceptable. The previous review suggestion was e.getCause() != null ? e.getCause() : e to preserve the wrapper when there is no cause. In practice, an ExecutionException with a null cause is essentially impossible in this code path — cancelClosing() and cancelTimedOut() always pass a non-null throwable to completeExceptionally(). This is a very minor nit; the code is safe as-is.

2. RequestFailureMessage's public constructor exposes an internal type (src/main/java/io/nats/client/RequestFailureMessage.java)

public RequestFailureMessage(NatsRequestCompletableFuture future, ...)

NatsRequestCompletableFuture lives in io.nats.client.support (an internal package). Since RequestFailureMessage is now a public API class in io.nats.client, the constructor signature leaks an internal type. Users won't call this constructor themselves, but it appears in generated Javadoc and IDE completions. A narrow fix would be to restrict the visibility or to pass only what classify() needs (the wasCancelledClosing() / wasCancelledTimedOut() booleans and the CancelAction) rather than the whole future. This is a design concern rather than a bug — the current code is functional.

3. Wildcard import in Discovery.java (src/main/java/io/nats/service/Discovery.java)

The change replaced four explicit imports with import io.nats.client.*. Wildcard imports are generally discouraged in Java projects that use static analysis or strict import ordering — it's a style/hygiene nit, nothing functional.

4. PR description code example is now stale

The PR body still shows e.getWaited() in the JetStream catch example. That method no longer exists (the waited field was removed). The README was correctly updated, but the PR description itself was not. This has no effect on the code.


Positive notes

  • The before/after lastError comparison is elegant and correctly handles the emptyAsNull normalisation — the stale-error problem is fully solved.
  • Removing waited rather than trying to fix it is the right call; it eliminates a misleading field and simplifies the API surface.
  • CANCELLED is now reachable for genuine unclassified cancellations without adding test coverage that would be inherently flaky — the comment explaining why it is not tested is clear and acceptable.
  • classify() ordering and logic is sound; the wasCancelledClosing flag check precedes the CancellationException instanceof check, so the closing path and the NO_RESPONDERS path are correctly distinguished.
  • All previously identified call sites that receive RequestFailureMessage (responseRequired(), NatsJetStreamMessage.ackSync(), Discovery) have been updated.

@scottf

scottf commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

@claude one last review. Here are comments on your last items

1. cause = e.getCause() has no null guard (NatsConnection.java ~line 1335)
NON-ISSUE. In this exact case, cause will never be null, because an ExecutionException by definition is a wrapper around a cause.

2. RequestFailureMessage's public constructor exposes an internal type
NO. NatsRequestCompletableFuture is public. It's packaging means nothing.

3. Wildcard import in Discovery.java (src/main/java/io/nats/service/Discovery.java)
No, Claude, your style guide is wrong.

4. PR description code example is now stale
Fixed

@synadia-claude-reviewer

synadia-claude-reviewer Bot commented Jun 12, 2026

Copy link
Copy Markdown

Claude finished @scottf's task in 52s —— View job


Final Review

Tasks

  • Load tools
  • Verify responses to previous concerns
  • Post final assessment

All four responses are accepted. Here is my assessment of each:

1. cause = e.getCause() null guard — NON-ISSUE ✅

Correct. ExecutionException is always constructed with a non-null cause (it wraps an exception thrown by a task). In this code path, completeExceptionally() is always called with a real Throwable, so getCause() is never null here. My concern was theoretical rather than real.

2. NatsRequestCompletableFuture in public constructor — NO ISSUE ✅

Correct. The class is public class NatsRequestCompletableFuture — the Javadoc comment "This is an internal class and is only public for access" describes the intent, but the class is fully accessible. The constructor signature exposes no actually-private type.

3. Wildcard import in Discovery.java — PROJECT PREFERENCE ✅

Accepted. Wildcard imports are a style choice. The previous explicit imports were fine, so is the wildcard — this is the project's call to make.

4. Stale PR description code example — FIXED ✅

Confirmed.


Summary

All issues from the two prior reviews are resolved. The implementation is correct, the API surface is clean, and the opt-in design is backward-compatible. This PR is ready to merge.

@Jarema Jarema left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@scottf scottf merged commit 6b89328 into main Jun 12, 2026
5 checks passed
@scottf scottf deleted the advanced-request-behavior branch June 12, 2026 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants