Contributor

@AhmedSoliman AhmedSoliman commented Jan 5, 2026

When an append wave fails, we'll now report the last error message from each node that failed. Additionally, when a connection is lost due to a receive error (e.g. a decoding error), we'll log a warning message with the status code returned from tonic. Closes #4131.

Bonus: the append wave failure includes the estimated number of bytes in the payload.
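
To make the reporting described above concrete, here is a minimal sketch of how per-node attempt counts and last errors could be tracked and rendered in the `N1(ERROR(attempts=..., last_err='...'))` shape seen in the log output. The names `NodeAttempt`, `WaveStatus`, `record_failure`, and `estimated_bytes`, as well as the `PENDING` placeholder state, are hypothetical and chosen for illustration; they are not taken from the Restate code base, and the real implementation may differ.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Hypothetical per-node bookkeeping for an append (not the actual Restate type).
#[derive(Debug, Default)]
struct NodeAttempt {
    attempts: u32,
    last_err: Option<String>,
}

impl NodeAttempt {
    /// Count the failed attempt and keep only the most recent error message.
    fn record_failure(&mut self, err: impl fmt::Display) {
        self.attempts += 1;
        self.last_err = Some(err.to_string());
    }
}

impl fmt::Display for NodeAttempt {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.last_err {
            Some(err) => write!(f, "ERROR(attempts={}, last_err='{}')", self.attempts, err),
            // Placeholder state for nodes without a recorded failure (assumption).
            None => write!(f, "PENDING(attempts={})", self.attempts),
        }
    }
}

/// Hypothetical status of all nodes, keyed by a plain numeric node id.
/// A BTreeMap keeps the rendered output ordered by node id.
#[derive(Debug, Default)]
struct WaveStatus {
    nodes: BTreeMap<u32, NodeAttempt>,
}

impl WaveStatus {
    fn record_failure(&mut self, node_id: u32, err: impl fmt::Display) {
        self.nodes.entry(node_id).or_default().record_failure(err);
    }
}

impl fmt::Display for WaveStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[")?;
        for (i, (id, attempt)) in self.nodes.iter().enumerate() {
            if i > 0 {
                write!(f, ", ")?;
            }
            write!(f, "N{id}({attempt})")?;
        }
        write!(f, "]")
    }
}

/// Rough payload size: the sum of the raw record lengths (an assumed estimate).
fn estimated_bytes(payloads: &[Vec<u8>]) -> usize {
    payloads.iter().map(Vec::len).sum()
}
```

Calling `status.record_failure(1, "connection closed")` on a fresh `WaveStatus` and formatting it with `to_string()` would yield a single-entry list for `N1`. Storing only the latest error per node keeps the log line bounded no matter how many times a node is retried.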

Example messages from log:

```
2026-01-05T14:43:21.134839Z WARN restate_bifrost::providers::replicated_loglet::sequencer::appender
  Append wave failed, retrying with a new wave after 5.394872401s. Status is [N1(ERROR(attempts=63, last_err='connection closed')), N2(ERROR(attempts=63, last_err='connection closed')), N4(ERROR(attempts=63, last_err='connection closed'))]
    wave: 63
    loglet_id: 14_5
    first_offset: 2
    to_offset: 5
    length: 4
    estimated_bytes: 40002500
    otel.name: "replicated_loglet::sequencer::appender: run"
on rs:worker-8


2026-01-05T14:44:07.852911Z WARN restate_core::network::grpc::svc_handler
  Error while receiving network message from peer, connection will be dropped
    err: status: 'Internal error', self: "h2 protocol error: error reading a body from connection"
on rs:worker-3
  in restate_core::network::io::reactor::network-reactor
    peer: N1:3
    task_id: 4076
```

Stack created with Sapling. Best reviewed with ReviewStack.

@AhmedSoliman AhmedSoliman marked this pull request as ready for review January 5, 2026 14:46

github-actions bot commented Jan 5, 2026

Test Results

  7 files  ±0    7 suites  ±0   3m 13s ⏱️ -6s
 47 tests ±0   47 ✅ ±0  0 💤 ±0  0 ❌ ±0 
200 runs  ±0  200 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit f181056. ± Comparison against base commit d3d9e6a.

♻️ This comment has been updated with latest results.

Contributor

@tillrohrmann tillrohrmann left a comment


Really nice improvements @AhmedSoliman! LGTM :-) +1 for merging.

@AhmedSoliman AhmedSoliman merged commit f181056 into main Jan 6, 2026
55 checks passed
@AhmedSoliman AhmedSoliman deleted the pr4134 branch January 6, 2026 12:15
@github-actions github-actions bot locked and limited conversation to collaborators Jan 6, 2026

Development

Successfully merging this pull request may close these issues.

Improve error reporting when append waves fail

3 participants