Conversation

@AhmedSoliman (Contributor) commented Jan 6, 2026

This change consolidates the max message size configuration for all gRPC
servers and clients to use the configurable, cascading `message-size-limit` value from
NetworkingOptions via the GrpcConnectionOptions trait. The value can be overridden independently
for the metadata client and the invoker if needed. Additionally, the internal max message size is
set slightly above the configured value to leave room for the overhead of our messaging and other
fields beyond what the user sets. In other words, the user can think of `message-size-limit` as the
maximum size of a value they can set in their virtual object state, or the maximum size of a ctx.run()
block, without accounting for gRPC serialization overhead and the additional metadata we add.
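
As a rough illustration of that headroom (a sketch only: the overhead margin and function below are assumptions, not the PR's actual code), the wire-level limit can be thought of as the user-facing limit plus a fixed allowance for our envelope:

```rust
/// User-facing default for `message-size-limit` (32 MiB), mirroring the
/// DEFAULT_MESSAGE_SIZE_LIMIT constant described below.
const DEFAULT_MESSAGE_SIZE_LIMIT: usize = 32 * 1024 * 1024;

/// Hypothetical allowance for our own framing and metadata on top of the
/// user's payload; the real margin may differ.
const INTERNAL_OVERHEAD: usize = 4 * 1024 * 1024;

/// The limit actually applied at the gRPC layer: the configured value plus
/// headroom for what we add around it.
fn internal_grpc_limit(configured: usize) -> usize {
    configured.saturating_add(INTERNAL_OVERHEAD)
}

fn main() {
    // With the 32 MiB default, the internal limit ends up slightly larger.
    assert_eq!(internal_grpc_limit(DEFAULT_MESSAGE_SIZE_LIMIT), 36 * 1024 * 1024);
}
```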

This change is part of #4130

Key changes:

  • Introduced DEFAULT_MESSAGE_SIZE_LIMIT constant (32 MiB) in
    restate_types::config as a single source of truth
  • Updated all gRPC servers and clients to use config.message_size_limit() instead of hardcoded constants
  • Updated CLI tools to use the same default via the shared constant
  • Improved (and exposed) documentation for message-size-limit configuration
  • Invoker now clamps the message-size-limit to the networking.message-size-limit value and uses it as the default value if unset (see the sketch below)
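
A minimal sketch of that clamping rule (the function name and plain `usize` types are illustrative, not the PR's code):

```rust
/// Resolve the effective limit for a component such as the metadata client or
/// the invoker: fall back to the networking limit when unset, and never exceed
/// it when set higher.
fn effective_message_size_limit(configured: Option<usize>, networking_limit: usize) -> usize {
    configured.map_or(networking_limit, |limit| limit.min(networking_limit))
}

fn main() {
    let networking = 32 * 1024 * 1024;
    // Unset: inherit the networking limit.
    assert_eq!(effective_message_size_limit(None, networking), networking);
    // Set higher than the networking limit: clamped down to it.
    assert_eq!(
        effective_message_size_limit(Some(64 * 1024 * 1024), networking),
        networking
    );
}
```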

Configuration changes:

  • networking.max-message-size renamed to networking.message-size-limit
  • networking.message-size-limit default changed from 10 MiB to 32 MiB
  • common.metadata-client.max-message-size renamed to common.metadata-client.message-size-limit
  • common.metadata-client.message-size-limit now defaults to networking.message-size-limit if unset,
    and is clamped to the networking limit if set higher
  • worker.invoker.message-size-limit now defaults to networking.message-size-limit if unset,
    and is clamped to the networking limit if set higher
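
For reference, a hypothetical configuration snippet exercising the renamed options could look like the following; the section layout and human-readable size syntax are assumptions for illustration, and only the key names come from this change:

```toml
[networking]
# Renamed from max-message-size; default raised from 10 MiB to 32 MiB.
message-size-limit = "32 MiB"

[metadata-client]
# Optional override; values above networking.message-size-limit are clamped down to it.
message-size-limit = "16 MiB"

[worker.invoker]
# Optional override; defaults to networking.message-size-limit when unset.
message-size-limit = "8 MiB"
```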

Stack created with Sapling. Best reviewed with ReviewStack.


1. Errors on the send path caused by encoding limits cannot be surfaced in logs or handled properly: connections are silently dropped. We
   leave the encoding limit as unlimited (the default) and will add explicit checks in key parts of the send path in a subsequent PR.
2. Setting a high decode limit adds no overhead, since tonic does not preallocate this value per decoder; it only uses it as a check to bound uncompressed buffer sizes. Decode failures due to the limit are surfaced correctly in logs.

Therefore, we are opting for an unlimited encoding limit at the gRPC layer, since failures there are obscure, and we'll
selectively add explicit checks in the important parts of the send path in a subsequent PR.

The decode size limit is still set, so in practice the impact of this change is improved logging when the limit is hit.
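
A minimal tonic sketch of this split, assuming a hypothetical generated client type (`NodeSvcClient`, its module path, and the endpoint are placeholders, not Restate's actual wiring): only the decoding limit is constrained, while encoding keeps tonic's unlimited default.

```rust
use tonic::transport::{Channel, Error};

// Placeholder for any tonic-generated client; the module path is hypothetical.
use crate::pb::node_svc_client::NodeSvcClient;

async fn connect(message_size_limit: usize) -> Result<NodeSvcClient<Channel>, Error> {
    let channel = Channel::from_static("http://127.0.0.1:5122").connect().await?;

    // Enforce the limit on the decode side, where failures are surfaced clearly
    // in logs. No `max_encoding_message_size` call: encoding stays at tonic's
    // unlimited default, with explicit send-path checks planned for a later PR.
    Ok(NodeSvcClient::new(channel).max_decoding_message_size(message_size_limit))
}
```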

github-actions bot commented Jan 6, 2026

Test Results

  7 files  +  2    7 suites  +2   3m 15s ⏱️ + 2m 12s
 47 tests + 29   47 ✅ + 29  0 💤 ±0  0 ❌ ±0 
200 runs  +164  200 ✅ +164  0 💤 ±0  0 ❌ ±0 

Results for commit 81a5288. ± Comparison against base commit f181056.

This pull request removes 18 and adds 47 tests. Note that renamed tests count towards both.
dev.restate.sdktesting.tests.AwakeableIngressEndpointTest ‑ completeWithFailure(Client)
dev.restate.sdktesting.tests.AwakeableIngressEndpointTest ‑ completeWithSuccess(Client)
dev.restate.sdktesting.tests.IngressTest ‑ idempotentInvokeSend(Client)
dev.restate.sdktesting.tests.IngressTest ‑ idempotentInvokeService(Client)
dev.restate.sdktesting.tests.IngressTest ‑ idempotentInvokeVirtualObject(Client)
dev.restate.sdktesting.tests.IngressTest ‑ idempotentSendThenAttachWIthIdempotencyKey(Client)
dev.restate.sdktesting.tests.IngressTest ‑ privateService(URI, Client)
dev.restate.sdktesting.tests.JournalRetentionTest ‑ journalShouldBeRetained(Client, URI)
dev.restate.sdktesting.tests.KafkaAndWorkflowAPITest ‑ callSharedWorkflowHandler(URI, int, Client)
dev.restate.sdktesting.tests.KafkaAndWorkflowAPITest ‑ callWorkflowHandler(URI, int, Client)
…
dev.restate.sdktesting.tests.CallOrdering ‑ ordering(boolean[], Client)[1]
dev.restate.sdktesting.tests.CallOrdering ‑ ordering(boolean[], Client)[2]
dev.restate.sdktesting.tests.CallOrdering ‑ ordering(boolean[], Client)[3]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromAdminAPI(BlockingOperation, Client, URI)[1]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromAdminAPI(BlockingOperation, Client, URI)[2]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromAdminAPI(BlockingOperation, Client, URI)[3]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromContext(BlockingOperation, Client)[1]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromContext(BlockingOperation, Client)[2]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromContext(BlockingOperation, Client)[3]
dev.restate.sdktesting.tests.Combinators ‑ awakeableOrTimeoutUsingAwakeableTimeoutCommand(Client)
…

♻️ This comment has been updated with latest results.

@AhmedSoliman force-pushed the pr4137 branch 2 times, most recently from 24aa000 to 8c24709 on January 6, 2026 15:13
@muhamadazmy (Contributor) left a comment

LGTM! Thank you so much for taking care of this issue :) +1 for merging

Comment on lines +811 to +812
/// the value of `networking.message-size-limit` since larger messages cannot be transmitted
/// over the cluster internal network.

👌🏼

@AhmedSoliman merged commit 81a5288 into main Jan 7, 2026
28 checks passed
@AhmedSoliman deleted the pr4137 branch January 7, 2026 11:11
@github-actions bot locked and limited conversation to collaborators Jan 7, 2026
@tillrohrmann (Contributor) left a comment

Out of curiosity: Why did you stop setting the max encoding message size? Is it because it might generate a few false positives if the server is configured with a larger decoding message size?
