@muhamadazmy
Contributor

@muhamadazmy muhamadazmy commented Nov 17, 2025

Use ingestion-client in the Shuffler

Avoid direct writes to bifrost in shuffler by using a
dedicated ingestion-client instance.


Stack created with Sapling. Best reviewed with ReviewStack.

Contributor

@tillrohrmann tillrohrmann left a comment

Thanks a lot for replacing the direct Bifrost write with the IngressClient in the shuffle @muhamadazmy. Maybe the name IngressClient does not fit 100% given that the shuffle now uses it as well. Something like IngestionClient might work better. Given that we don't use the send window of the IngressClient yet, I wouldn't expect a different runtime behavior of the shuffle. Once we do use the send window, I would be interested in how much the overall shuffle throughput increases with the IngressClient.

I left a few minor comments for your consideration.

@tillrohrmann
Contributor

There seem to be a few test failures on GHA.

Comment on lines 228 to 233
ingress
    .ingest(
        msg.partition_key(),
        IngestRecord::from_parts(msg.record_keys(), msg),
    )
    .await?;
Contributor

What about support for rolling upgrades?

Contributor Author

ingest does not fail unless the ingestion client is closed. This means the worst case is that it blocks until the leaders are responsive.
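
As an illustration of the "blocks rather than fails" behavior described above, here is a minimal sketch that uses a bounded `std::sync::mpsc` channel as a stand-in. This is an analogy only and not the `IngestionClient` API: the channel, the 50 ms delay, and the function name `blocking_then_closed` are assumptions made for this example.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

// Analogy only: a bounded channel plays the role of the ingestion client.
// Returns (send succeeded after blocking, send failed once closed, values received).
fn blocking_then_closed() -> (bool, bool, (u32, u32)) {
    // Capacity 1: a second send blocks until the consumer drains the first.
    let (tx, rx) = sync_channel::<u32>(1);

    let consumer = thread::spawn(move || {
        // Simulate a leader that is unresponsive for a while.
        thread::sleep(Duration::from_millis(50));
        let first = rx.recv().unwrap();
        let second = rx.recv().unwrap();
        // `rx` is dropped when this closure returns, which closes the channel.
        (first, second)
    });

    tx.send(1).unwrap(); // fits into the buffer immediately
    let slow_ok = tx.send(2).is_ok(); // blocks until the consumer wakes up, then succeeds
    let received = consumer.join().unwrap();
    let closed_err = tx.send(3).is_err(); // the receiver is gone, so this send fails
    (slow_ok, closed_err, received)
}
```

In this sketch a slow consumer only delays the sender; an error surfaces only after the receiving side has gone away, which mirrors the worst case described in the comment.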

Contributor Author

I think we can avoid this situation by releasing support for the Ingest messages in the PP first, and only starting to use them in the following release.
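
One way to picture the staggered rollout suggested above is a sender that picks its write path based on the oldest version still running in the cluster. The comment only proposes staggering across two releases; the runtime version check below is an added assumption, and `Version`, `WritePath`, `INGEST_SUPPORTED_SINCE`, and `choose_path` are hypothetical names that do not come from the PR.

```rust
// Hypothetical version type; real version handling in Restate is not shown in this thread.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Version(u32);

// Assumption: release 1 is the first release whose PP understands Ingest messages.
const INGEST_SUPPORTED_SINCE: Version = Version(1);

#[derive(Debug, PartialEq)]
enum WritePath {
    // Old behavior: the shuffle writes to Bifrost directly.
    DirectBifrostWrite,
    // New behavior: the shuffle sends Ingest messages through the ingestion client.
    IngestMessage,
}

// Use the new path only when every peer already understands Ingest messages.
fn choose_path(min_peer_version: Version) -> WritePath {
    if min_peer_version >= INGEST_SUPPORTED_SINCE {
        WritePath::IngestMessage
    } else {
        WritePath::DirectBifrostWrite
    }
}
```

With this shape, a cluster that is mid-upgrade keeps using the old path until the last node that lacks Ingest support has been replaced.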

@muhamadazmy muhamadazmy force-pushed the pr4024 branch 15 times, most recently from d6e9955 to 8c0797e Compare December 2, 2025 09:00
Contributor

@tillrohrmann tillrohrmann left a comment

Thanks a lot for updating the shuffle to use the new ingestion client @muhamadazmy. LGTM. +1 for merging.

Did you run any benchmarks/tests for a workload that needs to shuffle a lot to check whether we see an improvement with these changes?


let partition_store_manager = PartitionStoreManager::create().await?;

let ingress = IngestionClient::new(
Contributor

ingestion_client?

///
/// Settings for the shared ingestion client used by all workers to
/// manage record ingestion across partitions (shuffle).
pub shuffle: IngestionOptions,
Contributor

The IngestionOptions options refer to the Kafka ingestion. Should the description be updated to also cover the shuffle, or can it be made generic enough to apply to both?

Contributor Author

I will try to update the docs of IngestionOptions to make it more generic.

Comment on lines 294 to 299
let mut stream =
    state_machine::StateMachine::new(metadata, ingestion_client, outbox_reader, hint_rx);
Contributor

Nit: Is stream still the best variable name?

loop {
    match &mut self.state {
        State::Idle => {
            loop {
Contributor

Is this inner loop needed or could it be removed and let the outer loop handle things?

Contributor Author

You are right, it's totally not needed.
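
For reference, a minimal sketch of why the outer loop alone is sufficient: each arm performs one step and either transitions or returns, and the next iteration re-dispatches on the new state. The `State` variants, the `Machine` struct, and the `pending` field below are hypothetical stand-ins and not the state machine from this PR.

```rust
// Hypothetical states; the real shuffle state machine is not shown in this thread.
#[derive(Debug, PartialEq)]
enum State {
    Idle,
    Sending { remaining: u32 },
    Done,
}

struct Machine {
    state: State,
    // Sizes of the batches still waiting to be sent (illustrative only).
    pending: Vec<u32>,
}

impl Machine {
    fn new(pending: Vec<u32>) -> Self {
        Self { state: State::Idle, pending }
    }

    // Drives the machine to completion and returns the number of records sent.
    fn run(&mut self) -> u32 {
        let mut sent = 0;
        loop {
            match &mut self.state {
                State::Idle => {
                    // A single step: pick the next batch or finish. No inner loop
                    // is needed because the outer loop dispatches again.
                    self.state = match self.pending.pop() {
                        Some(n) => State::Sending { remaining: n },
                        None => State::Done,
                    };
                }
                State::Sending { remaining } => {
                    if *remaining == 0 {
                        self.state = State::Idle;
                    } else {
                        *remaining -= 1;
                        sent += 1;
                    }
                }
                State::Done => return sent,
            }
        }
    }
}
```

Keeping every arm to a single step means each transition is visible in one place, which is the simplification the review comment asks for.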

@muhamadazmy muhamadazmy force-pushed the pr4024 branch 2 times, most recently from 58a3264 to 68b90af Compare December 29, 2025 13:02
@muhamadazmy muhamadazmy force-pushed the pr4024 branch 4 times, most recently from 3c9a9a4 to b5a503b Compare January 6, 2026 09:06
- Use IngestionClient instead of bifrost to write to partition logs
- Remove deprecated `delete_invocation`
Avoid direct writes to bifrost in shuffler by using a
dedicated ingestion-client instance.
@muhamadazmy muhamadazmy merged commit d3d9e6a into restatedev:main Jan 6, 2026
58 checks passed
@muhamadazmy muhamadazmy deleted the pr4024 branch January 6, 2026 10:11
@github-actions github-actions bot locked and limited conversation to collaborators Jan 6, 2026