Rewrite and simplify CheckpointExecutor #21234

Merged

mystenmark merged 8 commits into main from mlogan-ckpt-exec-rewrite

Feb 20, 2025

Contributor

mystenmark commented Feb 14, 2025

Stacked on #21232, #21233 and #21235

This massively simplifies CheckpointExecutor, in preparation for the optimizations that will be needed to improve throughput.

TODO

Antithesis tests - passed an 8 hour test
PTN tests - no performance or correctness problems observed
sui-single-node-benchmark tests - performance is comparable with old version after dialing concurrency down a bit

mystenmark requested review from aschran and williampsmith

February 14, 2025 21:41


mystenmark force-pushed the mlogan-ckpt-exec-rewrite branch from ff692cf to 038706d Compare

February 16, 2025 03:53


bmwill reviewed


          Rewrite and simplify checkpoint executor

2085ea9

mystenmark force-pushed the mlogan-ckpt-exec-rewrite branch from 038706d to 2085ea9 Compare

February 19, 2025 18:41


mystenmark added 2 commits

February 19, 2025 13:15


          Change StateAccumulator API to support blocking operations in CheckpointExecutor

e5a397f


          PR comments and other fixes

a0cd98b


bmwill approved these changes


aschran reviewed

crates/sui-core/src/checkpoints/checkpoint_executor/mod.rs Outdated

+                  // Gets the next checkpoint to schedule for execution. If the epoch is already
+                  // completed, returns None.
+                  fn get_next_to_schedule(&self) -> Option<u64> {

Contributor

aschran Feb 19, 2025

what is u64?

should this be CheckpointSequenceNumber?
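A minimal sketch of what the reviewer is suggesting, assuming `CheckpointSequenceNumber` is a plain alias for `u64` (an assumption here, not confirmed by this thread). The `Scheduler` struct and its fields are hypothetical and exist only to give the method something to read from; only the method name and return shape come from the diff above.

```rust
/// Assumed to be a plain alias for u64; the alias documents intent
/// without changing the underlying representation.
type CheckpointSequenceNumber = u64;

/// Hypothetical holder for scheduling state (not the PR's actual type).
struct Scheduler {
    next_to_schedule: CheckpointSequenceNumber,
    /// Last checkpoint of the epoch, if already known (hypothetical field).
    last_of_epoch: Option<CheckpointSequenceNumber>,
}

impl Scheduler {
    /// Gets the next checkpoint to schedule for execution. If the epoch is
    /// already completed, returns None.
    fn get_next_to_schedule(&self) -> Option<CheckpointSequenceNumber> {
        match self.last_of_epoch {
            Some(last) if self.next_to_schedule > last => None,
            _ => Some(self.next_to_schedule),
        }
    }
}
```

Because the alias is transparent, callers that currently bind the result as `u64` would keep compiling; the change only makes the signature self-describing.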

crates/sui-core/src/checkpoints/checkpoint_executor/mod.rs Outdated

+                  #[instrument(level = "error", skip_all, fields(epoch = ?self.epoch_store.epoch()))]
+                  pub async fn run_epoch(self, run_with_range: Option<RunWithRange>) -> StopReason {
+                      let _metrics_guard = mysten_metrics::monitored_scope("CheckpointExecutor::run_epoch");
+                      debug!(?run_with_range, "CheckpointExecutor::run_epoch");

Contributor

aschran Feb 19, 2025

could this be part of the function level instrumentation above, or why is it a separate debug!?

Contributor Author

mystenmark Feb 20, 2025

#[instrument] on its own doesn't produce any logs; it only creates a span.

crates/sui-core/src/checkpoints/checkpoint_executor/mod.rs

+                  /// forked, and return when finished.
+                  /// If `run_with_range` is set, execution will stop early.
+                  #[instrument(level = "error", skip_all, fields(epoch = ?self.epoch_store.epoch()))]
+                  pub async fn run_epoch(self, run_with_range: Option<RunWithRange>) -> StopReason {

Contributor

aschran Feb 19, 2025

not something you named in this PR, but RunWithRange isn't really a range, right? it's just an upper bound?

Contributor Author

mystenmark Feb 20, 2025

I suppose so! Going to leave that be for now though

crates/sui-core/src/checkpoints/checkpoint_executor/utils.rs

+                      stop_seq,
+                  };
+                  futures::stream::unfold(Some(state), |state| async move {

Contributor

aschran Feb 19, 2025

up to you but I wonder if this could be written to read more like a straightforward function without needing things wrapped in an Option<State> and so on, if you used async_stream::stream!?

Contributor Author

mystenmark Feb 19, 2025

I actually started off with async_stream and I could go back to it... but I ran into a bug and found debugging async_stream::stream! to be quite awful. If you expand that macro, it's incomprehensible. I could go back to it though, no strong opinion.

Contributor

aschran Feb 20, 2025

Yeah I could see how that would happen. Fully defer to you on which style is more readable/maintainable
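For readers unfamiliar with the pattern being discussed, here is a synchronous stand-in for the `unfold` style, written against `std::iter` rather than `futures::stream` so it runs without extra crates. It is a sketch only: the small `unfold` helper, the `State` fields other than `stop_seq`, and the choice to treat `stop_seq` as an inclusive bound are all assumptions made for illustration, and the PR's real code is async.

```rust
/// Hypothetical state threaded through each step. Only `stop_seq` appears in
/// the diff; `next_seq` is an illustrative assumption.
struct State {
    next_seq: u64,
    stop_seq: Option<u64>,
}

/// Iterator analogue of `futures::stream::unfold`: call `f` with the current
/// state until it returns None, yielding one item per call.
fn unfold<S, T>(init: S, mut f: impl FnMut(S) -> Option<(T, S)>) -> impl Iterator<Item = T> {
    let mut state = Some(init);
    std::iter::from_fn(move || {
        let (item, next) = f(state.take()?)?;
        state = Some(next);
        Some(item)
    })
}

/// Yields sequence numbers from `start` up to and including `stop_seq`
/// (assumed inclusive), or forever if `stop_seq` is None.
fn checkpoint_seqs(start: u64, stop_seq: Option<u64>) -> impl Iterator<Item = u64> {
    let state = State { next_seq: start, stop_seq };
    // Wrapping the state in Option lets one step emit its final item and
    // signal "stop on the next call" by handing back None as the next state.
    unfold(Some(state), |state: Option<State>| {
        let state = state?;
        let seq = state.next_seq;
        let next = if Some(seq) == state.stop_seq {
            None
        } else {
            Some(State { next_seq: seq + 1, ..state })
        };
        Some((seq, next))
    })
}
```

The `Option<State>` wrapper is the part the reviewer found indirect: it exists so the final item can be yielded before the stream ends, which a generator-style macro would express as an ordinary loop with a `break`.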

crates/sui-core/src/checkpoints/checkpoint_executor/utils.rs

+                              let delta_t = now.duration_since(last_update);
+                              let delta_c = transaction_count - self.transaction_count;
+                              let tps = delta_c as f64 / delta_t.as_secs_f64();
+                              self.tps = self.tps * 0.9 + tps * 0.1;

Contributor

aschran Feb 19, 2025

what is this 0.9 and 0.1 stuff? if it's a moving average can this use simple_moving_average?

Contributor Author

mystenmark Feb 19, 2025

It's an exponential moving average https://en.wikipedia.org/wiki/Exponential_smoothing - I could do an SMA if this proves to be inaccurate. I don't even know what this metric is used for, just trying not to break any dependencies. I checked in PTN that this gives reasonable estimates.
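The update rule in the snippet can be sketched as a small standalone estimator. This is an illustration under stated assumptions, not the PR's code: the `TpsEstimator` type, its constructor, the zero-interval guard, and the use of `saturating_sub` are additions made here, while the 0.9 / 0.1 weights and the field names come from the diff.

```rust
use std::time::Instant;

/// Hypothetical estimator mirroring the fields used in the diff.
struct TpsEstimator {
    last_update: Instant,
    transaction_count: u64,
    tps: f64,
}

impl TpsEstimator {
    fn new(now: Instant) -> Self {
        Self { last_update: now, transaction_count: 0, tps: 0.0 }
    }

    /// Folds a new cumulative transaction count into the running estimate:
    /// tps = 0.9 * previous estimate + 0.1 * latest sample.
    fn update(&mut self, now: Instant, transaction_count: u64) -> f64 {
        let delta_t = now.duration_since(self.last_update);
        if delta_t.is_zero() {
            // Added guard (not in the diff): skip the update rather than
            // divide by zero.
            return self.tps;
        }
        let delta_c = transaction_count.saturating_sub(self.transaction_count);
        let sample = delta_c as f64 / delta_t.as_secs_f64();
        self.tps = self.tps * 0.9 + sample * 0.1;
        self.last_update = now;
        self.transaction_count = transaction_count;
        self.tps
    }
}
```

With a smoothing factor of 0.1 and an initial estimate of zero, the value takes a few dozen updates to approach a steady rate, so early readings underestimate throughput. A simple moving average would avoid that warm-up bias at the cost of storing a window of samples.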

aschran approved these changes

Contributor

aschran left a comment

preemptively approving, i only had nits


          Set concurrency

a846313



          Restore log message expected by antithesis

e8133e6



          Hold validator component lock for duration of reconfig so that test cluster cannot see an intermediate state

f6b189f



          PR Comment

a3d20b6


mystenmark enabled auto-merge (squash)

February 20, 2025 19:12


          Update snapshot

9f9f086


mystenmark merged commit fc0bad2 into main

47 checks passed

mystenmark deleted the mlogan-ckpt-exec-rewrite branch

February 20, 2025 20:56
