-
Notifications
You must be signed in to change notification settings - Fork 11.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite and simplify CheckpointExecutor #21234
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
2 Skipped Deployments
|
ff692cf
to
038706d
Compare
crates/sui-core/src/checkpoints/checkpoint_executor/data_ingestion_handler.rs
Show resolved
Hide resolved
038706d
to
2085ea9
Compare
|
||
// Gets the next checkpoint to schedule for execution. If the epoch is already | ||
// completed, returns None. | ||
fn get_next_to_schedule(&self) -> Option<u64> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is u64?
should this be CheckpointSequenceNumber
?
#[instrument(level = "error", skip_all, fields(epoch = ?self.epoch_store.epoch()))] | ||
pub async fn run_epoch(self, run_with_range: Option<RunWithRange>) -> StopReason { | ||
let _metrics_guard = mysten_metrics::monitored_scope("CheckpointExecutor::run_epoch"); | ||
debug!(?run_with_range, "CheckpointExecutor::run_epoch"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this be part of the function level instrumentation above, or why is it a separate debug!
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[instrument]
on its own doesn't produce any logs, it only creates a span.
/// forked, and return when finished. | ||
/// If `run_with_range` is set, execution will stop early. | ||
#[instrument(level = "error", skip_all, fields(epoch = ?self.epoch_store.epoch()))] | ||
pub async fn run_epoch(self, run_with_range: Option<RunWithRange>) -> StopReason { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not something you named this PR, but RunWithRange
isn't really a range right? it's just an upper bound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose so! Going to leave that be for now though
stop_seq, | ||
}; | ||
|
||
futures::stream::unfold(Some(state), |state| async move { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
up to you but I wonder if this could be written to read more like a straightforward function without needing things wrapped in an Option<State>
and so on, if you used async_stream::stream!
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually started off with async_stream and I could go back to it... but I ran into a bug and found that debugging async_stream::stream!
to be quite awful. If you expand that macro it's incomprehensible. I could go back to it though, no strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I could see how that would happen. Fully defer to you on which style is more readable/maintanable
let delta_t = now.duration_since(last_update); | ||
let delta_c = transaction_count - self.transaction_count; | ||
let tps = delta_c as f64 / delta_t.as_secs_f64(); | ||
self.tps = self.tps * 0.9 + tps * 0.1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this 0.9 and 0.1 stuff? if it's a moving average can this use simple_moving_average
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an exponential moving average https://en.wikipedia.org/wiki/Exponential_smoothing - I could do an SMA if this proves to be inaccurate. I don't even know what this metric is used for, just trying not to break any dependencies. I checked in PTN that this gives reasonable estimates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preemptively approving, i only had nits
…luster cannot see an intermediate state
Stacked on #21232, #21233 and #21235
This simplifies CheckpointExecutor massively, in preparation for optimizations which will be needed for throughput improvements
TODO