-
Notifications
You must be signed in to change notification settings - Fork 18
Collect metrics optionally #245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
f653b10 to
5fc4b41
Compare
| pub shuffle_batch_size: usize, default = 8192 | ||
| /// Propagate collected metrics from all nodes in the plan across network boundaries | ||
| /// so that they can be reconstructed on the head node of the plan. | ||
| pub collect_metrics: bool, default = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: defaulting to true might be better since that's how vanilla df works.
| let target_task = partition / partitions_per_task; | ||
| let target_partition = partition % partitions_per_task; | ||
|
|
||
| // TODO: this propagation should be automatic <link to issue> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is missing a link. Do you want to create an issue in this repo as well?
| let stream = if retrieve_metrics { | ||
| MetricsCollectingStream::new(stream, metrics_collection_capture).left_stream() | ||
| } else { | ||
| stream.right_stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind explaining this / leaving a comment? I pulled your branch and I honestly cannot tell where the Either type comes from.
While investigating #228, I tried disabling metrics collection to see if that was somehow contributing to the deadlock.
The conclusion is that it not only does not affect the deadlock, but also does not affect performance in any measurable way.
As I had this code anyways that enables/disables metrics collection optionally, and I now that you @jayshrivastava is something you were planning to do, I though I might as well just create a separate PR for this.
The
collect_metricsfield inDistributedConfignow needs to be accessible not only on the head node, but also in any other node involved in the query, which means that we now need to sendDistributedConfigover the wire as an option extension. While implementing that, I stepped with #247, which prevents us from sending a mutated config extension over the wire.I made a workaround in this PR for addressing the issue manually with a
manually_propagate_distributed_config(), but once the issue gets fixed, manual propagation is no longer needed.