
@gabotechs (Collaborator) commented Nov 28, 2025

While investigating #228, I tried disabling metrics collection to see if that was somehow contributing to the deadlock.

The conclusion is that metrics collection has no effect on the deadlock, and no measurable effect on performance either.

Since I already had the code that optionally enables/disables metrics collection, and I know, @jayshrivastava, that this is something you were planning to do, I thought I might as well open a separate PR for it.


The collect_metrics field in DistributedConfig now needs to be accessible not only on the head node but on every node involved in the query, which means DistributedConfig now has to be sent over the wire as a config extension. While implementing that, I ran into #247, which prevents a mutated config extension from being sent over the wire.

As a workaround, this PR propagates the config manually with manually_propagate_distributed_config(); once #247 is fixed, manual propagation will no longer be needed.
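For readers unfamiliar with the mechanism, here is a minimal std-only sketch of what "sending the config over the wire" and propagating it manually could look like. It assumes the config travels as string key/value pairs in request metadata (for example gRPC headers). The `Headers` alias, the `distributed.` key prefix, and the `propagate_config`/`read_config` functions are hypothetical illustrations, not the PR's actual `manually_propagate_distributed_config()`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for request metadata (e.g. gRPC headers).
type Headers = HashMap<String, String>;

/// Simplified, hypothetical version of the two DistributedConfig fields in this PR.
#[derive(Debug, Clone, PartialEq)]
struct DistributedConfig {
    shuffle_batch_size: usize,
    collect_metrics: bool,
}

impl Default for DistributedConfig {
    fn default() -> Self {
        Self { shuffle_batch_size: 8192, collect_metrics: false }
    }
}

/// Hypothetical prefix namespacing the extension's keys.
const PREFIX: &str = "distributed";

/// Head node: explicitly copy the *current* (possibly mutated) values into the
/// outgoing headers, overwriting any stale entries.
fn propagate_config(config: &DistributedConfig, headers: &mut Headers) {
    headers.insert(format!("{PREFIX}.shuffle_batch_size"), config.shuffle_batch_size.to_string());
    headers.insert(format!("{PREFIX}.collect_metrics"), config.collect_metrics.to_string());
}

/// Worker node: rebuild the config from incoming headers, keeping defaults for
/// keys that are missing or fail to parse.
fn read_config(headers: &Headers) -> DistributedConfig {
    let mut config = DistributedConfig::default();
    if let Some(Ok(n)) = headers
        .get(&format!("{PREFIX}.shuffle_batch_size"))
        .map(|v| v.parse::<usize>())
    {
        config.shuffle_batch_size = n;
    }
    if let Some(Ok(b)) = headers
        .get(&format!("{PREFIX}.collect_metrics"))
        .map(|v| v.parse::<bool>())
    {
        config.collect_metrics = b;
    }
    config
}

fn main() {
    // The head node turns metrics collection on after building its config.
    let mut config = DistributedConfig::default();
    config.collect_metrics = true;

    let mut headers = Headers::new();
    propagate_config(&config, &mut headers);

    // The worker observes the mutated value rather than the default.
    let on_worker = read_config(&headers);
    assert_eq!(on_worker, config);
    println!("worker collect_metrics = {}", on_worker.collect_metrics);
}
```

Writing the values explicitly on every outgoing request is what makes this kind of propagation "manual": it does not rely on whatever the framework would serialize on its own.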

@gabotechs changed the title from "Gabrielmusat/collect metrics optionally" to "Collect metrics optionally" on Nov 28, 2025
@gabotechs force-pushed the gabrielmusat/collect-metrics-optionally branch from f653b10 to 5fc4b41 on November 28, 2025 14:52
```rust
pub shuffle_batch_size: usize, default = 8192
/// Propagate collected metrics from all nodes in the plan across network boundaries
/// so that they can be reconstructed on the head node of the plan.
pub collect_metrics: bool, default = false
```
Collaborator

nit: defaulting to true might be better, since that's how vanilla DataFusion works.
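The `name: type, default = value` field syntax in the config excerpt matches DataFusion's `extensions_options!` macro. As a rough std-only approximation (not what the macro actually expands to), the suggestion amounts to changing the value returned by `Default`, which turns the flag into an opt-out. The `set` method below is a hypothetical key/value setter added only to illustrate opting out.

```rust
/// Hypothetical plain-struct approximation of the options in the excerpt,
/// with the suggested default applied. Not the real `extensions_options!` expansion.
#[derive(Debug, Clone, PartialEq)]
struct DistributedConfig {
    shuffle_batch_size: usize,
    /// Propagate collected metrics from all nodes back to the head node.
    collect_metrics: bool,
}

impl Default for DistributedConfig {
    fn default() -> Self {
        Self {
            shuffle_batch_size: 8192,
            // Suggested change: on by default, like vanilla DataFusion.
            collect_metrics: true,
        }
    }
}

impl DistributedConfig {
    /// Hypothetical string-based setter, so users can opt out by key/value.
    fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        match key {
            "shuffle_batch_size" => {
                self.shuffle_batch_size = value.parse::<usize>().map_err(|e| format!("{key}: {e}"))?
            }
            "collect_metrics" => {
                self.collect_metrics = value.parse::<bool>().map_err(|e| format!("{key}: {e}"))?
            }
            _ => return Err(format!("unknown option: {key}")),
        }
        Ok(())
    }
}

fn main() {
    let mut config = DistributedConfig::default();
    assert!(config.collect_metrics); // enabled unless explicitly turned off
    config.set("collect_metrics", "false").unwrap();
    assert!(!config.collect_metrics);
    assert!(config.set("collect_metrics", "maybe").is_err());
    assert!(config.set("no_such_option", "1").is_err());
}
```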

```rust
let target_task = partition / partitions_per_task;
let target_partition = partition % partitions_per_task;

// TODO: this propagation should be automatic <link to issue>
```
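The two index lines in this excerpt map a global partition index onto a task and a partition local to that task, assuming every task owns the same number of consecutive partitions (`partitions_per_task`). A small std-only sketch of the same arithmetic; `route_partition` is a hypothetical helper, not a function from the PR:

```rust
/// Hypothetical helper mirroring the excerpt: global partition -> (task, local partition),
/// assuming each task owns `partitions_per_task` consecutive partitions.
fn route_partition(partition: usize, partitions_per_task: usize) -> (usize, usize) {
    assert!(partitions_per_task > 0, "partitions_per_task must be non-zero");
    let target_task = partition / partitions_per_task;
    let target_partition = partition % partitions_per_task;
    (target_task, target_partition)
}

fn main() {
    // With 3 partitions per task, partitions 0..6 land on task 0 (local 0, 1, 2)
    // and task 1 (local 0, 1, 2).
    for p in 0..6 {
        let (task, local) = route_partition(p, 3);
        println!("partition {p} -> task {task}, local partition {local}");
    }
    // Global partition 7 is the second partition (index 1) of task 2.
    assert_eq!(route_partition(7, 3), (2, 1));
}
```

The mapping is invertible: `target_task * partitions_per_task + target_partition` gives back the original partition.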
Collaborator

I think this is missing a link. Do you want to create an issue in this repo as well?

```rust
let stream = if retrieve_metrics {
    MetricsCollectingStream::new(stream, metrics_collection_capture).left_stream()
} else {
    stream.right_stream()
};
```
Collaborator

Do you mind explaining this / leaving a comment? I pulled your branch and I honestly cannot tell where the Either type comes from.
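On the question of where `Either` comes from: judging by the method names, `left_stream()` and `right_stream()` are most likely the methods of the `futures::StreamExt` extension trait (futures 0.3). These wrap a stream in `futures::future::Either`, which implements `Stream` whenever both variants are streams with the same `Item`. As a result, both branches of the `if` produce one concrete type without boxing. The std-only sketch below reproduces the same trick with iterators. The `Either` enum here is a local stand-in, not the futures type, and `Counting` is a hypothetical analogue of `MetricsCollectingStream`.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Local stand-in for `futures::future::Either`: holds one of two concrete types.
enum Either<L, R> {
    Left(L),
    Right(R),
}

/// Forward `next()` to whichever variant is present, so an `if`/`else` can
/// return a single concrete type instead of `Box<dyn Iterator>`.
impl<L, R> Iterator for Either<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;
    fn next(&mut self) -> Option<L::Item> {
        match self {
            Either::Left(l) => l.next(),
            Either::Right(r) => r.next(),
        }
    }
}

/// Hypothetical analogue of `MetricsCollectingStream`: passes items through
/// unchanged and counts them into a shared counter.
struct Counting<I> {
    inner: I,
    count: Rc<Cell<usize>>,
}

impl<I: Iterator> Iterator for Counting<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        let item = self.inner.next();
        if item.is_some() {
            self.count.set(self.count.get() + 1);
        }
        item
    }
}

/// Same shape as the excerpt: wrap only when collection is enabled.
fn maybe_count<I: Iterator>(inner: I, collect: bool, count: Rc<Cell<usize>>) -> Either<Counting<I>, I> {
    if collect {
        Either::Left(Counting { inner, count })
    } else {
        Either::Right(inner)
    }
}

fn main() {
    let count = Rc::new(Cell::new(0));
    let sum: i32 = maybe_count(1..=4, true, count.clone()).sum();
    assert_eq!((sum, count.get()), (10, 4));

    let count = Rc::new(Cell::new(0));
    let sum: i32 = maybe_count(1..=4, false, count.clone()).sum();
    assert_eq!((sum, count.get()), (10, 0)); // nothing counted when disabled
}
```

The alternative, `Box::pin(...) as Pin<Box<dyn Stream<...>>>` on both branches, would also compile, but it adds an allocation and dynamic dispatch that `Either` avoids.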


3 participants