feat: Add optional extended metrics to sort_batch function #17147

Open · wants to merge 4 commits into main

Conversation

EmilyMatt
Contributor

Which issue does this PR close?

Rationale for this change

It makes accurate benchmarking much easier.
I added a new metrics struct for the lexicographic sort performed on the sort columns; it tracks the column evaluation time, the time spent sorting the indices, and how long the take kernel itself took.

Are there any user-facing changes?

The sort_batch function itself now takes another parameter, an Option, so yes: any user calling that function will have to either provide the metrics struct or pass None. As far as I understand, the actual calls to this function within the crates are internal.

@github-actions github-actions bot added core Core DataFusion crate physical-plan Changes to the physical-plan crate labels Aug 12, 2025
@alamb alamb requested a review from 2010YOUY01 August 12, 2025 17:51
@alamb
Contributor

alamb commented Aug 12, 2025

FYI @ding-young and @2010YOUY01

@rluvaton could you also help review this PR?

Contributor

@ding-young ding-young left a comment

This is a beneficial change. Having detailed sort metrics is valuable, and I think the direction of eventually extending this to cover sort-preserving merge and multi-level merge makes a lot of sense.

One small concern I have is that, unlike aggregation where metrics are recorded for a single large batch, sorting often involves many small batches, each of which would now report timing to LexSortMetrics. Since much of the work in the current sort implementation happens in the merge phase rather than sort_batch, I’m a bit concerned that the relative cost of metric reporting within sort_batch might become non-trivial in practice.

Have you had a chance to measure or evaluate the overhead introduced by this change? If so, I’d be very interested in seeing the results.

Comment on lines 433 to 435

```rust
/// Metrics for sorting in the spill manager
lexsort_metrics: LexSortMetrics,
```
Contributor

The current comment is fine, but perhaps we could clarify that this only applies to sort_batch, not SortPreservingMerge via StreamingMergeBuilder, since "sorting in the spill manager" could imply both.

Contributor Author

Updated.

@2010YOUY01
Contributor

I think the extended metrics in this PR are too fine-grained: we rarely check them, and it's possible to measure them through a flamegraph (https://datafusion.apache.org/library-user-guide/profiling.html), so it might not be worth implementing them as metrics.

However, for certain metrics that are not possible to obtain from flamegraphs (such as, within a single in-memory sort, the average number of batches being handled at a time; or the number of merge levels), it would be a good idea to include them in the metrics.

@EmilyMatt
Contributor Author

EmilyMatt commented Aug 14, 2025

> I think the extended metrics in this PR are too fine-grained: we rarely check them, and it's possible to measure them through a flamegraph (https://datafusion.apache.org/library-user-guide/profiling.html), so it might not be worth implementing them as metrics.
>
> However, for certain metrics that are not possible to obtain from flamegraphs (such as, within a single in-memory sort, the average number of batches being handled at a time, or the number of merge levels), it would be a good idea to include them in the metrics.

When used in distributed compute environments (such as when using DataFusion via Comet, which is where this arose), it can get very unwieldy to use a flamegraph, and I also don't always have control over how the executable was launched.
Using metrics was the best way for me to see what was taking my time in the SortExec.
But I can close this PR if this is not a point of interest.

@rluvaton
Contributor

> FYI @ding-young and @2010YOUY01
>
> @rluvaton could you also help review this PR?

@alamb sure


@ding-young

> This is a beneficial change. Having detailed sort metrics is valuable, and I think the direction of eventually extending this to cover sort-preserving merge and multi-level merge makes a lot of sense.

I think so too.

> One small concern I have is that, unlike aggregation where metrics are recorded for a single large batch, sorting often involves many small batches, each of which would now report timing to LexSortMetrics.

Ideally, aggregation batches and sort batches should be the same size; what do you mean by a single large batch?

> Since much of the work in the current sort implementation happens in the merge phase rather than sort_batch, I'm a bit concerned that the relative cost of metric reporting within sort_batch might become non-trivial in practice.

I did not know that, actually, and these metrics would have shown me.

Also, whether that is the case depends on several factors:

  1. number of sort columns
  2. sort columns type
  3. merge degree

> I think the extended metrics in this PR are too fine-grained: we rarely check them, and it's possible to measure them through a flamegraph (datafusion.apache.org/library-user-guide/profiling.html), so it might not be worth implementing them as metrics.

@2010YOUY01 metrics are made for production; a flamegraph requires an environment where you can actually run and reproduce the workload. Having this in the metrics would genuinely improve visibility.

The only problem I have is the naming of the metrics: "take to indices" and "lexsort" are implementation details. Maybe we can update the naming to be better understood by people who are not aware of the internals, such as "calculating sort indices for a single batch" and "sorting" (which is the take).

Contributor

@rluvaton rluvaton left a comment

I think this is a valuable change.

Comment on lines 66 to 82
```rust
#[derive(Clone, Debug)]
pub struct LexSortMetrics {
    pub time_calculating_lexsort_indices: Time,

    pub time_taking_indices_in_lexsort: Time,
}

impl LexSortMetrics {
    pub fn new(metrics: &ExecutionPlanMetricsSet, partition: usize) -> Self {
        Self {
            time_calculating_lexsort_indices: MetricBuilder::new(metrics)
                .subset_time("time_calculating_lexsort_indices", partition),
            time_taking_indices_in_lexsort: MetricBuilder::new(metrics)
                .subset_time("time_taking_indices_in_lexsort", partition),
        }
    }
}
```
Contributor

I think the naming of these metrics exposes implementation details; maybe rename them to help the user understand what each metric is for, like "finding sort indices" and "copying the data".

@2010YOUY01
Contributor

> I think the extended metrics in this PR are too fine-grained: we rarely check them, and it's possible to measure them through a flamegraph (https://datafusion.apache.org/library-user-guide/profiling.html), so it might not be worth implementing them as metrics.
>
> However, for certain metrics that are not possible to obtain from flamegraphs (such as, within a single in-memory sort, the average number of batches being handled at a time, or the number of merge levels), it would be a good idea to include them in the metrics.

> When used in distributed compute environments (such as when using DataFusion via Comet, which is where this arose), it can get very unwieldy to use a flamegraph, and I also don't always have control over how the executable was launched. Using metrics was the best way for me to see what was taking my time in the SortExec. But I can close this PR if this is not a point of interest.

If it's possible to use those metrics to find better Comet tuning, I think including them makes sense. I was imagining these metrics as something a DataFusion internal developer would care about, checked in order to optimize the SortExec implementation.
Since I don't fully understand how to use them to tune applications, I'd recommend including some comments.
