-
Notifications
You must be signed in to change notification settings - Fork 25
Fix metrics display on leaf nodes #336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
a2311e6 to
a33ebd5
Compare
| inner: self.inner.clone(), | ||
| inner: Arc::clone(&self.inner).with_new_children(children.clone())?, | ||
| metrics: self.metrics.clone(), | ||
| children: Some(children), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem I found was that, when this method was called, the self.children where being updated, but not the actual children from self.inner.
This is fine as long as you access MetricsWrapperExec with the children() method, as then, the ones just inject will be returned, but it's not fine if you access the children through input_stage(), which is what was happening in display_plan_ascii.
I think it should be fine with just removing this self.children in favor of just applying the new children to self.inner directly, but I may be missing something.
WDYT @jayshrivastava?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds okay.
I forget why with_new_children is even called on MetricsWrapperExec. It means we're transforming a node after its wrapped...
I made this change intentionally though - we used to error in this method and I remember updating it so we don't error. Might be in the git blame.
| assert_not_contains!(display, "metrics=[]"); | ||
|
|
||
| let display = display_plan_ascii(d_physical.as_ref(), true); | ||
| assert_not_contains!(display, "metrics=[]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well use the node_metrics helper. You can also assert the exact output rows since that's very deterministic (or at least, assert they are equal).
|
|
||
| fn metrics(&self) -> Option<MetricsSet> { | ||
| Some(self.worker_connections.metrics.clone_inner()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the other network boundary nodes too? Like shuffle, coalese.
| self.input | ||
| .execute(*actual_partition_number, context)? | ||
| .inspect_ok(move |v| metric.add(v.num_rows())), | ||
| ))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some assertions in #314 which are made weaker to skip over PartitionIsolatorExec nodes. Do you think you can revisit those assertions?
| Ok((s_physical, d_physical)) | ||
| } | ||
|
|
||
| fn node_metrics<T: ExecutionPlan + 'static>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like we missed this bug because we only call this for DataSourceExec. Want to update the tests in this file to call this for Network*Exec?
I think I messed something up in #306 and I see some leaf stages are missing their metrics.
This PR solves it and adds a new test that fails on main but succeeds in this PR.