Rework execution plan hierarchy for better interoperability #178
Conversation
" read_from=Stage {stage}, output_partitions={partitions}, n_tasks={n_tasks}, input_tasks={input_tasks}", | ||
)?; | ||
} | ||
pub fn display_plan_ascii(plan: &dyn ExecutionPlan) -> String { |
Instead of using `display_plan_ascii`, can we make this more native? We can implement `Display` on `PartitionIsolator`, `NetworkCoalesceExec`, and `NetworkShuffleExec`. Even though the display output may differ from what we have now, I think this plays well with DF principles. It will also work better with datafusion-tracing, since it wraps every node. Using trait methods allows you to access `self` for each node without downcasting.
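In practice, DataFusion's physical operators get their text rendering through the `DisplayAs` trait rather than `std::fmt::Display`. A minimal sketch of what that could look like, using a simplified stand-in for `NetworkShuffleExec` (the `stage` and `input_tasks` fields are hypothetical, not the real struct):

```rust
use std::fmt;

use datafusion::physical_plan::{DisplayAs, DisplayFormatType};

/// Simplified stand-in for a network node; the real operator would also
/// implement ExecutionPlan. The `stage` and `input_tasks` fields are
/// hypothetical, just to show the shape of a native display impl.
struct NetworkShuffleExec {
    stage: usize,
    input_tasks: usize,
}

impl DisplayAs for NetworkShuffleExec {
    fn fmt_as(&self, t: DisplayFormatType, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match t {
            // Verbose output carries the extra scheduling details.
            DisplayFormatType::Verbose => write!(
                f,
                "NetworkShuffleExec: read_from=Stage {}, input_tasks={}",
                self.stage, self.input_tasks
            ),
            // Default (and any other format) gets the compact one-liner used
            // by `displayable(...)`.
            _ => write!(f, "NetworkShuffleExec: read_from=Stage {}", self.stage),
        }
    }
}
```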
Generally, it would be nice if `displayable`, much like `execute()`, worked out of the box without downcasting. I think truly "interoperating well" with DataFusion actually means that we can use/implement these native methods. It gives me confidence that this would work with other people's DF extensions and custom nodes, etc.
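For reference, a minimal sketch of what the out-of-the-box path looks like once every node renders itself natively (nothing here is specific to this project; `render_plan` is just an illustrative helper):

```rust
use std::sync::Arc;

use datafusion::physical_plan::{displayable, ExecutionPlan};

/// Render any physical plan, including custom nodes, with the stock DataFusion
/// display machinery; no downcasting is involved.
fn render_plan(plan: &Arc<dyn ExecutionPlan>) -> String {
    displayable(plan.as_ref()).indent(true).to_string()
}
```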
I think we can actually get everything in your PR working without downcasting outside of the optimizer step, but I'm totally lost on metrics, sadly. I think we will need to contribute upstream.

The root issue that makes it hard is that I have no access to `self` nor the `TaskContext` while traversing a plan using the `TreeNodeRewriter`. We need a more native way to inject metrics IMO - in the `ExecutionPlan` trait.
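To make the limitation concrete, here is a minimal sketch of the shape of such a rewriter; `MetricsInjector` is a hypothetical name, and the point is only that nothing besides the plan node itself is in scope:

```rust
use std::sync::Arc;

use datafusion::common::tree_node::{Transformed, TreeNodeRewriter};
use datafusion::common::Result;
use datafusion::physical_plan::ExecutionPlan;

/// Hypothetical rewriter: it only ever sees the plan nodes themselves, so
/// there is no way to reach the executing operator's `self` or the
/// `TaskContext` from here to wire metrics back in.
struct MetricsInjector;

impl TreeNodeRewriter for MetricsInjector {
    type Node = Arc<dyn ExecutionPlan>;

    fn f_up(&mut self, node: Self::Node) -> Result<Transformed<Self::Node>> {
        // Only `node` is available at this point; no TaskContext, no access
        // to the running operator instance.
        Ok(Transformed::no(node))
    }
}
```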
> Instead of using `display_plan_ascii`, can we make this more native?

You could do that now, and the output is nice too, but it does not contain ASCII graphics. I kept this function for backwards compatibility with previous ways of visualizing plans.
> The root issue that makes it hard is that I have no access to `self` nor the `TaskContext` while traversing a plan using the `TreeNodeRewriter`. We need a more native way to inject metrics IMO - in the `ExecutionPlan` trait.

👍 that makes sense. Probably worth opening a discussion; maybe the Ballista folks have something useful to say about that.
Stepping back, I think it's impossible to actually render "natively". For any child node in vanilla DF, there's never more than one parent, I think. In our case, we could have `NetworkShuffleExec`s that read from the same task / stage, so we need to have our own renderer-visitor such as the one below to avoid rendering the same task twice. This seems right.
A more high-level thought - maybe we should implement our own visitor which is capable of traversing tasks uniquely? Then we could display using that visitor. Not necessary in this PR, but I'm just thinking out loud.

In the metrics use case, it would allow users to go find a particular node in a particular task and grab its metrics. Not sure how important that is, but that's certainly allowed in vanilla DF, so why not in this project?

(Also, I would still like to be able to inject metrics retroactively if possible by adding `new_with_metrics`, but the `MetricsWrapperExec` can get the job done in the meantime.)
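A rough sketch of the "traverse tasks uniquely" idea, using hypothetical types (`StageId` and the `stage_children` map are stand-ins for however this project ends up modeling the task/stage graph):

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stage identifier; several NetworkShuffleExecs may read from
/// the same stage, so the walk must deduplicate.
type StageId = usize;

/// Visit each stage at most once, even when multiple parents point at it.
/// Each visited stage could then be rendered, or have its metrics collected.
fn visit_stages_once(
    root: StageId,
    stage_children: &HashMap<StageId, Vec<StageId>>,
    seen: &mut HashSet<StageId>,
    out: &mut Vec<StageId>,
) {
    if !seen.insert(root) {
        // Already rendered this stage via another parent; skip it.
        return;
    }
    out.push(root);
    for child in stage_children.get(&root).into_iter().flatten() {
        visit_stages_once(*child, stage_children, seen, out);
    }
}
```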
Sorry for the delay. I've been thinking about this change a lot. Overall, this change is good 👍🏽. I will give it a line-by-line review tomorrow morning.

It would be valuable to ask Geoffrey and/or the DF maintainers about the downcasting thing which I've commented on below. I believe I'm pro-downcasting though; it seems like the DF way.
}
}

impl NetworkBoundaryExt for dyn ExecutionPlan {
I don't think this means we are able to support datafusion-tracing (even using this method). Any call to `as_network_boundary(&self)` on an `InstrumentationExec` which wraps a `Network*` will not actually return a `NetworkBoundary`.

Even if you add this, we can never call it because we will never see a concrete `InstrumentationExec`:
impl NetworkBoundaryExt for InstrumentationExec {
fn as_network_boundary(&self) -> Option<&dyn NetworkBoundary> {
self.inner.as_network_boundary()
}
}
We will only see `dyn ExecutionPlan`, which will call the blanket implementation you wrote here.
I also tried this
impl<T: ExecutionPlan + ?Sized> NetworkBoundaryExt for T {
fn try_as_network_boundary(&self) -> Option<&dyn NetworkBoundary> {
None
}
}
impl NetworkBoundaryExt for InstrumentationExec {
fn as_network_boundary(&self) -> Option<&dyn NetworkBoundary> {
self.inner.as_network_boundary()
}
}
but now the compiler throws an error because these are conflicting implementations (since `InstrumentationExec` implements `ExecutionPlan + ?Sized`).

This all boils down to specialization being unstable: rust-lang/rust#31844. You can't have a "default blanket implementation" for a trait and override it in stable Rust.
Ultimately, `datafusion-distributed` is at odds with `datafusion-tracing` because `as_any().downcast_ref` breaks under indirection caused by wrapper types like `InstrumentedExec`.

- Are wrappers like the one in `datafusion-tracing` an anti-pattern?
- Is downcasting after planning an anti-pattern?

I suspect that `InstrumentedExec` is actually the anti-pattern.

- All other "wrappers" like `CooperativeExec` expose the children, so they would be (a) encountered during a plan traversal; and (b) downcastable. I asked Claude and it seems that `InstrumentedExec` is the outlier - we could ask Geoffrey about why it's implemented this way.
- The old `StageExec` implementation was problematic for the same reason - we were obfuscating the parent-child relationships.
- When you want a transparent wrapper, why not just add it between two nodes and return an empty result in the `impl Display`?
- As you mentioned before, `as_any` is available in the `ExecutionPlan` trait.
If `InstrumentedExec` just delegated its `as_any` to the underlying type, we would avoid the issue. This would be a good question for DF maintainers...

It's only used here to avoid double wrapping. I wonder why they even have to worry about that. It's supposed to be the last optimizer rule, so it shouldn't get run more than once...
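To make the delegation idea concrete, here is a small self-contained sketch using simplified stand-in types (not the real `ExecutionPlan` or `InstrumentedExec`): a wrapper that returns itself from `as_any` hides the inner node from downcasts, while a wrapper that delegates stays transparent.

```rust
use std::any::Any;
use std::sync::Arc;

// Simplified stand-in for ExecutionPlan's as_any machinery.
trait Plan: Any {
    fn as_any(&self) -> &dyn Any;
}

struct NetworkShuffleExec;
impl Plan for NetworkShuffleExec {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Wrapper that does NOT delegate `as_any`: downcasting to the inner node
/// fails, which is the interop problem described above.
struct OpaqueWrapper(Arc<dyn Plan>);
impl Plan for OpaqueWrapper {
    fn as_any(&self) -> &dyn Any {
        self // returns the wrapper itself
    }
}

/// Wrapper that delegates `as_any` to the wrapped plan: downcasts keep
/// working as if the wrapper were not there.
struct TransparentWrapper(Arc<dyn Plan>);
impl Plan for TransparentWrapper {
    fn as_any(&self) -> &dyn Any {
        self.0.as_any()
    }
}

fn main() {
    let opaque = OpaqueWrapper(Arc::new(NetworkShuffleExec));
    let transparent = TransparentWrapper(Arc::new(NetworkShuffleExec));
    assert!(opaque.as_any().downcast_ref::<NetworkShuffleExec>().is_none());
    assert!(transparent.as_any().downcast_ref::<NetworkShuffleExec>().is_some());
}
```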
    urls: &[Url],
    codec: &dyn PhysicalExtensionCodec,
) -> datafusion::common::Result<Arc<dyn ExecutionPlan>> {
    let prepared = Arc::clone(&self.plan).transform_up(|plan| {
Can this be done during planning? Totally fine to do as a follow up. Curious why we do it here.
}

/// Helper enum for storing either borrowed or owned trait object references
enum Referenced<'a, T: ?Sized> {
This seems like it would be useful as a util. Seems a bit out of place here.
" read_from=Stage {stage}, output_partitions={partitions}, n_tasks={n_tasks}, input_tasks={input_tasks}", | ||
)?; | ||
} | ||
pub fn display_plan_ascii(plan: &dyn ExecutionPlan) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stepping back, I think it's impossible to actually render "natively". For any child node in vanilla DF, there's never more than one parent, I think. In our case, we could have NetworkShuffleExecs
that read from the same task / stage, so we need to have our own renderer-visitor such as below to avoid rendering the same task twice. This seems right.
" read_from=Stage {stage}, output_partitions={partitions}, n_tasks={n_tasks}, input_tasks={input_tasks}", | ||
)?; | ||
} | ||
pub fn display_plan_ascii(plan: &dyn ExecutionPlan) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more high level thought - maybe we should implement our own visitor which is capable of traversing tasks uniquely? Then we could display using that visitor. Not necessary in this PR but I'm just thinking about loud.
In the metrics use case, it would allow users to go find a particular node in a particular task and grab its metrics. Not sure how important that is - but that's certainly allowed in vanilla DF, so why not this project?
(Also I would still like to be able to inject metrics retroactively if possible by adding new_with_metrics
, but the MetricsWrapperExec
can get the job done in the mean time)
Closes #177.
This PR changes how we structure the distributed plans, without changing how we execute them or how we display them (mostly).
The idea is to bring us closer to a normal DataFusion plan, and:
- `StageExec` plans: To produce a flattened output as any other normal DataFusion plan.
- This will play better with DataFusion tooling that tries to traverse the full plan, reassign children, repartition everything, etc...
- Along the way, it also results in less code and simpler