Replies: 4 comments
-
This is great, I really like this approach very much, @ishant162! The proposed API is more intuitive than the current one IMO - similar to executing a program (the [...]). I'm wondering, though, whether there is a subtle aspect or intention of the original design that we could be missing. I remember @psfoley had some concerns with this approach, so hopefully he can provide additional context and insight.
-
This is definitely a welcome change. I made a diagram in drawio of the current Workflow API implementation a while back; I'll post it here in case it can be of any help: https://drive.google.com/file/d/171yRMKWJceQIhHI4AOXdda5RmJ5n3igX/view?usp=sharing
-
I don't disagree with the technical merits of the proposal, and I think this does lead to a cleaner separation of runtime and workload (from a class hierarchy perspective). My original reservation was centered on making sure we can cleanly express the chaining of workflows together across different environments. There are all kinds of patterns emerging in federated learning that involve dependencies between distinct workflows run on different infrastructure:
But after thinking through this further and creating the following example that compares the two implementations, I think there's good reason to move forward with this refactor:
    model_pretraining = ModelPretrainingFlow(...)
    model_pretraining.runtime = LocalRuntime(...)
    federated_finetuning = FederatedFinetuningFlow(...)
    federated_finetuning.runtime = FederatedRuntime(...)
    # Admittedly kind of ugly
    federated_finetuning(model_pretraining.run()).run()
versus the proposal:
    model_pretraining = ModelPretrainingFlow(...)
    local_runtime = LocalRuntime(...)
    federated_finetuning = FederatedFinetuningFlow(...)
    federated_runtime = FederatedRuntime([small set of collaborators])
    federated_evaluation = FederatedEvaluation(...)
    secure_federated_runtime = SecureFederatedRuntime([collaborators across large hospital network])
    local_runtime.run(model_pretraining)
    federated_runtime.run(federated_finetuning(model_pretraining.model))
    secure_federated_runtime.run(federated_evaluation(federated_finetuning.model))
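For anyone who wants to try the chaining pattern without any infrastructure, here is a minimal runnable sketch. Everything in it is a hypothetical stand-in: `Flow`, `Runtime`, the string labels, and the list used as a "model" are assumptions made for illustration, not OpenFL classes or behavior. It only shows the shape of the idea, where each runtime executes a flow and the next flow is built from the previous flow's result.

```python
class Flow:
    """Hypothetical workflow stand-in that carries a 'model' between stages."""

    def __init__(self, name, model=None):
        self.name = name
        # The "model" is just a list recording which stages have touched it.
        self.model = model if model is not None else []


class Runtime:
    """Hypothetical runtime stand-in: it owns run() and receives the flow."""

    def __init__(self, label):
        self.label = label

    def run(self, flow):
        # Placeholder for real execution: record which runtime ran which stage.
        flow.model = flow.model + [f"{flow.name}@{self.label}"]
        return flow


local_runtime = Runtime("local")
federated_runtime = Runtime("federated")
secure_federated_runtime = Runtime("secure-federated")

# Stage 1: pretraining on local infrastructure.
pretraining = Flow("pretraining")
local_runtime.run(pretraining)

# Stage 2: finetuning starts from the pretrained model, on a different runtime.
finetuning = Flow("finetuning", model=pretraining.model)
federated_runtime.run(finetuning)

# Stage 3: evaluation starts from the finetuned model, on a third runtime.
evaluation = Flow("evaluation", model=finetuning.model)
secure_federated_runtime.run(evaluation)

print(evaluation.model)
```

Because the flows never hold a reference to a runtime, the same flow definition could in principle be handed to a different runtime without being modified.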
-
In our review meeting, a detailed walkthrough of the proposal was provided, highlighting its key aspects, benefits, and shortcomings. The team extensively discussed its value moving forward, focusing on key improvements such as:
Everyone shared their feedback, leading to an in-depth discussion.
Final Decision:
-
SUMMARY
One of the main goals of the Workflow API is to clearly separate Workflow definition from Runtime infrastructure. While the current implementation is effective, there is some coupling between FLSpec and Runtime, and scope for further improvement.
This proposal refines the interaction between FLSpec and the Runtime classes, such as LocalRuntime and FederatedRuntime, to make the design cleaner and more modular.
MOTIVATION
(Current) High Level Design:
In the example shown above, both LocalRuntime and FederatedRuntime are directly assigned to the FLSpec instance, illustrating how the runtime is integrated (and coupled) with the flow.
From the above illustration we can see that runtime is an attribute of FLSpec. When flflow.run() is called, execution flows to the associated runtime, whether it is LocalRuntime or FederatedRuntime.
Potential issues with Current Approach:
- Any future changes or enhancements to Runtime may need changes to FLSpec.
- flflow.run() is a blocking call, preventing users from querying the experiment status or performing other actions while the experiment is running. While it is possible to modify this behavior by passing an argument to indicate whether sync or async execution is desired, that requires changes to FLSpec, which should ideally not be the case.
- Runtime related features where FLSpec might need to be modified:
  - [...] FLSpec to user to query the status, retrieve the results and clean experiment data.
  - [...] Director support also).
PROPOSED APPROACH
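To make the blocking-call concern concrete, here is a minimal sketch of how a runtime that owns run() could offer a non-blocking mode without any change to the flow class. All names here (`Flow`, `SketchRuntime`, the `blocking` argument, `completed_rounds`) are hypothetical and chosen for illustration; this is not the OpenFL API, and a thread pool is only one of several ways such a mode could be implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class Flow:
    """Hypothetical workflow stand-in; it knows nothing about execution mode."""

    def __init__(self, rounds):
        self.rounds = rounds
        self.completed_rounds = 0


class SketchRuntime:
    """Hypothetical runtime that owns run() and decides sync vs. async."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _execute(self, flow):
        for _ in range(flow.rounds):
            time.sleep(0.01)  # placeholder for real work
            flow.completed_rounds += 1
        return flow.completed_rounds

    def run(self, flow, blocking=True):
        future = self._pool.submit(self._execute, flow)
        # Blocking mode waits for the result; otherwise return the future
        # so the caller can poll status or fetch the result later.
        return future.result() if blocking else future


runtime = SketchRuntime()
flow = Flow(rounds=3)

handle = runtime.run(flow, blocking=False)     # returns immediately
print("done yet?", handle.done())              # status can be queried while running
print("rounds completed:", handle.result())    # waits for completion
```

The point of the sketch is only that the sync/async switch lives entirely in the runtime; the flow class is untouched.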
(Proposed) High Level Design:
To decouple FLSpec from Runtime, we propose:
- [...] run() method within the runtime.
- [...] FLSpec, the runtime's run method will accept the FLSpec instance as an argument.
This approach shall achieve a clean separation of Workflow and Runtimes, aligning with the principles of a well-defined Workflow API. With this approach it should be possible to enhance Runtime without impacting FLSpec.
TECHNICAL DETAILS
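As a rough illustration of the proposed direction, where the runtime owns run() and receives the flow as an argument, the sketch below uses small stand-in classes. They deliberately reuse the names FLSpec, LocalRuntime and FederatedRuntime for readability, but they are simplified assumptions written for this example and not the OpenFL implementations; the `executed_by` attribute and the collaborator names are likewise hypothetical.

```python
class FLSpec:
    """Stand-in for a workflow definition; it holds no reference to a runtime."""

    def __init__(self, name):
        self.name = name
        self.executed_by = None  # filled in by whichever runtime runs the flow


class Runtime:
    """Stand-in base class: execution logic lives here, not in FLSpec."""

    def run(self, flow):
        raise NotImplementedError


class LocalRuntime(Runtime):
    def run(self, flow):
        # A real runtime would step through the flow's tasks locally.
        flow.executed_by = "local"
        return flow


class FederatedRuntime(Runtime):
    def __init__(self, collaborators):
        self.collaborators = list(collaborators)

    def run(self, flow):
        # A real runtime would dispatch tasks to the collaborators.
        flow.executed_by = f"federated({len(self.collaborators)} collaborators)"
        return flow


flflow = FLSpec("demo")

# The same flow object is handed to different runtimes; FLSpec is never modified.
LocalRuntime().run(flflow)
print(flflow.executed_by)

FederatedRuntime(["site_a", "site_b"]).run(flflow)
print(flflow.executed_by)
```

In this shape, adding a new runtime means adding a new class with its own run(), with no change to the flow definition.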
- run() method will be added to LocalRuntime and FederatedRuntime to centralize execution logic within the runtime infrastructure.
- run() method will be removed from FLSpec to streamline its responsibilities and improve the separation between Workflow definition and Workflow execution.
- run_local() and run_federated() in FLSpec: These methods will also be removed, with their functionality now managed by the respective runtime classes.
- flflow.run() will need to be updated to use the new run() method in the appropriate runtime classes.
KEY BENEFITS
- [...] FLSpec from Runtime for improved flexibility.
- [...] Runtime to evolve independently of FLSpec.
RISKS
MITIGATION
NEXT STEPS