More of a discussion trigger than a true issue but I think it is best to write things down for easier discussion.
By workflow inputs, I mean the inputs that the workflow needs in order to run without problems. Inputs provided by upstream tasks do not count.
As far as I know, there are three ways of defining workflow inputs:
- Dynamically generate them from the inputs not supplied by upstream tasks (see `ewoks show`)
- Define "input nodes" (used for subgraphs)
- Specific fields used for the data portal reprocessing
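To make the first option concrete, here is a minimal sketch of how the dynamic generation could work on a plain graph dict. It is not the actual `ewoks show` implementation. The `required_inputs` lookup table and the `mypkg.*` task identifiers are hypothetical: in practice the required input names would come from the task classes themselves. The sketch also only looks at `data_mapping` and ignores any other link options.

```python
def unsupplied_inputs(graph, required_inputs):
    """Return {node_id: [input names]} for required inputs that no link provides."""
    # Collect, per node, the input names fed by an upstream link.
    supplied = {node["id"]: set() for node in graph["nodes"]}
    for link in graph["links"]:
        for mapping in link.get("data_mapping", []):
            supplied[link["target"]].add(mapping["target_input"])

    # Whatever is required but not supplied is a workflow input.
    missing = {}
    for node in graph["nodes"]:
        names = [
            name
            for name in required_inputs[node["task_identifier"]]
            if name not in supplied[node["id"]]
        ]
        if names:
            missing[node["id"]] = names
    return missing


graph = {
    "graph": {"id": "demo"},
    "nodes": [
        {"id": "load", "task_type": "class", "task_identifier": "mypkg.Load"},
        {"id": "integrate", "task_type": "class", "task_identifier": "mypkg.Integrate"},
    ],
    "links": [
        {
            "source": "load",
            "target": "integrate",
            "data_mapping": [{"source_output": "image", "target_input": "image"}],
        }
    ],
}

# Hypothetical table: required input names per task identifier.
required_inputs = {
    "mypkg.Load": ["filename"],
    "mypkg.Integrate": ["image", "npt"],
}

workflow_inputs = unsupplied_inputs(graph, required_inputs)
print(workflow_inputs)
```

Here `image` is dropped for `integrate` because the link from `load` supplies it, which matches the definition of workflow inputs given above.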
I propose unifying these by adding a new field to the workflow spec/schema (e.g. `workflow_inputs`).
This field would allow:
- To store inputs statically without graph analysis. The field contents could be checked for correctness (`ewoks lint`) or dynamically generated (like we do for `ewoks install`, which adds a `requirements` field)
- To remove the need for input nodes. Instead, each workflow would define its inputs when used as a subgraph thanks to this new field (the same way tasks define their inputs).
- To add metadata (e.g. `type` of the input) that could be used by reprocessing software to present the user with a meaningful form. The data portal is an obvious target, but my bigger plan is to add similar form generation to `ewoksreprocess` so that reprocessing GUIs can be generated dynamically from this metadata.
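As a rough illustration of the first point, a lint-style check could compare what is declared in the new field with what graph analysis expects. Everything here is an assumption for the sake of discussion: the shape of the `workflow_inputs` entries (`id`, `name`, `type`), the function name and the message wording are all hypothetical.

```python
def lint_workflow_inputs(declared, expected):
    """Compare declared workflow inputs with the ones graph analysis expects.

    declared: list of dicts such as {"id": "load", "name": "filename", "type": "str"}
    expected: {node_id: [input names]} as produced by graph analysis
    """
    declared_pairs = {(item["id"], item["name"]) for item in declared}
    expected_pairs = {
        (node_id, name) for node_id, names in expected.items() for name in names
    }

    problems = []
    for node_id, name in sorted(expected_pairs - declared_pairs):
        problems.append(f"missing declaration: {node_id}.{name}")
    for node_id, name in sorted(declared_pairs - expected_pairs):
        problems.append(f"unknown or already supplied: {node_id}.{name}")
    return problems


# Hypothetical content of the proposed field in a workflow file.
declared = [
    {"id": "load", "name": "filename", "type": "str"},
    {"id": "integrate", "name": "mask", "type": "str"},
]
expected = {"load": ["filename"], "integrate": ["npt"]}

problems = lint_workflow_inputs(declared, expected)
for problem in problems:
    print(problem)
```

An empty list would mean the static field and the graph agree, so the field can be trusted without redoing the analysis.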
If all tasks use pydantic models, the dynamic generation of workflow inputs can add `types` directly. Unfortunately, this requires all the projects to be installed, since the tasks and their pydantic input models need to be imported.
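The import constraint can be sketched as follows. To keep the example free of extra dependencies, a standard-library dataclass stands in for a pydantic input model, so this is not how ewoks tasks declare their inputs. `IntegrateInputs`, `resolve` and `describe_inputs` are hypothetical names.

```python
import dataclasses
import importlib


@dataclasses.dataclass
class IntegrateInputs:
    """Stand-in for a task's pydantic input model (hypothetical)."""

    npt: int
    unit: str = "q"


def resolve(qualified_name):
    """Import an object from its dotted name; None if the project is not installed."""
    module_name, _, attr = qualified_name.rpartition(".")
    try:
        return getattr(importlib.import_module(module_name), attr)
    except (ImportError, AttributeError, ValueError):
        return None


def describe_inputs(model):
    """List name, type and default (when there is one) of each input field."""
    described = []
    for field in dataclasses.fields(model):
        entry = {"name": field.name, "type": field.type.__name__}
        if field.default is not dataclasses.MISSING:
            entry["default"] = field.default
        described.append(entry)
    return described


described = describe_inputs(IntegrateInputs)
print(described)

# Types are only available when the model can be imported.
not_installed = resolve("some_missing_project_xyz.tasks.IntegrateInputs")
print(not_installed)
```

When `resolve` returns `None`, the generator could still list the input names but would have to leave the types out.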
The representation of this new field is also up for discussion. Obviously, each input will be a dict but we could imagine having a serialized version of a pydantic model to define types and default values for all inputs or use an existing standard such as JSON schema. To be checked for feasibility.
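For the JSON schema option, a sketch of what the stored field and a form generator might look like is given below. The schema content, the widget names and the `form_fields` helper are hypothetical. Only a handful of JSON schema keywords (`type`, `title`, `default`, `required`) are used.

```python
import json

# Hypothetical workflow_inputs content expressed as a JSON schema object.
workflow_inputs_schema = {
    "type": "object",
    "properties": {
        "filename": {"type": "string", "title": "Input file"},
        "npt": {"type": "integer", "title": "Number of points", "default": 1000},
        "normalize": {"type": "boolean", "default": False},
    },
    "required": ["filename"],
}

# Hypothetical mapping from JSON schema types to form widgets.
WIDGETS = {
    "string": "text",
    "integer": "spinbox",
    "number": "spinbox",
    "boolean": "checkbox",
}


def form_fields(schema):
    """Turn the schema properties into a flat list of form field descriptions."""
    required = set(schema.get("required", []))
    fields = []
    for name, spec in schema["properties"].items():
        fields.append(
            {
                "name": name,
                "label": spec.get("title", name),
                "widget": WIDGETS.get(spec.get("type"), "text"),
                "required": name in required,
                "default": spec.get("default"),
            }
        )
    return fields


# The schema is plain JSON, so it can live inside the workflow file.
serialized = json.dumps(workflow_inputs_schema)
fields = form_fields(json.loads(serialized))
for field in fields:
    print(field)
```

One argument for an existing standard is that validators and form generators for it already exist, so a reprocessing tool would not need to know anything about pydantic or about the task projects.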
Re-reading this, I probably did a very bad job of explaining my thought process, but at least I got it out of my system.