
Unify how to specify workflow inputs with a new field workflow_inputs #437

Description

@loichuder

More of a discussion trigger than a true issue, but I think it is best to write things down to make the discussion easier.

By workflow inputs, I mean the inputs that the workflow needs in order to run without problems. Inputs provided by upstream tasks do not count.

As far as I know, there are three ways of defining workflow inputs:

  1. Dynamically generate them from the inputs not supplied by upstream tasks (see ewoks show)
  2. Define "input nodes" (used for subgraphs)
  3. Specific fields used for Data Portal reprocessing
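As a rough illustration of the first approach, here is a minimal sketch of finding the inputs that no upstream task supplies. It assumes a graph dict loosely modeled on the ewoks JSON format (nodes with an id and a task_identifier, links with a data_mapping list) and a hypothetical TASK_INPUTS registry that stands in for the input names one would normally get by importing each task class. Default inputs, conditional links, links that map all outputs at once, and subgraphs are all ignored.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical registry: in practice the input names would come from
# importing each task class; here they are hard-coded for illustration.
TASK_INPUTS: Dict[str, List[str]] = {
    "mymodule.LoadData": ["filename", "scan_number"],
    "mymodule.Integrate": ["data", "npt", "unit"],
}


def dynamic_workflow_inputs(graph: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (node_id, input_name) pairs that no upstream link supplies."""
    # Collect every (target node, input name) pair fed by an incoming link.
    linked = set()
    for link in graph.get("links", []):
        for mapping in link.get("data_mapping", []):
            linked.add((link["target"], mapping["target_input"]))

    # Any declared task input not fed by a link must come from the user.
    unsupplied = []
    for node in graph["nodes"]:
        for name in TASK_INPUTS.get(node["task_identifier"], []):
            if (node["id"], name) not in linked:
                unsupplied.append((node["id"], name))
    return unsupplied


graph = {
    "nodes": [
        {"id": "load", "task_identifier": "mymodule.LoadData"},
        {"id": "integrate", "task_identifier": "mymodule.Integrate"},
    ],
    "links": [
        {
            "source": "load",
            "target": "integrate",
            "data_mapping": [{"source_output": "data", "target_input": "data"}],
        }
    ],
}

print(dynamic_workflow_inputs(graph))
# [('load', 'filename'), ('load', 'scan_number'), ('integrate', 'npt'), ('integrate', 'unit')]
```

The hard-coded registry is exactly the part that, in practice, requires the task packages to be importable, which is one motivation for storing the result statically.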

I would propose unifying these by adding a new field to the workflow spec/schema (e.g. workflow_inputs).

This field would allow:

  1. To store inputs statically, without graph analysis. The field contents could be checked for correctness (ewoks lint) or generated dynamically (as ewoks install does when it adds a requirements field).
  2. To remove the need for input nodes. Instead, each workflow used as a subgraph would declare its inputs through this new field (the same way tasks define their inputs).
  3. To add metadata (e.g. the type of each input) that reprocessing software could use to present the user with a meaningful form. The Data Portal is an obvious target, but my big plan is to add similar form generation to ewoksreprocess, so that reprocessing GUIs can be generated dynamically from this metadata.
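To make the first two points more concrete, here is one possible shape for the field together with a minimal lint-style check. Everything in it is hypothetical: the entry keys (name, node, type, default, description) and the lint_workflow_inputs helper are illustrative only and are not part of the current ewoks schema or of ewoks lint.

```python
from typing import Any, Dict, List

# Hypothetical workflow description carrying the proposed field.
# Links are omitted for brevity; the check below only looks at nodes.
workflow: Dict[str, Any] = {
    "graph": {"id": "integrate_scan"},
    "nodes": [
        {"id": "load", "task_identifier": "mymodule.LoadData"},
        {"id": "integrate", "task_identifier": "mymodule.Integrate"},
    ],
    "workflow_inputs": [
        {"name": "filename", "node": "load", "type": "string",
         "description": "Path to the raw data file"},
        {"name": "npt", "node": "integrate", "type": "integer", "default": 1000},
    ],
}

REQUIRED_KEYS = {"name", "node"}


def lint_workflow_inputs(workflow: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in the workflow_inputs field."""
    errors: List[str] = []
    node_ids = {node["id"] for node in workflow.get("nodes", [])}
    seen = set()
    for i, entry in enumerate(workflow.get("workflow_inputs", [])):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            errors.append(f"workflow_inputs[{i}]: missing keys {sorted(missing)}")
            continue
        # Every declared input must point at a node that exists in this graph.
        if entry["node"] not in node_ids:
            errors.append(f"workflow_inputs[{i}]: unknown node {entry['node']!r}")
        # The same (node, input) pair should only be declared once.
        key = (entry["node"], entry["name"])
        if key in seen:
            errors.append(
                f"workflow_inputs[{i}]: duplicate input {entry['name']!r} "
                f"for node {entry['node']!r}"
            )
        seen.add(key)
    return errors


print(lint_workflow_inputs(workflow))  # []
```

Because each entry names both the node and the input, a parent workflow could treat this list the way it treats a task's declared inputs when the workflow is used as a subgraph, without needing dedicated input nodes.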

If all tasks use pydantic models, the dynamic generation of workflow inputs can add types directly. Unfortunately, this requires all the projects to be installed, since the tasks and their pydantic input models need to be imported.
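With typed input models, the types can be read off the model itself. The sketch below uses a standard-library dataclass as a stand-in for a pydantic model so that it runs without pydantic installed; the IntegrateInputs class, the inputs_from_model helper and the type-name mapping are all hypothetical. With pydantic v2, BaseModel.model_json_schema() would provide similar information. Either way, the model class still has to be importable, which is the installation cost mentioned above.

```python
import dataclasses
from typing import Any, Dict, List, Set, get_type_hints


# Stand-in for a task's pydantic input model (hypothetical).
@dataclasses.dataclass
class IntegrateInputs:
    data: list
    npt: int = 1000
    unit: str = "2th_deg"


# Only plain builtin annotations are mapped; anything else becomes "any".
_TYPE_NAMES = {int: "integer", float: "number", str: "string",
               bool: "boolean", list: "array", dict: "object"}


def inputs_from_model(node_id: str, model: type, linked_names: Set[str]) -> List[Dict[str, Any]]:
    """Build typed workflow_inputs entries for one node from its input model."""
    hints = get_type_hints(model)
    entries = []
    for field in dataclasses.fields(model):
        if field.name in linked_names:
            continue  # supplied by an upstream task, so not a workflow input
        entry = {"node": node_id, "name": field.name,
                 "type": _TYPE_NAMES.get(hints[field.name], "any")}
        if field.default is not dataclasses.MISSING:
            entry["default"] = field.default
        entries.append(entry)
    return entries


print(inputs_from_model("integrate", IntegrateInputs, linked_names={"data"}))
# [{'node': 'integrate', 'name': 'npt', 'type': 'integer', 'default': 1000}, {'node': 'integrate', 'name': 'unit', 'type': 'string', 'default': '2th_deg'}]
```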

The representation of this new field is also up for discussion. Each input will obviously be a dict, but we could imagine using a serialized pydantic model to define the types and default values of all inputs, or using an existing standard such as JSON Schema. Feasibility still needs to be checked.
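For the JSON Schema option, the field could be an ordinary object schema whose properties are the workflow inputs, with type, default and description as metadata; that is also roughly what a form generator would need. The sketch below stores such a schema and implements only a tiny subset of validation (required keys, basic types, defaults) with the standard library. The schema contents and resolve_inputs are hypothetical, and a real implementation would more likely rely on a full validator such as the jsonschema package. Note that JSON Schema treats "default" as an annotation, so filling it in is left to the consumer, as done here.

```python
from typing import Any, Dict

# Hypothetical JSON Schema rendering of one workflow's inputs.
WORKFLOW_INPUTS_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "filename": {"type": "string", "description": "Path to the raw data file"},
        "scan_number": {"type": "integer"},
        "npt": {"type": "integer", "default": 1000},
    },
    "required": ["filename", "scan_number"],
}

# JSON Schema type names mapped to the Python types that json.loads produces.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def resolve_inputs(schema: Dict[str, Any], values: Dict[str, Any]) -> Dict[str, Any]:
    """Check required inputs and basic types, then fill in defaults.

    Only a small subset of JSON Schema is handled; keys in `values` that
    the schema does not declare are ignored rather than rejected.
    """
    missing = [name for name in schema.get("required", []) if name not in values]
    if missing:
        raise ValueError(f"missing required inputs: {missing}")
    resolved: Dict[str, Any] = {}
    for name, prop in schema["properties"].items():
        if name in values:
            value = values[name]
            # bool is a subclass of int in Python, so reject it for numeric types.
            bool_as_number = isinstance(value, bool) and prop["type"] in ("integer", "number")
            if bool_as_number or not isinstance(value, _JSON_TYPES[prop["type"]]):
                raise TypeError(f"input {name!r} should be of type {prop['type']}")
            resolved[name] = value
        elif "default" in prop:
            resolved[name] = prop["default"]
    return resolved


print(resolve_inputs(WORKFLOW_INPUTS_SCHEMA, {"filename": "scan.h5", "scan_number": 3}))
# {'filename': 'scan.h5', 'scan_number': 3, 'npt': 1000}
```

Since the schema is plain JSON-compatible data, it could live directly in the workflow file, and a reprocessing GUI could iterate over "properties" to build one form field per input.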

Re-reading this, I probably did a very bad job of explaining my thought process, but at least I got it out of my system.
