Overall workflow state / completed state #5701

TomekTrzeciak · 2023-08-22T08:09:10Z

Problem

Cylc has a clear concept of task and job states, but less so when it comes to the overall workflow state. For example, once the workflow has stopped, there is no easy way to tell the underlying reason without digging through the logs or database. In particular, for non-cycling workflows or ones with a finite number of cycles, it would be useful to easily tell apart normal termination (the workflow reached and completed the final cycle) from abnormal termination (stalled, server crash, ...). From chatting to @oliver-sanders about it, this also seems to be a prerequisite for properly supporting a subworkflow as a task in the future (I couldn't find a specific issue for it).

Proposed Solution

A possible solution could be to add a workflow-wide status file akin to job.status that can be scanned for and interrogated for information.
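To make the idea concrete, here is a minimal sketch of how a tool might read such a file, assuming it used the same KEY=VALUE line format as job.status. The key names in the example (WORKFLOW_STATUS, WORKFLOW_STOP_REASON) are hypothetical and only for illustration; no such file or keys exist in Cylc today.

```python
def parse_status_file(text):
    """Parse KEY=VALUE lines into a dict, skipping blank or malformed lines."""
    status = {}
    for line in text.splitlines():
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.strip().partition("=")
        if sep and key:
            status[key] = value
    return status


# Hypothetical contents of a workflow-wide status file (assumed key names).
example = "WORKFLOW_STATUS=stopped\nWORKFLOW_STOP_REASON=completed\n"
status = parse_status_file(example)

# A scanner could then tell normal completion apart from other stop reasons.
finished_normally = status.get("WORKFLOW_STOP_REASON") == "completed"
```

A flat text file like this could be read without a database connection or a running scheduler, which is the main attraction of the approach.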


oliver-sanders · 2023-08-22T08:20:51Z

For sub-workflows, we can currently use the workflow's exit code which kinda works, however, with this it is hard to tell the difference between a stopped workflow and a completed workflow.

We could add a new top-level workflow status for "completed" workflows. Currently this state can effectively be detected by querying the task_pool table in the database: if there are no entries, then the workflow has completed.
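A rough sketch of that database check, using only the Python standard library. It assumes the caller supplies the path to the workflow's SQLite database (typically log/db under the run directory) and that the table is named task_pool; the function name is made up for this example.

```python
import sqlite3
from pathlib import Path


def task_pool_is_empty(db_path):
    """Return True if the task_pool table in the given database has no rows."""
    # Open read-only so that inspecting the database can never modify it.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        (count,) = conn.execute("SELECT COUNT(*) FROM task_pool").fetchone()
    finally:
        conn.close()
    return count == 0
```

This is only meaningful once the workflow has stopped, so a caller would want to confirm that first, e.g. `task_pool_is_empty(run_dir / "log" / "db")` after the scheduler has exited, where `run_dir` is the caller's own variable.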

hjoliver · 2023-08-24T00:34:19Z

> For sub-workflows, we can currently use the workflow's exit code which kinda works, however, with this it is hard to tell the difference between a stopped workflow and a completed workflow.

My sub-workflow example notes this, and addresses it by having the sub-workflow launch script (for the launcher task in the main workflow) check the DB for completion of a known final task in the sub-workflow:

# sub-workflow stopped, but did it succeed?
# SUBWF_END_TASK has the form <point>/<task>: split it into its two parts
cylc workflow-state \
    --max-polls=1 \
    --task="${SUBWF_END_TASK#*/}" \
    --point="${SUBWF_END_TASK%/*}" \
    --status=succeeded \
    "$SUBWF_ID"

However, your suggestion to use the task pool table is an improvement 🎉 I'll amend my example and alert the couple of NIWA teams with sub-workflow use-cases.

Also, a new top-level workflow status for "completed" is a good idea.

oliver-sanders · 2024-07-31T12:08:12Z

It would be a good idea to make accessing the "complete" status as easy as possible as this is something that tools like cylc scan will need to do.

Ideally we wouldn't need to go to the database at all (managing database connections is a hassle). Perhaps a .service file, or a field thereof?

oliver-sanders · 2024-08-05T13:38:23Z
