Overall workflow state / completed state #5701

Open
TomekTrzeciak opened this issue Aug 22, 2023 · 4 comments

@TomekTrzeciak
Contributor

Problem

Cylc has a clear concept of task and job states, but less so when it comes to the overall workflow state. For example, once the workflow has stopped, there is no easy way to tell the underlying reason without digging through the logs or the database. In particular, for non-cycling workflows or ones with a finite number of cycles, it would be useful to easily tell apart normal termination (the workflow reached and completed its final cycle) from abnormal termination (stalled, server crash, ...). Chatting to @oliver-sanders about it, this also seems to be a prerequisite for properly supporting a sub-workflow as a task in the future (I couldn't find a specific issue for it).

Proposed Solution

A possible solution could be to add a workflow-wide status file akin to job.status that can be scanned for and interrogated for information.
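
To make the idea concrete, here is a minimal sketch of how a tool might interrogate such a file, assuming it used simple KEY=VALUE lines like job.status does. The file location, the key names and the helper `read_status_field` are all hypothetical; nothing like this exists in Cylc today.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one KEY=VALUE field from a
# workflow-level status file. The file and its keys are assumptions,
# not an existing Cylc interface.
read_status_field() {
    local file=$1 key=$2 line
    [ -r "$file" ] || return 2              # no readable status file: state unknown
    # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "$key="*)
                printf '%s\n' "${line#*=}"  # everything after the first "="
                return 0
                ;;
        esac
    done < "$file"
    return 1                                # file present, key absent
}
```

A scanner could then call something like `read_status_field "$run_dir/workflow.status" WORKFLOW_EXIT` (both names made up for illustration) and use the distinct return codes to separate "no status file" from "field not written yet".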

@oliver-sanders
Member

For sub-workflows, we can currently use the workflow's exit code which kinda works, however, with this it is hard to tell the difference between a stopped workflow and a completed workflow.

We could add a new top-level workflow status for "completed" workflows. Currently, this state can effectively be detected by querying the task-pool table in the database: if it has no entries, the workflow has completed.
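
As a rough sketch of that check, assuming the sqlite3 command-line tool is available and that the table is named `task_pool`: the function name `workflow_completed` and the `SQLITE_CMD` override are made up for illustration and are not part of Cylc.

```shell
#!/usr/bin/env bash
# Sketch: treat an empty task pool table as "workflow completed".
# Assumes the sqlite3 CLI is installed and the table is called task_pool.
# SQLITE_CMD is a hypothetical override hook, not a Cylc setting.
workflow_completed() {
    local db=$1 count
    [ -f "$db" ] || return 2    # no database file: cannot tell
    count=$("${SQLITE_CMD:-sqlite3}" "$db" 'SELECT COUNT(*) FROM task_pool;') \
        || return 2             # query failed (locked DB, missing table, ...)
    [ "$count" -eq 0 ]          # returns 0 only if the pool has no entries
}
```

Usage would look like `workflow_completed "$HOME/cylc-run/$SUBWF_ID/log/db" && echo completed`, where the database path is an assumption about the run directory layout. Return code 1 means tasks remain in the pool, and 2 means the state could not be determined.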

@hjoliver
Member

> For sub-workflows, we can currently use the workflow's exit code which kinda works, however, with this it is hard to tell the difference between a stopped workflow and a completed workflow.

My sub-workflow example notes this, and addresses it by having the sub-workflow launch script (for the launcher task in the main workflow) check the DB for completion of a known final task in the sub-workflow:

# The sub-workflow has stopped, but did it succeed?
# SUBWF_END_TASK is a "point/task" ID: split it into task name and cycle point.
cylc workflow-state \
    --max-polls=1 \
    --task="${SUBWF_END_TASK#*/}" \
    --point="${SUBWF_END_TASK%/*}" \
    --status=succeeded \
    "$SUBWF_ID"
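
For anyone unfamiliar with the two parameter expansions used above, this is how they split a "point/task" ID. The value assigned to `SUBWF_END_TASK` here is just an example, not taken from a real workflow.

```shell
#!/usr/bin/env bash
# Example value only: a cycle point and a task name joined by "/".
SUBWF_END_TASK="20230822T00/archive"

end_task=${SUBWF_END_TASK#*/}    # drop the shortest prefix ending in "/" -> task name
end_point=${SUBWF_END_TASK%/*}   # drop the shortest suffix starting at "/" -> cycle point

printf 'task=%s point=%s\n' "$end_task" "$end_point"
# prints: task=archive point=20230822T00
```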

However, your suggestion to use the task pool table is an improvement 🎉 I'll amend my example and alert the couple of NIWA teams with sub-workflow use-cases.

Also, a new top-level workflow status for "completed" is a good idea.

@oliver-sanders
Member

oliver-sanders commented Jul 31, 2024

It would be a good idea to make accessing the "complete" status as easy as possible, as this is something that tools like cylc scan will need to do.

Ideally we wouldn't need to go to the database at all (managing database connections is a hassle). Perhaps a .service file, or a field thereof?

@oliver-sanders changed the title from "Overall workflow state" to "Overall workflow state / completed state" on Aug 5, 2024
@oliver-sanders
Member

See also cylc/cylc-uiserver#618
