Open
Description
It's currently very hard to understand what goes wrong when a workflow gets stuck in the "running" phase.
Let's improve the logging (in particular in the job monitor) so that it is clear:
- What events are coming from the cluster (e.g. pod evicted)
- What actions are being taken (e.g. storing logs, setting as failed, skipping as job is still running)
- Why these actions are being taken (e.g. cause of failure)
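A rough sketch of what one such log line could look like, with the event, the action and the reason always printed together. This assumes Python's standard `logging` module; the logger name `job-monitor`, the helpers `format_decision` and `log_decision`, and the example reason string are hypothetical and not taken from the existing code.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("job-monitor")


def format_decision(job_id, event, action, reason):
    """Build one log line: what happened, what we do about it, and why."""
    return f"job={job_id} event={event!r} action={action!r} reason={reason!r}"


def log_decision(job_id, event, action, reason, level=logging.INFO):
    """Emit the decision line through the standard logging module."""
    logger.log(level, format_decision(job_id, event, action, reason))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # The reason below is an invented example value.
    log_decision("1234", "pod evicted", "setting as failed", "node out of memory")
```

Keeping all three fields on a single line means that a stuck workflow can be investigated by searching the logs for one job id, without correlating separate messages.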
Some additional ideas:
- make sure that the id used in reana-run-job-<id> is the same as the job's id (no need for two different identifiers!)
- if multiple ids are used to identify the same job, then let's always print them together in the logs
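One possible way to always print the ids together is a `logging.LoggerAdapter` that prefixes every message with all ids known for the job. This is only a sketch: the names `JobIdsAdapter`, `job_logger`, `job_id` and `backend_job_id` are hypothetical, and the example id values are made up.

```python
import logging


class JobIdsAdapter(logging.LoggerAdapter):
    """Prefix every message with all ids known for the job."""

    def process(self, msg, kwargs):
        # self.extra is the dict passed to the adapter's constructor.
        ids = " ".join(f"{key}={value}" for key, value in self.extra.items())
        return f"[{ids}] {msg}", kwargs


def job_logger(job_id, backend_job_id):
    """Return a logger bound to both identifiers of a single job."""
    base = logging.getLogger("job-monitor")  # hypothetical logger name
    return JobIdsAdapter(base, {"job_id": job_id, "backend_job_id": backend_job_id})


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = job_logger("abc", "reana-run-job-xyz")
    log.info("storing logs")
```

With an adapter like this, individual call sites cannot forget one of the identifiers, since both are attached once when the logger is created.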
Status: Backlog