k8s: improve logging and debugging #432

@mdonadoni

It's currently very hard to understand what goes wrong when a workflow gets stuck in the "running" phase.

Let's improve the logging (in particular of the job monitor) to clearly understand:

  • What events are coming from the cluster (e.g. pod evicted)
  • What actions are being taken (e.g. storing logs, setting as failed, skipping as job is still running)
  • Why these actions are being taken (e.g. cause of failure)
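The three points above could be covered by logging each cluster event as it arrives, followed by the action chosen and the reason for it. Below is a minimal sketch using only the standard `logging` module. The function name `handle_pod_event`, its parameters, and the action labels are hypothetical and do not reflect the actual job monitor code; the only assumption about the cluster is that an evicted pod is reported with phase `Failed` and reason `Evicted`.

```python
import logging

logger = logging.getLogger("job_monitor_sketch")  # hypothetical logger name


def handle_pod_event(event_type, pod_name, phase, reason=None):
    """Log the incoming event, then the chosen action and why it was chosen.

    All names here are illustrative; this is not the real job monitor API.
    """
    # 1. What came from the cluster.
    logger.info(
        "event received: type=%s pod=%s phase=%s reason=%s",
        event_type, pod_name, phase, reason,
    )

    # 2. Which action is taken, and 3. why.
    if phase == "Failed":
        action = "set_failed"
        why = f"pod failed (reason={reason or 'unknown'})"
    elif phase == "Succeeded":
        action = "store_logs"
        why = "pod finished successfully"
    else:
        action = "skip"
        why = f"job is still running (phase={phase})"

    logger.info("action taken: pod=%s action=%s why=%s", pod_name, action, why)
    return action
```

Emitting the event line before the decision means that even a crash while handling the event leaves a trace of what the cluster reported.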

Some additional ideas:

  • Make sure that the id used in reana-run-job-<id> is the same as the job's id (no need for two different identifiers!)
  • If multiple ids are still used to identify the same job, let's always print them together in the logs
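For the second idea, one possible approach is a `logging.LoggerAdapter` that prefixes every message with all ids of the job, so they can never be printed separately. This is only a sketch: the names `JobIdAdapter`, `job_logger`, `job_id` and `backend_job_id` are assumptions made for illustration, not identifiers taken from the codebase.

```python
import logging


class JobIdAdapter(logging.LoggerAdapter):
    """Prefix every log message with both identifiers of a job (hypothetical names)."""

    def process(self, msg, kwargs):
        prefix = (
            f"[job_id={self.extra['job_id']} "
            f"backend_job_id={self.extra['backend_job_id']}]"
        )
        return f"{prefix} {msg}", kwargs


def job_logger(job_id, backend_job_id, base=None):
    """Return a logger that always prints the two ids together."""
    base = base or logging.getLogger("job_monitor_sketch")
    return JobIdAdapter(base, {"job_id": job_id, "backend_job_id": backend_job_id})
```

A call such as `job_logger("a1", "b2").info("storing logs")` would then produce a message starting with `[job_id=a1 backend_job_id=b2]`, which makes it possible to grep the logs by either id.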


Status: Backlog
