[helm] Report when helm releases get stuck in pending states #21216

@damonmaria

At the moment the helm check treats any status that is not "failed" as OK.

The service check is described as: "Returns CRITICAL for a release when its latest revision is in failed state. Returns OK otherwise."

But helm can go wrong and leave a release stuck in a pending status (like "pending-upgrade"). A release should pass through this status briefly while deploying, but it should not stay there.

We have run into situations where our automated releases fail because a helm release is stuck in pending-upgrade. It is impossible to report on this in Datadog: all other helm metrics cover every revision, not just the current one, and non-OK statuses are expected in old revisions. The service check is the only way to get the status of the current revision.

Can this check be amended to report a non-OK state when the current status is pending* for an excessive amount of time?
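To make the request concrete, here is a rough sketch of the kind of logic I have in mind. It is not the existing check's code: the function name, the `max_pending` threshold, the use of a last-deployed timestamp, and the choice of WARNING for a stuck release are all assumptions for illustration. The numeric values follow the usual service check convention (0 = OK, 1 = WARNING, 2 = CRITICAL), and the pending statuses are the ones Helm 3 uses (pending-install, pending-upgrade, pending-rollback).

```python
from datetime import datetime, timedelta, timezone

# Service check status values (0 = OK, 1 = WARNING, 2 = CRITICAL).
OK, WARNING, CRITICAL = 0, 1, 2


def release_state(status, last_deployed, now=None,
                  max_pending=timedelta(minutes=15)):
    """Hypothetical helper: map the latest revision's status to a check state.

    status        -- Helm status of the latest revision, e.g. "pending-upgrade"
    last_deployed -- timezone-aware datetime of that revision
    max_pending   -- assumed, configurable limit for staying in pending-*
    """
    if now is None:
        now = datetime.now(timezone.utc)

    # Existing behaviour: a failed latest revision is CRITICAL.
    if status == "failed":
        return CRITICAL

    # Proposed behaviour: pending-* is fine briefly, but not indefinitely.
    if status.startswith("pending-") and now - last_deployed > max_pending:
        return WARNING

    return OK
```

With a 15 minute limit, a release that has been in pending-upgrade for an hour would report WARNING, while one that entered pending-upgrade a minute ago would still report OK. Whether a stuck release should be WARNING or CRITICAL, and what the default limit should be, is open for discussion.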
