nexus external_endpoints task hung #9715

@davepacheco

Description

This was observed in our colo environment and (internal) details are under oxidecomputer/colo#148.

The upshot is that some runaway automation created a ton of TLS certificates again, and in one of the three Nexus instances the external_endpoints task hung, apparently for days. The root cause is still being determined, but it appears to be a missed wakeup in the kernel on the database side. This ticket will cover the analysis, along with information about mitigation and workarounds. More details coming!
