Provide an option to retain failed machine #449

Open
@aylei

Description

What would you like to be added:
A --machine-delete-policy option that supports retaining failed machines. Two strategies are proposed:

  • Delete: the current behavior; the failed machine is deleted
  • Orphan: remove the failed machine from the machine set without deleting it (orphan it)
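
To make the proposed option concrete, here is a minimal sketch of how such a flag could be declared and validated with Go's standard flag package. Only the flag name and the two values come from this issue; the type MachineDeletePolicy, the constants, and the functions ParsePolicy and parseArgs are hypothetical names for illustration and are not part of the existing MCM code base.

```go
package main

import (
	"flag"
	"fmt"
)

// MachineDeletePolicy is a hypothetical type for the proposed option.
type MachineDeletePolicy string

const (
	// PolicyDelete keeps the current behavior: failed machines are deleted.
	PolicyDelete MachineDeletePolicy = "Delete"
	// PolicyOrphan releases failed machines from the machine set instead.
	PolicyOrphan MachineDeletePolicy = "Orphan"
)

// ParsePolicy validates a raw flag value against the two proposed strategies.
func ParsePolicy(raw string) (MachineDeletePolicy, error) {
	switch p := MachineDeletePolicy(raw); p {
	case PolicyDelete, PolicyOrphan:
		return p, nil
	default:
		return "", fmt.Errorf("invalid machine delete policy %q: want %q or %q",
			raw, PolicyDelete, PolicyOrphan)
	}
}

// parseArgs reads --machine-delete-policy from args, defaulting to Delete
// so that existing deployments keep today's behavior.
func parseArgs(args []string) (MachineDeletePolicy, error) {
	fs := flag.NewFlagSet("controller", flag.ContinueOnError)
	raw := fs.String("machine-delete-policy", string(PolicyDelete),
		"what to do with failed machines: Delete or Orphan")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return ParsePolicy(*raw)
}

func main() {
	policy, err := parseArgs([]string{"--machine-delete-policy=Orphan"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("policy:", policy)
}
```

Defaulting to Delete is a design choice in this sketch: it would make the option opt-in and leave current deployments unchanged.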

Why is this needed:
I run stateful applications on Gardener using local persistent volumes. Machine deletion is a critical operation because it also deletes all data in the local PVs. I have witnessed all nodes being replaced when the load balancer of the shoot apiserver was unhealthy, and all local data was lost as a result.
As a more conservative strategy, the machineset controller could orphan the failed machine from the machineset without actually deleting it. The orphaned machines could then be deleted by human operators after manual confirmation.
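
The difference between the two strategies can be sketched as follows. This is a simplified model, not the machineset controller's actual code: Machine, Phase, Policy, and handleFailed are hypothetical stand-ins, and ownership is reduced to a single Owner field, whereas a real controller would presumably work with owner references and label selectors.

```go
package main

import "fmt"

// Phase is a simplified machine state (hypothetical, for illustration).
type Phase string

const (
	PhaseRunning Phase = "Running"
	PhaseFailed  Phase = "Failed"
)

// Policy mirrors the two strategies proposed in this issue.
type Policy string

const (
	Delete Policy = "Delete"
	Orphan Policy = "Orphan"
)

// Machine is a stand-in for a machine object, not the real MCM type.
type Machine struct {
	Name  string
	Phase Phase
	Owner string // owning machine set; empty means orphaned
}

// handleFailed splits machines into those still owned by the set, those
// orphaned for manual review, and the names of those marked for deletion.
func handleFailed(machines []Machine, policy Policy) (kept, orphaned []Machine, deleted []string) {
	for _, m := range machines {
		switch {
		case m.Phase != PhaseFailed:
			kept = append(kept, m)
		case policy == Orphan:
			// Release ownership only; the machine and its local data stay.
			m.Owner = ""
			orphaned = append(orphaned, m)
		default:
			// Delete policy: the machine would be removed along with its data.
			deleted = append(deleted, m.Name)
		}
	}
	return kept, orphaned, deleted
}

func main() {
	machines := []Machine{
		{Name: "m-1", Phase: PhaseRunning, Owner: "set-a"},
		{Name: "m-2", Phase: PhaseFailed, Owner: "set-a"},
	}
	kept, orphaned, deleted := handleFailed(machines, Orphan)
	fmt.Println("kept:", len(kept), "orphaned:", len(orphaned), "deleted:", len(deleted))
}
```

In both cases the failed machine no longer counts toward the set, so the set could still create a replacement; under Orphan the old machine simply remains until an operator removes it.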

Metadata

Assignees

No one assigned

    Labels

    • area/usability: Usability related
    • kind/enhancement: Enhancement, improvement, extension
    • lifecycle/rotten: Nobody worked on this for 12 months (final aging stage)
    • needs/planning: Needs (more) planning with other MCM maintainers
    • priority/5: Priority (lower number equals higher priority)
