@jferrazbr

Issue: rancher/rancher#52574

Problem

When restoring from an etcd snapshot, the webhook did not validate the snapshot metadata before accepting spec.rkeConfig.etcdSnapshotRestore.
A request could set restoreRKEConfig to "kubernetesVersion" or "all" even when the referenced snapshot had missing or invalid metadata.
Such requests passed admission but failed later in the restore flow with parse errors.

Solution

This PR adds a validator for spec.rkeConfig.etcdSnapshotRestore on provisioning.cattle.io/v1 Cluster objects and wires the RKE client into the webhook Clients struct.

The validator:

  • Only runs when etcdSnapshotRestore changes from empty to a new non-empty value, so it does not block unrelated cluster updates.
  • Verifies that the snapshot named in etcdSnapshotRestore.name exists in the same namespace.
  • Ensures etcdSnapshotRestore.restoreRKEConfig is one of "none", "kubernetesVersion", or "all".
  • Parses the snapshot metadata. For "kubernetesVersion", the metadata must contain a kubernetesVersion; for "all", it must contain both kubernetesVersion and rkeConfig.

In addition:

  • The Cluster validator handler registration in pkg/server/handlers.go was moved to a management-cluster-only list, so validation runs only on the local/management cluster, where the snapshot resources exist. This avoids failures on downstream clusters, which do not have those resources.

Docs are updated to describe the new validation behavior, and unit tests cover the main success and failure paths.
This partially addresses the linked issue by validating snapshot metadata before restore. The annotation-based mode filtering will be handled in a follow-up change.

CheckList

  • Test
  • Docs

@jferrazbr jferrazbr requested a review from a team as a code owner November 24, 2025 21:55
Member

@jiaqiluo jiaqiluo left a comment

LGTM with some nits.

@jferrazbr jferrazbr force-pushed the add-snap-restore-validator branch from ba25be2 to 88b2c50 November 25, 2025 19:36
@jiaqiluo jiaqiluo requested a review from a team November 25, 2025 20:53
Contributor

@jakefhyde jakefhyde left a comment

1 nit


// parseSnapshotClusterSpec decodes snapshot.SnapshotFile.Metadata into a v1.ClusterSpec.
// The metadata is stored as a nested, gzipped, base64-encoded structure.
func parseSnapshotClusterSpec(snap *rkev1.ETCDSnapshot) (*v1.ClusterSpec, error) {
Contributor

We copy this from Rancher, correct? I wonder if we could move this to github.com/rancher/rancher/pkg/apis/provisioning.cattle.io/v1 somewhere.
