Enhancement - Early Cluster.status.controlPlaneEndpoint Propagation #475

@CodeBleu

Description

Request early propagation of controlPlaneEndpoint to break Talos bootstrap deadlock

Proposed Enhancement

To be added

CAPC should propagate CloudStackCluster.spec.controlPlaneEndpoint.host to Cluster.status.controlPlaneEndpoint.host immediately after public IP allocation, instead of waiting for the VMs to reach the "Running" state.

Why is this needed?

The current ordering creates a chicken-and-egg deadlock with the Talos bootstrap provider:
  1. CAPC allocates the public IP → CloudStackCluster.spec.controlPlaneEndpoint.host = "1.2.3.4" ✅
  2. TalosConfigTemplate is rendered → certSANs: [] (it reads Cluster.status, which is still empty) ❌
  3. VMs boot with broken configs → stuck in "Starting" in CloudStack ❌
  4. CAPC waits for "Running" → never populates Cluster.status → deadlock

Current logs prove this exact scenario:
"CloudStackCluster.spec.controlPlaneEndpoint.host": "1.2.3.4" ✅
"Instance not ready, is Starting" ❌
Cluster.status.controlPlaneEndpoint.host: "" ❌
certSANs: [] ❌

How does current CAPC work?

Public IP allocated → VMs "Running" → THEN Cluster.status updated

How should it work?

Public IP allocated → Cluster.status updated → Talos certSANs populated → VMs boot → LB rules
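
The difference between the two orderings can be illustrated with a small toy model. This is only a sketch of the sequencing described in this issue, not CAPC or CAPI code: `ClusterState`, `reconcile_current`, `reconcile_proposed`, and the helper functions are hypothetical names, and the rule that empty certSANs leave a VM in "Starting" is taken from the report above rather than from Talos or CloudStack internals.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Toy stand-in for the objects involved; not real CAPI/CAPC types."""
    infra_endpoint_host: str = ""    # CloudStackCluster.spec.controlPlaneEndpoint.host
    cluster_endpoint_host: str = ""  # Cluster.status.controlPlaneEndpoint.host
    cert_sans: list = field(default_factory=list)
    vm_state: str = "Pending"


def allocate_public_ip(state, ip="1.2.3.4"):
    state.infra_endpoint_host = ip


def propagate_endpoint(state):
    state.cluster_endpoint_host = state.infra_endpoint_host


def render_talos_config(state):
    # certSANs are derived from whatever the Cluster object reports right now.
    host = state.cluster_endpoint_host
    state.cert_sans = [host] if host else []


def boot_vms(state):
    # Assumption from the issue: empty certSANs leave the VM stuck in "Starting".
    state.vm_state = "Running" if state.cert_sans else "Starting"


def reconcile_current(state):
    """Current order: the endpoint is propagated only once VMs are Running."""
    allocate_public_ip(state)
    render_talos_config(state)
    boot_vms(state)
    if state.vm_state == "Running":
        propagate_endpoint(state)
    return state


def reconcile_proposed(state):
    """Proposed order: the endpoint is propagated right after IP allocation."""
    allocate_public_ip(state)
    propagate_endpoint(state)
    render_talos_config(state)
    boot_vms(state)
    return state
```

In this model, `reconcile_current(ClusterState())` ends with the VM in "Starting" and an empty cluster endpoint, while `reconcile_proposed(ClusterState())` ends with the VM in "Running" and certSANs containing the allocated IP.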

Precedents (Other Providers)

✅ CAPZ: Updates after LB IP allocation (before nodes ready)
✅ CAPM3: Updates after IPAM (before bare metal provisioned)
✅ CAPV: Updates after load balancer creation (before VMs ready)
❌ CAPC: Waits for VMs "Running" (breaks Talos bootstrap)

Impact

  • Talos provider: Fully automatic bootstrap (no manual certSANs patches)
  • Production templates: Dynamic IP allocation works end-to-end
  • Existing clusters: No breaking changes (idempotent status update)

Risk Assessment

✅ CAPI InfraCluster contract compliant
✅ Idempotent - safe retry
✅ Matches other provider timing patterns
✅ No impact on kubeadm/other bootstrap providers
✅ Testable with existing e2e framework
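
The idempotency claim above can be made concrete with a minimal sketch. It models the status as a plain dictionary rather than a real Kubernetes object; `propagate_endpoint` and the default port 6443 are assumptions for illustration and are not taken from the CAPC code base.

```python
def propagate_endpoint(cluster_status, infra_host, infra_port=6443):
    """Copy the endpoint into the status dict; return True only if it changed."""
    if not infra_host:
        # No public IP allocated yet: nothing to propagate.
        return False
    desired = {"host": infra_host, "port": infra_port}
    if cluster_status.get("controlPlaneEndpoint") == desired:
        # Already up to date: a repeated reconcile is a no-op.
        return False
    cluster_status["controlPlaneEndpoint"] = desired
    return True
```

Calling the function repeatedly with the same host leaves the status unchanged after the first call, which is the property that would make an early update safe to retry and harmless for existing clusters.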

Metadata

Labels: lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)
