`GCPManagedMachinePool` scaling via `MachinePool.spec.replicas` is silently ignored

/kind bug

**What steps did you take and what happened:**
1. Created a `GCPManagedMachinePool` without an explicit `spec.scaling` block.
2. Waited for the node pool to reach the `RUNNING` state and checked that the correct number of replicas was set in GCP.
3. Edited `MachinePool.spec.replicas` to a different value.
4. Observed: the controller reconciles repeatedly, but the node pool size does not change.

The controller logs show these four lines repeating in a loop:
```
"Reconciling node pool resources"
"Node pool running"
"Node pool config update required" request="...resource_labels:{labels:{key:\"capg-cluster-<name>\" value:\"owned\"}}"
"Node pool config updating in progress"
```

**What did you expect to happen:**
The GKE node pool scales to the new `MachinePool.spec.replicas` value.

**Anything else you would like to add:**
There seem to be four issues that compound on each other to cause this:

1. There's a perpetual diff on `resource_labels`. When the node pool is created, the controller injects `capg-cluster-<name>` into `NodeConfig.ResourceLabels`. However, those labels are applied to the individual VMs, not to the NodePool object itself, and [NodePool](https://docs.cloud.google.com/kubernetes-engine/docs/reference/rest/v1/projects.locations.clusters.nodePools?hl=en#NodePool) doesn't have any labels. As a result, `checkDiffAndPrepareUpdateConfig` reports a diff on every reconcile and returns early, which leads to a call to `updateNodePool`. That call is futile, so the controller loops endlessly.
   - Fixing this properly would require an additional API call to fetch the underlying `InstanceTemplate` and compare its labels, rather than relying on the NodePool object itself.
2. Relatedly, `ConvertToSdkLinuxNodeConfig` always produces a non-nil, empty struct when called with a `nil` value. For pools with no Linux node config set, this never matches what the GKE API returns, which triggers the same perpetual diff.
3. The three update checks in `checkDiffAndPrepareUpdateConfig` each return early when they find a difference, so the perpetual config diff described above prevents the autoscaling and size checks from ever running.
4. `ConvertSdkAutoscaling` produces a struct with `Enabled: true` when called with a `nil` value, so the autoscaling guard returns early and any size update is silently skipped. To scale manually, you are forced to set `spec.scaling.enableAutoscaling: false` explicitly.

Together, these produce a controller that reconciles constantly, while manual scaling of the node pool is effectively blocked.
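To illustrate how point 3 interacts with points 1 and 4, here is a minimal, self-contained Go sketch of the early-return control flow next to a variant that evaluates every check. `poolState`, `planEarlyReturn`, and `planAll` are hypothetical simplifications, not CAPG code; the real `checkDiffAndPrepareUpdateConfig` compares full SDK objects. A real fix may also still need to apply the planned updates one at a time across reconciles (for example, if GKE rejects concurrent operations on the same pool).

```go
package main

import "fmt"

// poolState is a hypothetical, heavily simplified view of a node pool.
// ConfigLabels stands in for NodeConfig fields such as resource labels.
type poolState struct {
	ConfigLabels       string
	AutoscalingEnabled bool
	NodeCount          int32
}

// update names the kind of GKE call a reconcile would make.
type update string

const (
	updateConfig      update = "config"
	updateAutoscaling update = "autoscaling"
	updateSize        update = "size"
)

// planEarlyReturn mirrors the reported control flow: the first check that
// finds a difference returns immediately, so later checks never run.
func planEarlyReturn(desired, observed poolState) []update {
	if desired.ConfigLabels != observed.ConfigLabels {
		return []update{updateConfig}
	}
	if desired.AutoscalingEnabled != observed.AutoscalingEnabled {
		return []update{updateAutoscaling}
	}
	// Manual sizing is only considered when autoscaling is off.
	if !desired.AutoscalingEnabled && desired.NodeCount != observed.NodeCount {
		return []update{updateSize}
	}
	return nil
}

// planAll evaluates every check, so a diff in one area cannot hide the others.
func planAll(desired, observed poolState) []update {
	var plan []update
	if desired.ConfigLabels != observed.ConfigLabels {
		plan = append(plan, updateConfig)
	}
	if desired.AutoscalingEnabled != observed.AutoscalingEnabled {
		plan = append(plan, updateAutoscaling)
	}
	if !desired.AutoscalingEnabled && desired.NodeCount != observed.NodeCount {
		plan = append(plan, updateSize)
	}
	return plan
}

func main() {
	// The label diff never converges (point 1), and spec.replicas has been
	// changed from 3 to 5 with autoscaling explicitly disabled.
	desired := poolState{ConfigLabels: "capg-cluster-<name>=owned", AutoscalingEnabled: false, NodeCount: 5}
	observed := poolState{ConfigLabels: "", AutoscalingEnabled: false, NodeCount: 3}

	fmt.Println(planEarlyReturn(desired, observed)) // [config]
	fmt.Println(planAll(desired, observed))         // [config size]
}
```

With the early-return version the size change is never planned while the config diff persists; `planAll` surfaces it alongside the config update.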

**Environment:**

- Cluster-api version: v0.26.0 (Operator)
- Minikube/KIND version: N/A
- Kubernetes version: (use `kubectl version`): 1.35.1
- OS (e.g. from `/etc/os-release`): CoreOS

`GCPManagedMachinePool` scaling via `MachinePool.spec.replicas` is silently ignored #1656