Adding dual-stack support for Cluster API GCP Provider #1582

Open
barbacbd wants to merge 12 commits into kubernetes-sigs:main from barbacbd:dual_stack_support

Conversation

@barbacbd
Contributor

@barbacbd barbacbd commented Jan 8, 2026

api/v1beta1/types.go:

Adds machine fields and simple network changes for IPv6 and dual-stack support:

  • InternalIpv6PrefixLength
  • IPv6Address
  • StackType

cloud/scope/machine.go:

Adding support for instances and networks to allow dual stack components.

  • InstanceNetworkInterfaceSpec

    • InternalIpv6PrefixLength
    • Ipv6AccessConfigs
      • ExternalIPv6
      • ExternalIpv6PrefixLength
      • Type (when present always set to DIRECT_IPV6)
      • Name (always set to External IPv6)
    • IPv6AccessType
    • Ipv6Address
    • StackType
  • InstanceSpec

    • PrivateIpv6GoogleAccess
  • InstanceNetworkInterfaceAliasIPRangesSpec: unchanged. Alias IP ranges appear to support only IPv4 CIDR or single-address formats.

cloud/interfaces.go:

Exposes new getter functions that return StackType and IPv6Address information from the cluster spec. They exist only so that machine.go can read these values.
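
A minimal sketch of what such read-only getters could look like. The interface name `ClusterStackGetter`, the struct `clusterSpec`, and the helper `needsIPv6` are hypothetical and are not taken from this PR; only the `IPv4Only` and `DualStack` values and the "empty defaults to IPv4Only" behavior come from the description and tests.

```go
package main

import "fmt"

// StackType mirrors the stackType values mentioned in this PR.
type StackType string

const (
	IPv4Only  StackType = "IPv4Only"
	DualStack StackType = "DualStack"
)

// ClusterStackGetter is a hypothetical read-only view of the cluster spec.
type ClusterStackGetter interface {
	StackType() StackType
	IPv6Address() string
}

// clusterSpec is an illustrative stand-in for the real cluster scope.
type clusterSpec struct {
	stackType   StackType
	ipv6Address string
}

// StackType treats an empty value as IPv4Only.
func (c clusterSpec) StackType() StackType {
	if c.stackType == "" {
		return IPv4Only
	}
	return c.stackType
}

// IPv6Address returns the configured IPv6 address, if any.
func (c clusterSpec) IPv6Address() string { return c.ipv6Address }

// needsIPv6 reports whether machine code should configure IPv6 at all.
func needsIPv6(g ClusterStackGetter) bool {
	return g.StackType() == DualStack
}

func main() {
	fmt.Println(needsIPv6(clusterSpec{}))
	fmt.Println(needsIPv6(clusterSpec{stackType: DualStack, ipv6Address: "2001:db8::1"}))
}
```

Keeping the defaulting inside the getter means callers in machine.go would not need to handle the empty case themselves.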

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

This PR will serve as a focal point for adding dual stack support to CAPG.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

#1486
#478

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Adds basic support for dual-stack infrastructure to CAPG.

The affected resources include addresses, firewall rules, instances, VPCs, and subnets.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Jan 8, 2026
@barbacbd
Contributor Author

barbacbd commented Jan 8, 2026

/hold

@netlify

netlify Bot commented Jan 8, 2026

Deploy Preview for kubernetes-sigs-cluster-api-gcp ready!

Name Link
🔨 Latest commit 96c15e7
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-cluster-api-gcp/deploys/69d532b5e3a4ed000873d47c
😎 Deploy Preview https://deploy-preview-1582--kubernetes-sigs-cluster-api-gcp.netlify.app

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 8, 2026
@k8s-ci-robot
Contributor

Hi @barbacbd. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 8, 2026
@damdo
Member

damdo commented Jan 9, 2026

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 9, 2026
Contributor

@salasberryfin salasberryfin left a comment


Thanks @barbacbd, I left a couple of comments.

Comment thread cloud/scope/machine.go Outdated
Comment thread api/v1beta1/types.go
@barbacbd
Contributor Author

Leaving this here as a reference. For RouterNat resources there are two fields to consider:

  1. Nat64Subnetworks
  2. SourceSubnetworkIpRangesToNat64

From the SDK documentation:
In dual stack subnets, NAT64 will only be enabled for IPv6-only VMs.
If this option is used, the nat64_subnetworks field must be specified.

We can fill these in, but doing so may have no effect, since we will not have IPv6-only VMs.

Contributor

@salasberryfin salasberryfin left a comment


A couple of "nice to haves" that could be added here:

  • E2E test case using a dual stack network configuration.
  • A brief entry in the CAPG book that documents the new feature (this can be a follow-up PR).

Comment thread api/v1beta1/types.go
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 22, 2026
Comment thread api/v1beta1/types.go Outdated
Comment thread api/v1beta1/types.go Outdated
Comment on lines +213 to +224
// InternalIpv6PrefixLength: The prefix length of the primary internal IPv6 range.
// +kubebuilder:validation:Minimum=0
// +kubebuilder:validation:Maximum=128
// +optional
InternalIpv6PrefixLength int `json:"internalIpv6PrefixLength,omitempty"`

// Ipv6Address: An IPv6 internal network address for this network interface.
// To use a static internal IP address, it must be unused and in the same
// region as the instance's zone. If not specified, Google Cloud will
// automatically assign an internal IPv6 address from the instance's subnetwork.
// +optional
Ipv6Address string `json:"ipv6Address,omitempty"`

If these are related, is it useful to have it as a combined CIDR string?

Contributor Author


These are related but I believe that they are independent. Specifying one does not mean you must specify the other.
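
To illustrate the trade-off under discussion, here is a small sketch of how a combined CIDR string would map onto the two separate fields. The helper name `splitCIDR` is hypothetical; it uses only the standard library `net/netip` package and is not code from this PR.

```go
package main

import (
	"fmt"
	"net/netip"
)

// splitCIDR splits a combined IPv6 CIDR string into the address and the
// prefix length, the two values the API currently exposes as separate fields.
func splitCIDR(cidr string) (addr string, prefixLen int, err error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return "", 0, err
	}
	if !p.Addr().Is6() {
		return "", 0, fmt.Errorf("%q is not an IPv6 prefix", cidr)
	}
	return p.Addr().String(), p.Bits(), nil
}

func main() {
	addr, bits, err := splitCIDR("2001:db8::1/64")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(addr, bits)
}
```

A combined string forces both values to be set together, whereas separate optional fields allow a prefix length without a static address, which matches the author's point that the two are independent.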

Comment thread api/v1beta1/types.go
@barbacbd barbacbd force-pushed the dual_stack_support branch 3 times, most recently from 7e45964 to f921678 Compare January 29, 2026 16:13
@damdo
Member

damdo commented Feb 5, 2026

/assign @justinsb

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: barbacbd
Once this PR has been reviewed and has the lgtm label, please ask for approval from justinsb. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 5, 2026
Comment thread cloud/scope/machine.go Outdated
},
}

// For now we cannot assign the IPv6AccessConfigs. The bootstrap node is the only one on our side
Contributor Author


@justinsb I would love to get your input on this area. The problem we are currently seeing is:

  1. We need to set the IPv6 access type to INTERNAL to mirror the IPv4 default. This also ensures we can have an internal load balancer.
  2. We set network.EnableUlaInternalIpv6 = true to reflect this and to allow internal addresses.

The problem is with our bootstrap node. It is usually given a public IP address, which is handled in the lines above. But when this instance is created, it is no longer given a public/global IPv6 address because the network was set to INTERNAL.

It looks like Google's suggestion would be to add another subnet, set it to EXTERNAL, and place the bootstrap node on that subnet. We would then create another AccessConfig to match. This seems like a lot of work and could introduce more issues.

Any input would be helpful. Thank you!

/cc @damdo @JoelSpeed @salasberryfin

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 25, 2026
@barbacbd barbacbd force-pushed the dual_stack_support branch from fab2e3c to 97ab30c Compare March 25, 2026 19:54
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 25, 2026
@barbacbd barbacbd force-pushed the dual_stack_support branch from 97ab30c to df500e9 Compare March 30, 2026 15:32
The test currently adds the following parameters:
- internalIpv6PrefixLength: 64
- stackType: DualStack
- Added IPv6 APIServer Address (internal and external) for dual stack configurations
- Added IPv6 Forwarding Rules (internal and external) for dual stack configurations
- Added an address selection policy (IPv4 vs IPv6)

2. cloud/services/compute/loadbalancers/reconcile.go

- Set the IPv6 API Server addresses when dual stack configuration is selected
- Set the IPv6 Forwarding Rules when dual stack configuration is selected.
- Set the ControlPlaneEndpoint to the IPv6 address when IPv6Primary is selected
  1. IPv4Primary with IPv4Only stack type - Verifies IPv4 address is used (no IPv6 resources created)
  2. IPv4Primary with DualStack type - Verifies IPv4 address is preferred even though both IPv4 and IPv6 are available
  3. IPv6Primary with DualStack type - Verifies IPv6 address is preferred when configured

  Key Implementation Details

  - Mock Address Hook: Added an InsertHook to the mock addresses that automatically populates the Address field with actual IP addresses:
    - IPv4 addresses get 192.0.2.1
    - IPv6 addresses get 2001:db8::1
  - Verification: The tests verify:
    - The control plane endpoint host contains the correct IP address (IPv4 or IPv6)
    - For DualStack configurations, both IPv4 and IPv6 resources are created
    - For IPv4Only configurations, no IPv6 resources exist
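
The three cases verified above can be sketched as a single selection function. This is an illustrative assumption about the logic, not the implementation in reconcile.go; the function name `controlPlaneHost` is hypothetical, while the policy and stack type values and the sample addresses come from the test description.

```go
package main

import "fmt"

type StackType string
type AddressPreferencePolicy string

const (
	IPv4Only    StackType               = "IPv4Only"
	DualStack   StackType               = "DualStack"
	IPv4Primary AddressPreferencePolicy = "IPv4Primary"
	IPv6Primary AddressPreferencePolicy = "IPv6Primary"
)

// controlPlaneHost picks the endpoint host. IPv6 is used only when the
// cluster is dual stack, the policy prefers IPv6, and an IPv6 address exists.
func controlPlaneHost(stack StackType, policy AddressPreferencePolicy, ipv4, ipv6 string) string {
	if stack == DualStack && policy == IPv6Primary && ipv6 != "" {
		return ipv6
	}
	return ipv4
}

func main() {
	fmt.Println(controlPlaneHost(IPv4Only, IPv4Primary, "192.0.2.1", ""))
	fmt.Println(controlPlaneHost(DualStack, IPv4Primary, "192.0.2.1", "2001:db8::1"))
	fmt.Println(controlPlaneHost(DualStack, IPv6Primary, "192.0.2.1", "2001:db8::1"))
}
```

Falling back to IPv4 in every other case keeps the existing IPv4-only behavior unchanged when no dual-stack settings are present.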

Note: Tests written by Claude and edited by @barbacbd.
  Located at: /Users/bbarbach/dev/k8s-cluster-api-provider-gcp/test/e2e/data/infrastructure-gcp/

  This template creates a DualStack cluster with IPv6 as the primary address:
  - stackType: "DualStack"
  - addressPreferencePolicy: "IPv6Primary"

  2. dual_stack_test.go

  Located at: /Users/bbarbach/dev/k8s-cluster-api-provider-gcp/test/e2e/

  This comprehensive test file includes three test contexts:

  Test 1: DualStack with IPv4Primary (Default)

  - Uses flavor: ci-with-dual-stack
  - Verifies StackType is DualStack
  - Verifies AddressPreferencePolicy defaults to IPv4Primary
  - Validates both IPv4 and IPv6 addresses are allocated
  - Confirms control plane endpoint uses IPv4 address
  - Checks both IPv4 and IPv6 forwarding rules exist

  Test 2: DualStack with IPv6Primary

  - Uses flavor: ci-dual-stack-with-ipv6primary
  - Verifies StackType is DualStack
  - Verifies AddressPreferencePolicy is IPv6Primary
  - Validates both IPv4 and IPv6 addresses are allocated
  - Confirms control plane endpoint uses IPv6 address (key difference)
  - Checks both IPv4 and IPv6 forwarding rules exist

  Test 3: IPv4Only (Default Behavior)

  - Uses default flavor
  - Verifies StackType is IPv4Only or empty (defaults to IPv4Only)
  - Validates only IPv4 addresses are allocated
  - Confirms no IPv6 addresses or forwarding rules exist
  - Verifies control plane endpoint uses IPv4 address

  3. Updated gcp-ci.yaml

  Added the new template to the e2e configuration so it can be used in tests.

  4. Moved Dual stack tests from e2e_test.go to dual_stack_test.go
@barbacbd barbacbd force-pushed the dual_stack_support branch from df500e9 to fbe8678 Compare March 30, 2026 16:05
@damdo
Member

damdo commented Mar 31, 2026

/test pull-cluster-api-provider-gcp-e2e-test

@barbacbd
Contributor Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 31, 2026
  1. Detects instance group reference changes - Compares actual Group SelfLinks, not just counts
  2. Handles recreated instance groups - If instance groups get recreated with new SelfLinks, the backend service will be updated
  3. Prevents unnecessary updates - Only updates when backends actually change
  4. Safe for empty backends - Protected by len(backendsvc.Backends) > 0 check
  5. Works with dual-stack - Doesn't interfere with dual-stack forwarding rules
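
A minimal sketch of the comparison described in the list above, assuming backends are identified by their instance group SelfLink strings. The function name `backendGroupsChanged` is hypothetical and the real change operates on the compute API's backend objects rather than plain strings.

```go
package main

import "fmt"

// backendGroupsChanged reports whether the desired instance group SelfLinks
// differ from the current ones. It compares the links themselves, not only
// the counts, and ignores ordering.
func backendGroupsChanged(current, desired []string) bool {
	if len(current) != len(desired) {
		return true
	}
	seen := make(map[string]int, len(current))
	for _, g := range current {
		seen[g]++
	}
	for _, g := range desired {
		if seen[g] == 0 {
			return true
		}
		seen[g]--
	}
	return false
}

func main() {
	current := []string{"zones/a/instanceGroups/ig-old"}
	recreated := []string{"zones/a/instanceGroups/ig-new"}
	fmt.Println(backendGroupsChanged(current, current))
	fmt.Println(backendGroupsChanged(current, recreated))
}
```

A count-only comparison would miss the recreated-group case in `main`, because both slices have length one.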
@damdo
Member

damdo commented Apr 3, 2026

/cc @sadasu @tthvo @nrb

@k8s-ci-robot k8s-ci-robot requested a review from nrb April 3, 2026 13:19
@k8s-ci-robot
Contributor

@damdo: GitHub didn't allow me to request PR reviews from the following users: sadasu, tthvo.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.


barbacbd added 5 commits April 7, 2026 09:50
Update test templates to include IPv6 pod/service CIDRs for complete dual stack
- When specified, the IPv6 CIDR block is set for each subnet (documented in the subnet spec).
- When not specified, Google assigns the CIDR block itself.
…uster nodes remain secure on private subnets!

  The commented-out code couldn't work because all subnets were forced to use INTERNAL IPv6 (required for internal load balancers). Bootstrap nodes with publicIP: true couldn't get public IPv6 addresses.

  Solution

  Implemented a dual subnet architecture where:
  - Public subnets (externalIpv6: true) use EXTERNAL IPv6 with GUA ranges for bootstrap/bastion nodes
  - Private subnets (externalIpv6: false) use INTERNAL IPv6 with ULA ranges for cluster nodes and load balancers

  Changes Made

  1. Added ExternalIpv6 field to SubnetSpec (defaults to false)
  2. Updated subnet creation in cluster.go and managedcluster.go to set Ipv6AccessType based on this field
  3. Extended ClusterGetter interface with Subnets() method
  4. Fixed machine network interface logic to:
    - Look up subnet configuration
    - Enable IPv6 access configs only when both publicIP: true AND externalIpv6: true
    - Remove the broken commented-out code
  5. Regenerated all CRD manifests
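
The rule in items 2 and 4 can be sketched as follows. This is a simplified illustration under stated assumptions: `SubnetSpec` here is a cut-down stand-in with only the fields needed, and the helper names `ipv6AccessType` and `wantIPv6AccessConfig` are hypothetical. The `EXTERNAL` and `INTERNAL` strings are the access type values named in the commit message.

```go
package main

import "fmt"

// SubnetSpec is a simplified stand-in for the real subnet spec.
type SubnetSpec struct {
	Name         string
	ExternalIpv6 bool // defaults to false, meaning INTERNAL (ULA) IPv6
}

// ipv6AccessType maps the ExternalIpv6 flag to the subnet access type.
func ipv6AccessType(s SubnetSpec) string {
	if s.ExternalIpv6 {
		return "EXTERNAL"
	}
	return "INTERNAL"
}

// wantIPv6AccessConfig returns true only when the machine asks for a public
// IP AND its subnet is configured for external IPv6.
func wantIPv6AccessConfig(publicIP bool, subnetName string, subnets []SubnetSpec) bool {
	if !publicIP {
		return false
	}
	for _, s := range subnets {
		if s.Name == subnetName {
			return s.ExternalIpv6
		}
	}
	return false // unknown subnet: do not attach an IPv6 access config
}

func main() {
	subnets := []SubnetSpec{
		{Name: "public", ExternalIpv6: true},
		{Name: "private"},
	}
	fmt.Println(ipv6AccessType(subnets[0]), ipv6AccessType(subnets[1]))
	fmt.Println(wantIPv6AccessConfig(true, "public", subnets))
	fmt.Println(wantIPv6AccessConfig(true, "private", subnets))
}
```

Requiring both conditions keeps cluster nodes on private subnets from receiving external IPv6 access configs even if publicIP is set by mistake.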

Code Changes created by Claude, reviewed and suggested by @barbacbd.
Add the Dual Stack Templates to document Dual Stack Configuration support.
@k8s-ci-robot
Contributor

@barbacbd: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-gcp-verify | 96c15e7 | link | true | /test pull-cluster-api-provider-gcp-verify |
| pull-cluster-api-provider-gcp-apidiff | 96c15e7 | link | false | /test pull-cluster-api-provider-gcp-apidiff |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.


6 participants