diff --git a/plans/phase-2/cilium-integration.md b/plans/phase-2/cilium-integration.md new file mode 100644 index 00000000..46543f13 --- /dev/null +++ b/plans/phase-2/cilium-integration.md @@ -0,0 +1,396 @@ +--- +title: Cilium Integration for Multicluster Multigres Deployments +state: draft +tags: [network, multicluster, cilium, infrastructure] +--- + +# Summary + +Enable Multigres operator to deploy clusters that work seamlessly with Cilium's multicluster networking capabilities, ensuring east-west traffic between clusters works correctly and that pod placement (via affinity rules) aligns with Cilium's cluster mesh topology requirements. + +# Motivation + +Clients are deploying Multigres across multiple Kubernetes clusters with Cilium for: +- **East-west multicluster networking**: Services in one cluster need to reach services in another +- **Advanced networking**: Cilium's service mesh, network policies, and observability +- **Complex topologies**: Multi-region, multi-zone deployments with specific latency and compliance requirements + +The operator must ensure that: +1. Multigres components can discover and communicate across cluster boundaries +2. Pod affinity/anti-affinity rules respect Cilium's cluster mesh topology +3. Network policies don't inadvertently block required Multigres traffic +4. Service annotations are compatible with Cilium's service mesh features + +## Goals + +1. **Validate Multicluster Connectivity**: Ensure etcd, MultiGateway, MultiOrch, and MultiPooler can communicate across Cilium cluster mesh +2. **Align Affinity with Topology**: Provide guidance and examples for setting `affinity`, `nodeSelector`, and `topologySpreadConstraints` that work with Cilium's cluster topology +3. **Service Mesh Compatibility**: Ensure Multigres Services work with Cilium service mesh (L7 policies, observability) +4. **Documentation**: Provide deployment examples for common multicluster scenarios +5. **Testing**: Validate operator deployments in Cilium-enabled multicluster environments + +## Non-Goals + +1. **Cilium Installation/Management**: Operator does not install or configure Cilium itself +2. **Custom Cilium Policies**: Operator won't generate Cilium-specific NetworkPolicies or CiliumNetworkPolicies (users manage these) +3. **Cilium Feature Development**: Not extending Cilium capabilities, only ensuring compatibility +4. **Non-Cilium CNIs**: This document focuses on Cilium; other CNIs handled separately + +# Proposal + +## Core Strategy + +The operator already exposes standard Kubernetes scheduling primitives (`affinity`, `nodeSelector`, `topologySpreadConstraints`, `tolerations`) in all component CRDs. These primitives are sufficient for Cilium integration - no API changes required. + +**Integration approach**: +1. **Service Annotations**: Support Cilium service mesh annotations in `serviceAnnotations` fields +2. **Pod Labels**: Allow users to set pod labels for Cilium network policy matching +3. **Documentation**: Provide Cilium-specific deployment examples and best practices +4. **Validation**: Test multicluster scenarios in CI/CD with Cilium cluster mesh + +## Multicluster Architecture Patterns + +### Pattern 1: Shared Etcd, Distributed Gateways + +``` +Cluster A (us-east-1) Cluster B (eu-west-1) +┌─────────────────────┐ ┌─────────────────────┐ +│ Etcd (3 replicas) │◄──────►│ MultiGateway │ +│ MultiGateway │ │ MultiPooler │ +│ MultiOrch │ └─────────────────────┘ +│ MultiPooler │ +└─────────────────────┘ + ▲ + │ Cilium Cluster Mesh + ▼ +``` + +**Affinity Strategy**: +- Etcd pods use `podAntiAffinity` to spread across zones within Cluster A +- MultiGateway/MultiPooler in Cluster B use `nodeSelector` to pin to eu-west-1 +- Services annotated with `io.cilium/global-service: "true"` for cross-cluster discovery + +### Pattern 2: Federated Etcd Across Clusters + +``` +Cluster A Cluster B +┌─────────────────────┐ ┌─────────────────────┐ +│ Etcd (replicas 2) │◄──────►│ Etcd (replicas 1) │ +│ MultiGateway │ │ MultiGateway │ +└─────────────────────┘ └─────────────────────┘ +``` + +**Affinity Strategy**: +- Etcd members in each cluster use `topologySpreadConstraints` with `topology.kubernetes.io/zone` +- Cilium ClusterMesh enables etcd peer discovery across clusters +- `podAntiAffinity` prevents multiple etcd members on same node + +### Pattern 3: Regional Isolation with Global Gateway + +``` +Cluster A (Region 1) Cluster B (Region 2) +┌─────────────────────┐ ┌─────────────────────┐ +│ Etcd │ │ Etcd │ +│ MultiPooler │ │ MultiPooler │ +└─────────────────────┘ └─────────────────────┘ + ▲ ▲ + └──────────┬───────────────────┘ + │ + Global Load Balancer Cluster + ┌─────────────────────┐ + │ MultiGateway (HA) │ + └─────────────────────┘ +``` + +**Affinity Strategy**: +- MultiGateway uses `nodeAffinity` to schedule on dedicated gateway nodes +- MultiPooler uses `podAntiAffinity` to avoid co-location +- Cilium's `io.cilium/lb-ipam-ips` annotation for consistent LB addressing + +# Design Details + +## Service Annotations for Cilium + +Multigres components should support Cilium-specific service annotations. The existing `serviceAnnotations` field in MultiGatewaySpec already supports this. + +**Recommended Cilium Annotations**: + +```yaml +apiVersion: multigres.io/v1alpha1 +kind: MultiGateway +metadata: + name: global-gateway +spec: + serviceAnnotations: + # Global service for cross-cluster access + io.cilium/global-service: "true" + # L7 traffic management + io.cilium/lb-ipam-ips: "10.0.0.100" + # Shared backend pools across clusters + io.cilium/service-affinity: "remote" + serviceType: LoadBalancer +``` + +## Pod Labels and Network Policies + +Users should be able to apply pod labels for Cilium network policy matching: + +```yaml +apiVersion: multigres.io/v1alpha1 +kind: Etcd +metadata: + name: my-etcd +spec: + podLabels: + # Cilium network policy selectors + network-zone: "trusted" + multigres-component: "etcd" + podAnnotations: + # Cilium policy enforcement mode + policy.cilium.io/proxy-visibility: "," +``` + +## Affinity Rules for Multicluster Topologies + +### Cross-Cluster Etcd with High Availability + +```yaml +apiVersion: multigres.io/v1alpha1 +kind: Etcd +metadata: + name: federated-etcd +spec: + replicas: 5 + affinity: + podAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - labelSelector: + matchExpressions: + - key: app.kubernetes.io/component + operator: In + values: + - etcd + # Spread across both topology zones AND clusters + topologyKey: topology.kubernetes.io/zone + topologySpreadConstraints: + - maxSkew: 1 + topologyKey: topology.kubernetes.io/zone + whenUnsatisfiable: DoNotSchedule + labelSelector: + matchLabels: + app.kubernetes.io/component: etcd + - maxSkew: 2 + # Cilium adds this label to nodes in cluster mesh + topologyKey: topology.cilium.io/cluster + whenUnsatisfiable: ScheduleAnyway + labelSelector: + matchLabels: + app.kubernetes.io/component: etcd +``` + +### Regional MultiPooler Placement + +```yaml +apiVersion: multigres.io/v1alpha1 +kind: MultiPooler +metadata: + name: regional-pooler +spec: + replicas: 3 + nodeSelector: + topology.kubernetes.io/region: "us-east-1" + affinity: + podAntiAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 100 + podAffinityTerm: + labelSelector: + matchLabels: + app.kubernetes.io/component: multipooler + topologyKey: kubernetes.io/hostname + tolerations: + - key: "dedicated" + operator: "Equal" + value: "database" + effect: "NoSchedule" +``` + +## DNS and Service Discovery + +**Cilium ClusterMesh DNS Requirements**: +- Etcd pods must be able to resolve peers across clusters +- Use Cilium's global service discovery or explicit FQDNs + +**Etcd Peer URLs**: +```yaml +# In cluster A, etcd member 0: +--initial-cluster=\ + etcd-0=https://etcd-0.etcd-headless.default.svc.cluster.local:2380,\ + etcd-1=https://etcd-1.etcd-headless.cluster-b.mesh.cilium.io:2380 +``` + +**Operator Consideration**: The operator should support an optional `peerURLs` or `externalPeers` field for multicluster etcd bootstrapping (future enhancement). + +## Network Policies + +**Default Policy Approach**: The operator does NOT create NetworkPolicies by default. Users must create Cilium NetworkPolicies manually. + +**Example Cilium NetworkPolicy for Etcd**: +```yaml +apiVersion: cilium.io/v2 +kind: CiliumNetworkPolicy +metadata: + name: etcd-multicluster +spec: + endpointSelector: + matchLabels: + app.kubernetes.io/component: etcd + ingress: + - fromEndpoints: + - matchLabels: + app.kubernetes.io/component: etcd + toPorts: + - ports: + - port: "2379" # client + protocol: TCP + - port: "2380" # peer + protocol: TCP + # Allow cross-cluster traffic + - fromCIDR: + - 10.0.0.0/8 # Cluster mesh CIDR + toPorts: + - ports: + - port: "2379" + - port: "2380" +``` + +## Observability and Metrics + +Cilium provides Hubble for network observability. Multigres components should be discoverable: + +**Pod Annotations for Hubble**: +```yaml +spec: + podAnnotations: + # Enable Hubble flow visibility + hubble.cilium.io/visibility: "enabled" + # Prometheus metrics scraping (if using Cilium service mesh metrics) + prometheus.io/scrape: "true" + prometheus.io/port: "9090" +``` + +## Test Plan + +### Unit Tests +- Validate that existing affinity/topology fields accept Cilium-specific topology keys +- Ensure service annotations are properly propagated to generated Service manifests + +### Integration Tests +1. **Single Cluster with Cilium**: Deploy operator in kind with Cilium CNI +2. **Two-Cluster Mesh**: Deploy etcd in cluster A, MultiGateway in cluster B, verify connectivity +3. **Network Policy Validation**: Apply restrictive CiliumNetworkPolicy, ensure operator components still function +4. **Service Mesh Mode**: Enable Cilium service mesh, verify L7 observability and policies don't break traffic + +### Manual Validation Scenarios +1. Deploy Multigres with etcd spread across 3 clusters +2. Verify `kubectl get svc` shows global services with Cilium annotations +3. Test failover: kill etcd pod in one cluster, verify quorum maintained via cluster mesh +4. Validate Hubble UI shows cross-cluster Multigres traffic flows + +## Graduation Criteria + +### MVP (Initial Release) +- [ ] Documentation with Cilium deployment examples +- [ ] Basic multicluster test (2 clusters, shared etcd) +- [ ] Service annotation support validated + +### Production-Ready +- [ ] CI/CD pipeline includes Cilium multicluster tests +- [ ] Performance benchmarks for cross-cluster latency +- [ ] Advanced examples (regional isolation, federated etcd) + +### Stable +- [ ] Production deployments validated with Cilium +- [ ] Network policy reference architecture documented +- [ ] Troubleshooting guide for common Cilium issues + +## Upgrade / Downgrade Strategy + +**No operator changes required** - integration is via existing Kubernetes primitives and annotations. + +**User migration path**: +1. Existing deployments continue to work +2. Users opt-in to Cilium features by adding annotations/labels +3. Gradual rollout: update one component CR at a time with Cilium-specific configs + +## Version Skew Strategy + +**Cilium Versions**: Test against Cilium 1.18 (stable version with mature cluster mesh). + +**Kubernetes Versions**: Support Kubernetes 1.32+ (standard operator support range). + +**Skew Scenarios**: +- Different Cilium minor versions across clusters should work (Cilium ClusterMesh is version-tolerant within same major version) +- Operator version doesn't affect network layer - no skew issues + +# Implementation History + +- 2025-10-16: Initial draft created based on client multicluster requirements + +# Drawbacks + +1. **Complexity**: Multicluster deployments are inherently complex; troubleshooting network issues requires Cilium expertise +2. **Testing Burden**: CI/CD must maintain multicluster test infrastructure (expensive) +3. **Documentation Maintenance**: As Cilium evolves, examples may need updates +4. **No Abstraction**: Operator doesn't abstract Cilium details - users must understand both technologies + +# Alternatives + +## Alternative 1: Custom Cilium CRD Support + +**Approach**: Operator creates CiliumNetworkPolicy resources automatically. + +**Rejected because**: +- Opinionated networking is anti-pattern for infrastructure operators +- Different users have different security requirements +- Would require deep Cilium version coupling + +## Alternative 2: MultiCluster CRD + +**Approach**: Create a new `MultiClusterMultigres` CRD that abstracts cross-cluster deployment. + +**Rejected because**: +- Adds significant API complexity +- Users already have tools (GitOps, Helm) for multi-cluster orchestration +- Operator should focus on single-cluster resource management + +## Alternative 3: Cilium CNI Autodetection + +**Approach**: Operator detects Cilium CNI and automatically applies best-practice annotations. + +**Rejected because**: +- Magic behavior is confusing and hard to debug +- Different deployments need different configurations +- Explicit is better than implicit for production systems + +# Infrastructure Needed + +## Development Environment +- Kind clusters (3x) with Cilium 1.18 CNI installed +- Cilium CLI for cluster mesh setup +- Hubble UI for network observability validation + +## CI/CD +- GitHub Actions workflow for multicluster tests +- Infrastructure to spin up multiple clusters (consider using kind or k3s) +- Cilium cluster mesh automation scripts + +## Documentation +- Multicluster deployment guide (new doc in `docs/`) +- Cilium-specific examples in `config/samples/cilium/` +- Troubleshooting section in operator docs + +## Testing Tools +- `cilium connectivity test` integration +- Network latency measurement tools +- Cross-cluster service discovery validation scripts