For background (skip if you know this): Ingress and Kubernetes Services only send traffic to pods marked ready. If any container in a pod is not marked ready, no traffic is sent to that pod. This is what allows zero-downtime rotation of pods in ReplicaSets.
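To make the readiness rule concrete, here is a minimal sketch in Python. It is only a toy model of the behavior described above, not Kubernetes code; the `Container`, `pod_ready`, and `routable_pods` names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    ready: bool


def pod_ready(containers):
    """A pod counts as ready only if every one of its containers is ready."""
    return all(c.ready for c in containers)


def routable_pods(pods):
    """Return the names of pods that would receive traffic (ready pods only)."""
    return [name for name, containers in pods.items() if pod_ready(containers)]


# Hypothetical pods: one fully ready, one with a single unready container.
pods = {
    "pod-a": [Container("app", True), Container("sidecar", True)],
    "pod-b": [Container("app", True), Container("sidecar", False)],
}
print(routable_pods(pods))
```

In this model `pod-b` is dropped from routing even though its `app` container is healthy, which is the behavior this issue is about.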
The spire-server and spire-controller-manager have different roles in SPIRE. spire-server is responsible for the API and for serving requests. If it is down, especially in the StatefulSet deployment, SPIRE eventually stops working entirely. spire-controller-manager, by contrast, is responsible for managing CRs in the cluster. If it is down, the impact is more nuanced.
Since these two containers are stuck in the same pod, when either of them is down, the SPIRE backend workload API is down. This will eventually take down the spire-agents' ability to service requests. The controller needs to be moved to a separate pod so that its outages do not impact SPIRE itself. I'm facing a problem where a federated endpoint lost its SSL cert. The controller manager is restarting as a result, which causes outages for SPIRE as a whole (not just federation problems).
2024-04-30T12:59:33Z ERROR setup problem running manager {"error": "failed to wait for clusterspiffeid caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.ClusterSPIFFEID"}
main.run
/workspace/main.go:347
main.main
/workspace/main.go:82
runtime.main
/usr/local/go/src/runtime/proc.go:250
2024-04-30T12:59:33Z DEBUG events spire-server-0_84b56424-9ab4-495f-bf58-c2efca64d303 stopped leading {"type": "Normal", "object": {"kind":"Lease","namespace":"spire-server","name":"8aa27f40.spiffe.io","uid":"5deab3b1-5f1d-4855-8cc5-f15bcdcbbee0","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1272111663"}, "reason": "LeaderElection"}
2024-04-30T12:59:33Z ERROR error received after stop sequence was engaged {"error": "leader election lost"}
sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1
/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/manager/internal.go:555
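As a rough illustration of why splitting the pod would help, the sketch below compares the two layouts when only the controller manager is not ready. It is a toy model, not a deployment manifest. The pod name `spire-server-0` comes from the log above; `spire-controller-manager-0` and the `serving` helper are hypothetical.

```python
def serving(layout, down):
    """Return the pods that still receive traffic.

    layout: dict mapping pod name -> list of container names
    down:   set of container names that are currently not ready
    """
    return sorted(
        pod
        for pod, containers in layout.items()
        if not any(c in down for c in containers)
    )


# Current layout: both containers share one pod.
colocated = {
    "spire-server-0": ["spire-server", "spire-controller-manager"],
}

# Proposed layout: the controller manager runs in its own (hypothetical) pod.
split = {
    "spire-server-0": ["spire-server"],
    "spire-controller-manager-0": ["spire-controller-manager"],
}

down = {"spire-controller-manager"}
print(serving(colocated, down))  # []
print(serving(split, down))      # ['spire-server-0']
```

With the containers colocated, a restarting controller manager removes the server pod from routing. With them split, the server pod keeps serving.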