[Bug]: Unable to use the load balancer in production during a rolling update #9902
4 comments · 13 replies
-
I forgot to attach the logs. These are the errors that appear on the consumer when the brokers are updated. The errors occur precisely while the second broker is being updated; as I described above, at that point two brokers turn out to be unavailable at once. Logs from Alibaba ACK with listener = load balancer:
-
I'm not sure I understand what the supposed issue is, as you seem to be mixing many different things together:
Each of these is a completely different solution, and only one of them is really part of Strimzi. If there is any problem, you will need to explain it in a way that is understandable to people who do not know your environment.
-
I ran into this today ... if your problem is, for whatever reason, as you describe it, have you considered / tried using something like this? https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.1/deploy/pod_readiness_gate/
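For reference, a minimal sketch of how that readiness-gate injection can be enabled, assuming a controller version that supports automatic injection via a namespace label (older releases instead require the readiness gate to be listed in the pod spec); the namespace name is a placeholder:

```yaml
# Sketch only: label the namespace that contains the Kafka pods so the
# aws-load-balancer-controller injects a target-health readiness gate into
# pods matched by a TargetGroupBinding. Namespace name "kafka" is an example.
apiVersion: v1
kind: Namespace
metadata:
  name: kafka
  labels:
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled
```

With the gate in place, a restarted broker pod is not reported Ready until it is healthy in its NLB target group, so the operator waits before rolling the next broker.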
-
Thank you @scholzj and @Alexander-Volk. I have the same setup, and @Alexander-Volk your solution seems to be working: the Kafka broker pod does not become healthy until it is healthy in the AWS target group.
-
Bug Description
This is most likely a bug, since the solutions described in the original article do not work as expected and undermine one of the main features of Kafka: that it is a production-grade system in which any component can be updated without downtime or data loss. With the load balancer based setup, however, there is data loss and downtime for the Kafka brokers.
I will describe the setup: we have a private network in AWS and have deployed an EKS cluster on it. We use the CNI plugin, so the pods in the cluster are reachable by IP address from our private network. This lets us publish our applications using a load balancer in front of ClusterIP Kubernetes services. To link the load balancer and a service, we use a target group (with the required health checks) and a TargetGroupBinding resource that binds the Kubernetes service to the AWS target group. This approach lets a single NLB serve both the bootstrap service and the brokers, so we do not have to create unnecessary NLB instances. I would treat it as a more advanced variant of your documented approach that branches off into using a single NLB; the NLB also allows TLS termination, and our setup is automated without manual steps.
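For context, a minimal sketch of the kind of TargetGroupBinding used in such a setup; the names, port and target group ARN below are placeholders, not the actual values:

```yaml
# Sketch only: binds one broker's ClusterIP service to an existing NLB
# target group via the aws-load-balancer-controller.
# Service name, port and targetGroupARN are placeholders.
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: kafka-broker-0
  namespace: kafka
spec:
  serviceRef:
    name: my-cluster-kafka-0   # per-broker ClusterIP service (placeholder name)
    port: 9094
  targetType: ip
  targetGroupARN: arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/kafka-broker-0/0123456789abcdef
```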
The solution works and performs well until an uncontrolled rolling update comes into play. Our setup currently uses 3 brokers and a topic configured with replicas: 3 and min.insync.replicas: 2. After the Strimzi Kafka cluster is deployed, everything works until the brokers need to be updated. Whenever a configuration change affects the broker state, a rolling update is started, and it usually completes successfully. While the first broker restarts, the solution keeps working because the configuration tolerates losing one broker. The first broker then comes back and the second broker starts restarting almost immediately; that is when the errors begin. Right after the first broker is restored it is in fact not yet reachable through the load balancer, so when the second broker starts updating, two brokers are effectively unavailable at once, but Strimzi thinks otherwise, hence the errors. The Strimzi operator rolls the brokers very quickly, which prevents the load balancer health checks from passing before the second broker is restarted.
We also have a solution based on the LoadBalancer-type Kubernetes service, which creates an NLB for each broker and for the bootstrap service. This is essentially the solution described here, but in an Alibaba ACK cluster using an Alibaba NLB. The behaviour is exactly the same: when the first broker is updated everything is OK, but after the first broker is restored and the second broker starts updating, data loss occurs.
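For reference, a minimal sketch of a topic with the replication settings described above; the topic, namespace and cluster names and the partition count are placeholders:

```yaml
# Sketch only: a Strimzi KafkaTopic with replicas: 3 and
# min.insync.replicas: 2 as described above. Names and partition
# count are placeholders.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 3
  replicas: 3
  config:
    min.insync.replicas: 2
```

With min.insync.replicas: 2 on a 3-broker cluster, having two brokers unreachable at the same time makes partitions unavailable to producers using acks=all, which matches the errors seen during the roll.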
Steps to reproduce
Expected behavior
The rolling update works correctly and the brokers are updated with the load balancer health checks taken into account. There are no errors in the logs, no data loss, and the messages arrive in the correct order.
Strimzi version
0.39.0
Kubernetes version
v1.28.6-eks-508b6b3, v1.28.3-aliyun.1
Installation method
Helm chart + ArgoCD
Infrastructure
AWS EKS, Alibaba ACK
Configuration files and logs
For AWS EKS cluster:
Kafka cluster:
Kafka topic:
TargetGroupBinding - created for the bootstrap service and for each broker
For Alibaba ACK cluster:
Kafka Cluster:
The topic is the same as for AWS. TargetGroupBinding is not needed because the listener type is loadbalancer.
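For reference, a minimal sketch of what a loadbalancer-type listener looks like in the Kafka resource; the listener name, port and cluster name are placeholders, not the actual configuration used here:

```yaml
# Sketch only: an external listener of type loadbalancer. With this
# listener the operator itself creates one LoadBalancer service per
# broker plus one for the bootstrap address.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    listeners:
      - name: external
        port: 9094
        type: loadbalancer
        tls: true
    # ... remaining broker, storage and entity operator configuration
```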
Additional context
It seems like this should work out of the box, since Kafka is meant to be a production-grade system. A possible solution could be an option to update pods manually, with an explicit indication of which pods to roll, or to delay the update of subsequent pods. Perhaps this function will help.