This repository was archived by the owner on Aug 29, 2023. It is now read-only.

Rolling restart unable to restart broker #239

@Dwijad

Hi,
I am trying to use the rolling restart script (latest version) together with Jolokia (jolokia-jvm-1.6.2-agent.jar). The Jolokia agent is loaded by the Kafka service script running on the broker nodes, passed via KAFKA_OPTS:

KAFKA_OPTS="-javaagent:/home/kafka/prometheus/jmx_prometheus_javaagent-0.3.1.jar=8080:/home/kafka/prometheus/kafka-0-8-2.yml -javaagent:/home/kafka/jolokia/jolokia-agent.jar=host=*"

I am able to get Jolokia metrics from the remote broker nodes using the following curl command:

curl bro1:8778/jolokia/read/kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager/Value | jq
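The same read can be reproduced from Python, which makes it easier to see what "broker is down" means from the checker's side. This is only a minimal sketch, not the rolling restart script's own code: the function names are hypothetical, and the host, port 8778, and MBean name are copied from the curl command above. It assumes the standard Jolokia JSON response, which carries `status` and `value` fields.

```python
import json
import urllib.error
import urllib.request

JOLOKIA_PORT = 8778  # port used in the curl command above
URP_MBEAN = "kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager"


def jolokia_read_url(host, mbean=URP_MBEAN, attribute="Value", port=JOLOKIA_PORT):
    """Build the Jolokia read URL for one MBean attribute (hypothetical helper)."""
    return "http://{}:{}/jolokia/read/{}/{}".format(host, port, mbean, attribute)


def parse_jolokia_value(body):
    """Return the 'value' field of a Jolokia JSON response, or None on a non-200 status."""
    payload = json.loads(body)
    if payload.get("status") != 200:
        return None
    return payload.get("value")


def read_under_replicated(host, timeout=5.0):
    """Return the under-replicated partition count, or None if the agent is unreachable."""
    try:
        with urllib.request.urlopen(jolokia_read_url(host), timeout=timeout) as resp:
            return parse_jolokia_value(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        # "Connection refused" ends up here: the broker JVM, and with it the
        # Jolokia agent, is not listening on the port.
        return None
```

Calling `read_under_replicated("bro1")` while the broker is running should return the same number that the curl command prints under `value`. A `None` result means that the agent could not be reached or that Jolokia returned a non-200 status.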

When I run the rolling restart script, it detects all the brokers and, after confirmation, stops the first broker. It then waits forever for broker 1 to restart, printing the following messages:

[kfk@admin-node ~]$ kafka-rolling-restart --cluster-type kafka --start-command "/home/kfk/bin/kafka-server-start -daemon /home/kfk/etc/kafka/server.properties " --stop-command "/home/kfk/bin/kafka-server-stop" --check-count 3
Will restart the following brokers in cluster-1:
  1: bro1
  2: bro2
  3: bro3
Do you want to restart these brokers? y
Execute restart
Under replicated partitions: 0, missing brokers: 0 (1/1)
The cluster is stable
Stopping bro1 (1/3)
Starting bro1 (1/3)
Cannot find the key, Kafka is probably still starting up
Under replicated partitions: 40, missing brokers: 1 (0/3)
Broker bro1 is down: HTTPConnectionPool(host='bro1', port=8778): Max retries exceeded with url: /jolokia//read/kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager/Value (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6de814bb50>: Failed to establish a new connection: [Errno 111] Connection refused',)).This maybe because it is starting up
Under replicated partitions: 68, missing brokers: 1 (0/3)
Broker bro1 is down: HTTPConnectionPool(host='bro1', port=8778): Max retries exceeded with url: /jolokia//read/kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager/Value (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6de80d8890>: Failed to establish a new connection: [Errno 111] Connection refused',)).This maybe because it is starting up
Under replicated partitions: 68, missing brokers: 1 (0/3)
Broker bro1 is down: HTTPConnectionPool(host='bro1', port=8778): Max retries exceeded with url: /jolokia//read/kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager/Value (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6de80d8a50>: Failed to establish a new connection: [Errno 111] Connection refused',)).This maybe because it is starting up
Under replicated partitions: 68, missing brokers: 1 (0/3)
Broker bro1 is down: HTTPConnectionPool(host='bro1', port=8778): Max retries exceeded with url: /jolokia//read/kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager/Value (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6de80e9810>: Failed to establish a new connection: [Errno 111] Connection refused',)).This maybe because it is starting up
Under replicated partitions: 68, missing brokers: 1 (0/3)
Broker bro1 is down: HTTPConnectionPool(host='bro1', port=8778): Max retries exceeded with url: /jolokia//read/kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager/Value (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6de8974b90>: Failed to establish a new connection: [Errno 111] Connection refused',)).This maybe because it is starting up

I also tried the following command:

[kfk@admin-node ~]$ kafka-rolling-restart --cluster-type kafka --start-command "sudo service kafka start " --stop-command "sudo service kafka stop" --check-count 3

On inspecting broker node 1, I found that Kafka was stopped. After I restarted broker 1 manually, the rolling restart script stopped the second broker and then again waited forever for broker 2 to come up. I have tested all the Kafka service commands (start, stop, restart) manually on the broker nodes, and all of them work.
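To tell apart "the start command never launched the broker" from "the broker is up but slow to answer", the Jolokia port can be polled from the admin node right after the start command runs. The snippet below is a rough sketch under stated assumptions: `port_is_open` and `wait_for_port` are hypothetical helpers, not part of kafka-rolling-restart, and the deadline and interval values are arbitrary.

```python
import socket
import time


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def wait_for_port(host, port, deadline_s=120.0, interval_s=5.0):
    """Poll host:port until it accepts connections or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        if port_is_open(host, port):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_port("bro1", 8778)` returning False after the start command has been issued would suggest that the broker process, or at least its Jolokia agent, never started. That would point at the start command or its environment rather than at the health check.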

It looks like the rolling restart script is able to stop the Kafka broker but unable to restart it.
Where could the issue be?

Kafka version: confluent-5.2.1-2.12
