
New leader election never triggered #1329

Open · Deniskore opened this issue Feb 12, 2025 · 6 comments

@Deniskore

Hey!

I've created a simple MVP using the provided examples. I started three nodes acting as voters, and they successfully elect a leader. However, when I terminate the leader, a new election does not occur even after 5 minutes; in my opinion, this behavior should be well documented.
My config:

let config = Arc::new(
    openraft::Config {
        heartbeat_interval: 100,
        election_timeout_min: 5000,
        election_timeout_max: 10000,
        install_snapshot_timeout: 500,
        enable_elect: true,
        enable_heartbeat: true,
        ..Default::default()
    }
    .validate()?,
);

Could you clarify on what basis the re-election of the leader should take place? @drmingdrmer
I can't use a membership change here because the leader is down and the request would just be forwarded to it.

Expected behavior
The new leader should be elected automatically.

Actual behavior
No new leader was elected. In the logs, I see "result=Unreachable" node errors, which is expected, but I don't see any indication that the nodes are trying to initiate a new election.

@drmingdrmer (Member)

If a Follower does not receive an AppendEntries message (either log entries for replication or an empty one for a heartbeat) for a while, it will begin a new election by promoting itself to a leader candidate.

The maximum time a Follower waits before starting an election is defined by Config::election_timeout_max:

let mut election_timeout = timer_config.election_timeout;
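
For illustration, here is a config sketch (the values are just an example, not taken from your setup) where a follower that stops receiving AppendEntries should promote itself to candidate within roughly 1 to 2 seconds:

use std::sync::Arc;

let config = Arc::new(
    openraft::Config {
        heartbeat_interval: 100,     // leader sends an empty AppendEntries every 100 ms
        election_timeout_min: 1000,  // a follower tolerates at least 1 s of silence...
        election_timeout_max: 2000,  // ...and at most 2 s before starting an election
        enable_elect: true,          // if false, a node never promotes itself
        enable_heartbeat: true,      // if false, the leader never sends heartbeats
        ..Default::default()
    }
    .validate()?,
);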

To investigate your issue, could you provide the DEBUG-level logs from a Follower after shutting down the Leader? These logs could reveal the root cause of the problem.

@Deniskore (Author)

For some reason, it doesn't work in tests, but it works when the nodes run in Docker. Most likely, the issue is on my side.

Could you confirm or deny this statement: if I have a 3-node cluster, at least 2 nodes must be alive to reach consensus. Does this mean that if 2 nodes are dead, I will experience data loss? And if I have a 5-node cluster, must at least 3 nodes be alive?

Is it possible to switch to a 1-node cluster if the other nodes fail? Does that make sense, or is it better to just restart the nodes that crashed? I'm just interested in the theoretical part.

@drmingdrmer (Member)

Could you confirm or deny this statement: if I have a 3-node cluster, at least 2 nodes must be alive to reach consensus. Does this mean that if 2 nodes are dead, I will experience data loss? And if I have a 5-node cluster, must at least 3 nodes be alive?

Yes: 2 out of 3 nodes are enough to form a live cluster and serve requests.

If 2 nodes are down, there won't be data loss (durability is maintained as soon as the 2 nodes go back online), but the cluster will be unavailable (availability loss).

If you have 5 nodes, you can tolerate 2 nodes being offline. And yes, at least 3 nodes must be alive for the cluster to serve requests.
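
In general, a cluster of N voters needs a majority of floor(N/2) + 1 nodes to elect a leader and commit entries: floor(3/2) + 1 = 2 for a 3-node cluster, and floor(5/2) + 1 = 3 for a 5-node cluster.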


Is it possible to switch to a 1-node cluster if the other nodes fail? Does that make sense, or is it better to just restart the nodes that crashed? I'm just interested in the theoretical part.

You can switch a 3-node cluster to a 1-node cluster using Raft::change_membership(), but this requires at least 2 of the 3 nodes to be online to commit the membership config command.
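
As a rough sketch (assuming raft is your Raft handle, and assuming the call shape where change_membership takes the new voter set plus a retain flag that decides whether removed voters stay on as learners):

use std::collections::BTreeSet;

// Shrink the voter set to node 1 only. The membership change is itself a log
// entry, so it must be committed by a quorum of the current 3-node cluster,
// i.e. at least 2 of the 3 nodes must still be online.
let new_voters: BTreeSet<u64> = [1u64].into_iter().collect();
raft.change_membership(new_voters, false).await?;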

If by "crash" you mean data being erased, this is undefined behavior for a Raft cluster. However, you can restart an empty node, and the data will be replicated from the leader to the new empty node. Note, though, that in such scenarios, there is no guarantee against data loss.

@Deniskore (Author)

Is it okay if I come up with a PR to improve the documentation in the code and in the FAQ? @drmingdrmer

@drmingdrmer (Member)

Is it okay if I come up with a PR to improve the documentation in the code and in the FAQ? @drmingdrmer

Of course! Thank you!
