Trusted Peers Not Connecting -- Showing Up On Restart #78
Description
Another Example Supporting a "Trusted Peers" Issue
My node had been running perfectly for most of the last month. I received the occasional "is_up=false", usually isolated instances that resolved almost immediately, about once or twice per week. Finalizing has been happening just fine. All in all, very stable. FYI: early in this round I restarted the system occasionally with no problems. (I am enhancing my personal node-monitoring tools, hence the need for the occasional restart.) So restarts were working fine for me previously.
Yesterday, I updated my monitoring systems and mistakenly introduced a bug that made my node restart itself whenever the logs rotated. When the system rotated the logs at midnight PDT a few hours ago, my node went down (is_up=false, continuously). Multiple restarts didn't help, and neither did a reinstall. What did work was editing config.toml to delete the first two peers and add two randomly chosen entries from the peers list. After that change, the system was back up within minutes. I lost earnings during the outage, but the system is back up and running fine.
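One way to check whether the original trusted peers are reachable at all, independently of the node software, is a plain TCP connection test. This is a minimal sketch, assuming the peers accept TCP connections on the listed port; `parse_peer` and `check_peers` are hypothetical helper names made up for illustration, not part of the node software. It only checks that a connection can be opened, not that the peer is healthy or in sync.

```python
import socket
from typing import Dict, List, Tuple


def parse_peer(peer: str) -> Tuple[str, int]:
    """Split a "host:port" string such as "184.154.98.116:2030"."""
    host, _, port = peer.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"not a host:port string: {peer!r}")
    return host, int(port)


def check_peers(peers: List[str], timeout: float = 3.0) -> Dict[str, bool]:
    """Map each peer to True if a TCP connection to it could be opened
    within `timeout` seconds, otherwise False."""
    results: Dict[str, bool] = {}
    for peer in peers:
        host, port = parse_peer(peer)
        try:
            # create_connection resolves the host and opens a TCP socket;
            # the socket is closed right away because we only test reachability.
            with socket.create_connection((host, port), timeout=timeout):
                results[peer] = True
        except OSError:
            # Refused connections, timeouts and DNS failures are all OSError subclasses.
            results[peer] = False
    return results


if __name__ == "__main__":
    import sys

    # Peers are passed on the command line, e.g. 184.154.98.116:2030
    for peer, ok in check_peers(sys.argv[1:]).items():
        print(f"{peer}: {'reachable' if ok else 'UNREACHABLE'}")
```

Example (script name hypothetical): `python check_peers.py 184.154.98.116:2030 184.154.98.117:2030`. A peer that shows as reachable here can still fail at the protocol level, so a clean result does not rule out a peer-side problem.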
To Reproduce
To confirm that the peer list was in fact the problem, I:
- Stopped my node.
- Returned config.toml to the original peers:
["184.154.98.116:2030", "184.154.98.117:2030", "184.154.98.118:2030", "184.154.98.119:2030", "184.154.98.120:2030"]
- Restarted the node.
- The node stayed down. Here is an example of the resulting log output:
2024-04-01T03:18:54.795771531-07:00 [INFO] dagger_logger::metrics - datapoint: uptime_metrics node_id=[MY-NODE-ID] is_up=false start_ts=0 current_uptime_ms=0 uptime_added_ms=0 last_successful_sync_ts=0
2024-04-01T03:18:59.795713455-07:00 [INFO] dagger_logger::metrics - datapoint: uptime_metrics node_id=[MY-NODE-ID] is_up=false start_ts=0 current_uptime_ms=0 uptime_added_ms=0 last_successful_sync_ts=0
- Stopped the node and changed config.toml back again. The node connected, and is_up=true was back within seconds.
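Because isolated is_up=false datapoints were normal for this node while a continuous run signalled a real outage, a monitoring script could distinguish the two by counting consecutive datapoints. This is a minimal sketch, assuming the uptime_metrics line format shown in the log excerpt above; `parse_uptime_line`, `sustained_outages`, and the default threshold of 12 datapoints (about one minute, given the five-second spacing of the two sample lines) are hypothetical choices, not part of the node software.

```python
import re
from typing import Dict, Iterable, List, Optional

# Text that precedes the key=value fields in an uptime datapoint line.
_MARKER = "datapoint: uptime_metrics "
# A key made of word characters, "=", then a value up to the next whitespace.
_PAIR = re.compile(r"(\w+)=(\S+)")


def parse_uptime_line(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of an uptime_metrics datapoint, or None for other lines."""
    if _MARKER not in line:
        return None
    timestamp = line.split(" ", 1)[0]  # the first token is the ISO-8601 timestamp
    fields = dict(_PAIR.findall(line.split(_MARKER, 1)[1]))
    fields["timestamp"] = timestamp
    return fields


def sustained_outages(lines: Iterable[str], min_run: int = 12) -> List[str]:
    """Return the timestamp at which each run of at least `min_run`
    consecutive is_up=false datapoints began; shorter runs are ignored."""
    starts: List[str] = []
    run_start = ""
    run_len = 0
    for line in lines:
        fields = parse_uptime_line(line)
        if fields is None:
            continue  # skip non-uptime log lines
        if fields.get("is_up") == "false":
            if run_len == 0:
                run_start = fields["timestamp"]
            run_len += 1
            if run_len == min_run:  # report each run once, when it crosses the threshold
                starts.append(run_start)
        else:
            run_len = 0  # any is_up=true datapoint ends the current run
    return starts
```

Usage (file name hypothetical): `with open("node.log") as f: print(sustained_outages(f))`. Counting consecutive datapoints rather than wall-clock time keeps the sketch independent of timestamp parsing, at the cost of assuming a roughly fixed sampling interval.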
Expected behavior
I expect the system to restart and connect to the trusted peers, as it (mostly) has in the past.
System:
VPS, fully updated system