Description
There's a state corruption bug in CNS when:
- The apiserver is overloaded so object garbage collection is delayed
- A node is deleted and replaced with one with the same name
- The CNI mode is Overlay, and the new Node gets the same secondary IP prefix as the old Node
The flow is roughly:
1. Node A is deleted. NNC A is queued for GC.
2. Node A' is created. NNC A exists, so NNC A' cannot be created yet.
3. CNS A' starts, reads NNC A. NNC A gets GC'd, NNC A' is created with the same secondary IP prefix.
4. CNS tries to use the IPs from NNC A' but gets the ones from NNC A since they are the same set of IPs. This internal inconsistency causes IP assignment failure and this Node is unrecoverable.
More details
- We get a stale NNC, load the stale NC into state, and load the stale IPs into PendingProgramming.
- We get a new/updated NNC. The new NC has the same IP addresses. We delete the stale NC from state, but not the stale IPs, which retain an association to the stale NC.
- We load the new NC into state and try to load the new IPs. However, CNS uniquely identifies IPs by their address string in Overlay, so we think they are already loaded.
- The new NC is programmed, so we try to mark its IPs as available, but since we store them by address string, we accidentally mark the stale NC's IPs as available.
- During the CNI IPAM assignment call, we assign an Available IP for the Pod. But when we try to change its state to Assigned, we have to look up its NC again, and that stale IP's NC has been deleted from our state. This is where we fail and return an Assignment error.
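The sequence above can be replayed against a toy model of the state. This is only a sketch for illustration, not the actual CNS code: `buggyStore`, `loadNC`, `deleteNC`, `markProgrammed`, `assign`, the NC IDs, and the addresses are all hypothetical names and values. It assumes only what is described above: NCs are tracked by ID, and IPs are tracked by address string with a back-reference to their NC.

```go
package main

import "fmt"

// ipState is a simplified stand-in for the IP lifecycle states named in the issue.
type ipState string

const (
	pendingProgramming ipState = "PendingProgramming"
	available          ipState = "Available"
	assigned           ipState = "Assigned"
)

// ipConfig is a hypothetical record for one IP and the NC it was loaded from.
type ipConfig struct {
	ncID  string
	state ipState
}

// buggyStore keys IPs by address string only, as described for Overlay.
type buggyStore struct {
	ncs map[string]bool
	ips map[string]*ipConfig
}

func newBuggyStore() *buggyStore {
	return &buggyStore{ncs: map[string]bool{}, ips: map[string]*ipConfig{}}
}

// loadNC records the NC and its IPs. An address that is already present is
// treated as "already loaded", even if it came from a different NC.
func (s *buggyStore) loadNC(ncID string, addrs []string) {
	s.ncs[ncID] = true
	for _, a := range addrs {
		if _, ok := s.ips[a]; ok {
			continue
		}
		s.ips[a] = &ipConfig{ncID: ncID, state: pendingProgramming}
	}
}

// deleteNC removes the NC but leaves its IPs behind.
func (s *buggyStore) deleteNC(ncID string) {
	delete(s.ncs, ncID)
}

// markProgrammed flips IPs to Available by address alone, ignoring which NC
// they belong to.
func (s *buggyStore) markProgrammed(addrs []string) {
	for _, a := range addrs {
		if ip, ok := s.ips[a]; ok && ip.state == pendingProgramming {
			ip.state = available
		}
	}
}

// assign moves an Available IP to Assigned, which requires looking up its NC.
func (s *buggyStore) assign(addr string) error {
	ip, ok := s.ips[addr]
	if !ok || ip.state != available {
		return fmt.Errorf("ip %s is not available", addr)
	}
	if !s.ncs[ip.ncID] {
		return fmt.Errorf("ip %s references NC %q, which is not in state", addr, ip.ncID)
	}
	ip.state = assigned
	return nil
}

// reproduce replays the steps from the issue and returns the assignment result.
func reproduce() error {
	s := newBuggyStore()
	addrs := []string{"10.0.0.4", "10.0.0.5"}
	s.loadNC("nc-stale", addrs) // stale NNC read at startup
	s.deleteNC("nc-stale")      // new NNC arrives, stale NC is purged
	s.loadNC("nc-new", addrs)   // same addresses, so nothing is reloaded
	s.markProgrammed(addrs)     // stale IPs become Available
	return s.assign("10.0.0.4") // NC lookup fails on the stale association
}
```

In this model, `reproduce()` returns a non-nil error because the Available IP still points at the purged NC, which mirrors the Assignment error described in the last step.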
There are several issues that stack to allow this to happen and need to be fixed:
- During init we should use node UUID/ownerref to only load NNC for the actual Node we're running on.
- When we purge a stale NC, we need to purge the IPs from that NC.
- When we see an NC is programmed and mark IPs as available, we need to check each IP's NC association instead of only its ID string, and only mark the ones for that NC.
- When we load IPs from the NNC, it's not good enough to ID them by address string. We can't fix the IP ID scheme today, but when we have an IP in state already from a different NC, we should error.
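As a rough sketch of how the four fixes could fit together, the toy model below applies each of them to a simplified state store. None of this is the real CNS API: `store`, `ownsNNC`, `loadNC`, `deleteNC`, `markProgrammed`, and the UID and NC ID values are hypothetical, and the owner check assumes the NNC's owner reference UIDs and the current Node's UID are already available as plain strings.

```go
package main

import "fmt"

type ipState string

const (
	pendingProgramming ipState = "PendingProgramming"
	available          ipState = "Available"
)

// ipConfig is a hypothetical record for one IP and the NC it was loaded from.
type ipConfig struct {
	ncID  string
	state ipState
}

// store still keys IPs by address string, since the ID scheme is not changed.
type store struct {
	ncs map[string]bool
	ips map[string]*ipConfig
}

func newStore() *store {
	return &store{ncs: map[string]bool{}, ips: map[string]*ipConfig{}}
}

// ownsNNC reports whether any of the NNC's owner reference UIDs matches the
// UID of the Node we are running on (fix 1). A same-named replacement Node
// has a different UID, so the stale NNC is rejected.
func ownsNNC(nodeUID string, ownerUIDs []string) bool {
	for _, uid := range ownerUIDs {
		if uid == nodeUID {
			return true
		}
	}
	return false
}

// loadNC errors if any address is already in state under a different NC
// (fix 4). It checks every address before changing any state.
func (s *store) loadNC(ncID string, addrs []string) error {
	for _, a := range addrs {
		if ip, ok := s.ips[a]; ok && ip.ncID != ncID {
			return fmt.Errorf("ip %s already belongs to NC %q, cannot load it for NC %q", a, ip.ncID, ncID)
		}
	}
	s.ncs[ncID] = true
	for _, a := range addrs {
		if _, ok := s.ips[a]; !ok {
			s.ips[a] = &ipConfig{ncID: ncID, state: pendingProgramming}
		}
	}
	return nil
}

// deleteNC purges the NC and every IP associated with it (fix 2).
func (s *store) deleteNC(ncID string) {
	delete(s.ncs, ncID)
	for a, ip := range s.ips {
		if ip.ncID == ncID {
			delete(s.ips, a)
		}
	}
}

// markProgrammed only marks IPs that belong to the programmed NC (fix 3).
func (s *store) markProgrammed(ncID string, addrs []string) {
	for _, a := range addrs {
		ip, ok := s.ips[a]
		if ok && ip.ncID == ncID && ip.state == pendingProgramming {
			ip.state = available
		}
	}
}
```

With these changes, loading the new NC while the stale IPs are still present fails loudly instead of silently reusing them, and once the stale NC is purged its IPs go with it, so the new NC's IPs load with the correct association.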