portblock: fix monitor for action=block instances #2107
Closed
+29
−12
344beb1 introduced status_check=rule (and set that as default).
This breaks "monitor" and "probe" of "action=block" for existing setups.
As soon as the "unblock" instance is "started", the blocking rule is no longer active. With status_check=rule (the new default), the next "monitor" (or "probe") will see an unexpected state of the "block" instance, and recover the whole dependency chain. With monitor enabled, this will repeat forever.
Dropping the monitor may look like a fix, but it is not, because the same "unexpected" state would be detected on any probe.
A probe happens after you enter and then leave maintenance-mode, or after a DC election, for example following a cluster partition and rejoin or another membership change such as a peer reboot, even a planned one.
Probes may happen for other reasons as well.
Any such probe will find the resource in an unexpected state, and "recover" by spuriously restarting the whole dependency chain.
Workaround: explicitly set status_check=pseudo (or any other string not equal to "rule").
Still, without it, existing setups (that don't know about the status_check parameter yet) are broken by a simple update of the portblock resource agent.
Possible "fix": don't set "rule" as the default.
Better fix: always treat "action=block" instances as pseudo resources.
That is basically what is already done for promotable instances, except that there the equivalent of the "pseudo" status file is the promotion score as found in the CIB.
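As a rough illustration of the "better fix" above, the decision could be sketched like this. This is a minimal sketch only, not the code in this PR: the function name `effective_status_check` and its argument order are hypothetical, and it deliberately leaves out the promotable case.

```shell
#!/bin/sh
# Hypothetical helper (not the resource agent's actual code): decide which
# kind of status check a monitor or probe should use for one instance.
#   $1: value of the "action" parameter ("block" or "unblock")
#   $2: value of the "status_check" parameter ("rule", "pseudo", ...)
effective_status_check() {
    action="$1"
    status_check="$2"
    if [ "$action" = "block" ]; then
        # action=block instances are always treated as pseudo resources,
        # regardless of what status_check says.
        echo "pseudo"
    elif [ "$status_check" = "rule" ]; then
        echo "rule"
    else
        # Any string other than "rule" behaves like "pseudo".
        echo "pseudo"
    fi
}
```

With this shape, a "block" instance keeps reporting its state from the pseudo status even after the matching "unblock" instance has removed the rule, so a later monitor or probe does not see an unexpected state.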
Fixes #2099