
Conversation


@lge lge commented Dec 3, 2025

344beb1 introduced status_check=rule (and made it the default).

This breaks "monitor" and "probe" of "action=block" for existing setups.

As soon as the "unblock" is "started", the rule is no longer active. With status_check=rule (the new default), the next "monitor" (or "probe") will see the "block" instance in an unexpected state and recover the whole dependency chain. With monitoring enabled, this repeats forever.

Dropping the monitor is not a fix, even if you think it might be, because the same "unexpected" state would be detected on any probe.

A probe will happen after you go into and then back out of maintenance mode, or after a DC election, for example following a cluster partition / rejoin or another membership change such as a peer reboot, even a planned one.
Possibly for other reasons as well.

Any such probe will find the resource in an unexpected state, and "recover" by spuriously restarting the whole dependency chain.

Workaround: explicitly set status_check=pseudo (or any string other than "rule").
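
On a pcs-managed cluster, applying that workaround to an existing portblock "block" resource could look roughly like this (the resource name block-smtp is made up for illustration; the status_check parameter is the one introduced by 344beb1):

```sh
# Illustrative only: pin the pre-344beb1 behaviour on an existing
# portblock "block" resource. "block-smtp" is a hypothetical resource name.
pcs resource update block-smtp status_check=pseudo
```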

Still, without that workaround, existing setups (which do not yet know about the status_check parameter) are broken by a simple update of the portblock resource agent.

Possible "fix": don't set "rule" as the default.

Better fix: always treat "action=block" instances as pseudo resources.

That is basically what is already done for promotable instances, except that there the equivalent of the "pseudo" status file is the promotion score as found in the CIB.
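
A minimal sketch of that idea, not the actual agent code (the state file path and function name below are illustrative assumptions; OCF_SUCCESS/OCF_NOT_RUNNING are the standard OCF return codes):

```sh
#!/bin/sh
# Sketch: for action=block, report status from a state file written by
# start/stop instead of inspecting the live netfilter rule, so a rule
# removed by the corresponding "unblock" never looks like a failure.

OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical state file; the real agent would derive its own path.
state_file="${HA_RSCTMP:-/run}/portblock-${OCF_RESKEY_portno:-0}.block"

block_monitor_pseudo() {
    # "started" means start() ran and stop() did not, regardless of
    # whether the rule is currently present in the firewall.
    if [ -e "$state_file" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```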

Fixes #2099

@lge lge requested a review from oalbrigt December 3, 2025 13:04

oalbrigt commented Dec 3, 2025

Not monitoring the block rule is not a secure default for a firewall agent, so I don't think we should do this.

It makes more sense for users to set status_check=pseudo when they use the agent in this kind of scenario.


lge commented Dec 3, 2025

superseded by #2108

@lge lge closed this Dec 3, 2025
@lge lge deleted the portblock-block-default-to-status-check-pseudo branch December 4, 2025 13:47