multipath-tools 0.10.1 and 0.10.2 #17
The @v1 releases are deprecated. https://github.blog/changelog/2024-02-13-deprecation-notice-v1-and-v2-of-the-artifact-actions/ Signed-off-by: Martin Wilck <[email protected]>
Signed-off-by: Martin Wilck <[email protected]>
We can't use "container:" any more because upload-artifact@v4 doesn't work on Debian Jessie. Signed-off-by: Martin Wilck <[email protected]>
Running "make" in a container and "make clean" outside doesn't work (access right issues). So just use separate jobs. Signed-off-by: Martin Wilck <[email protected]>
Avoid "text file busy" error on GitHub. dmevents-test: Text file busy Makefile:74: recipe for target 'dmevents.out' failed Signed-off-by: Martin Wilck <[email protected]>
Signed-off-by: Martin Wilck <[email protected]>
The ABI should never change on a stable branch. This workflow asserts that. Signed-off-by: Martin Wilck <[email protected]>
dm_get_maps() traverses the entire list of dm maps. We shouldn't give up just because probing a single map failed. Fixes: bf3a4ad ("libmultipath: simplify dm_get_maps()") Signed-off-by: Martin Wilck <[email protected]>
We import DM_COLDPLUG_SUSPENDED in all code flows below mpath_coldplug_end. Clarify this in the code. Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
Since b22c273 ("11-dm-mpath.rules: Don't force activation while device is suspended"), we've handled the case where a device is suspended while an uevent is processed (e.g. because multipathd is reloading the map again at the same time). But we were missing the case where the device had never been initialized before. If .MPATH_DEVICE_READY_OLD was empty, we'd jump to scan_import without setting MPATH_DEVICE_READY to 0. This can cause a device not to be fully activated at boot time, because in follow-up uevents we'd assume that the device had already been set up. Treat the case in which an uevent is processed for a previously not fully set-up, suspended device like other situations where we set MPATH_DEVICE_READY to 0. Fixes: b22c273 ("11-dm-mpath.rules: Don't force activation while device is suspended") Reviewed-by: Benjamin Marzinski <[email protected]>
Our code is always setting MPATH_UNCHANGED and DM_ACTIVATION in pairs. While DM_ACTIVATION is a global DM property, MPATH_UNCHANGED is owned by us. Just set MPATH_UNCHANGED, and adapt DM_ACTIVATION when necessary just in one place. Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
We set .DM_NOSCAN above where we set DM_UDEV_DISABLE_OTHER_RULES_FLAG, too. If the latter isn't set, .DM_NOSCAN can't be set. Skip the redundant test. Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
When multipath reloads a device or fails or restores a path, the udev rules disable LVM scanning, but since .DM_NOSCAN isn't set, blkid is still run on the device. When multipath devices that are set to queue_if_no_path lose all their paths at close to the same time, udev workers can hang trying to run blkid. The blkid results shouldn't change when multipathd is adding, removing, failing or reinstating paths, so aside from avoiding hanging udev processes, we're also skipping unnecessary work. Hence, set .DM_NOSCAN if MPATH_UNCHANGED is set, to prevent blkid from being called in 13-dm.rules. Suggested-by: Benjamin Marzinski <[email protected]> Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
If pp->dev_loss is DEV_LOSS_TMO_UNSET and min_dev_loss is 0 (which is the case if no_path_retry is NO_PATH_RETRY_FAIL or NO_PATH_RETRY_UNDEF), we will set pp->dev_loss to 0, which is wrong. Fix it. Fixes: 058b5f5 ("libmultipath: fix dev_loss_tmo even if not set in configuration") Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
If reload_and_sync_map() removes the multipath device, deferred_failback_tick() needs to decrement the counter so that it doesn't skip the following device. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Reported by coverity: "i--" may cause an underflow, which will again cause an overflow when the loop continues. Use a signed int for loops like this to make coverity happy. Signed-off-by: Martin Wilck <[email protected]>
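The loop pattern behind the two commits above (deleting an element while iterating forward, then stepping the index back so the next element isn't skipped) can be sketched as follows. This is a minimal illustration on a plain int array, not the multipathd code; `remove_all()` and its parameters are hypothetical names.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a vector of maps: remove every entry equal
 * to `victim` from arr[0..*len) while iterating forward.
 * Returns the number of removed entries.
 */
int remove_all(int *arr, int *len, int victim)
{
	int removed = 0;

	/*
	 * A signed index is used on purpose: after deleting slot 0,
	 * "i--" yields -1 and the loop's "i++" brings it back to 0.
	 * With an unsigned index the same sequence wraps around,
	 * which is what the static analyzer complained about.
	 */
	for (int i = 0; i < *len; i++) {
		if (arr[i] != victim)
			continue;
		/* close the gap left by the deleted slot */
		for (int j = i; j < *len - 1; j++)
			arr[j] = arr[j + 1];
		(*len)--;
		removed++;
		/* without this, the element that moved into slot i is skipped */
		i--;
	}
	return removed;
}
```

Dropping the `i--` makes the loop miss the second of two adjacent matching entries, which is the kind of skip the deferred_failback_tick() fix addresses.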
Avoid the following error with clang 19: msort.c:268:27: error: cast from '__compar_fn_t' (aka 'int (*)(const void *, const void *)') to '__compar_d_fn_t' (aka 'int (*)(const void *, const void *, void *)') converts to incompatible function type [-Werror,-Wcast-function-type-mismatch] 268 | return msort_r (b, n, s, (__compar_d_fn_t)cmp, NULL); | ^~~~~~~~~~~~~~~~~~~~ Signed-off-by: Martin Wilck <[email protected]>
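One common way to avoid such a cast is a small trampoline that has the three-argument signature and forwards to the two-argument comparator. The sketch below illustrates that idea only; it is not necessarily how the commit changes msort.c, and `cmp2_fn`, `cmp3_fn`, `call_cmp2()`, `sort_ints_r()` and `sort_ints()` are hypothetical names.

```c
#include <stddef.h>

typedef int (*cmp2_fn)(const void *, const void *);
typedef int (*cmp3_fn)(const void *, const void *, void *);

/* Trampoline: arg points to a cmp2_fn variable, never to a function. */
static int call_cmp2(const void *a, const void *b, void *arg)
{
	cmp2_fn *cmp = arg;

	return (*cmp)(a, b);
}

/* Hypothetical "_r" style sorter (insertion sort over ints). */
void sort_ints_r(int *v, size_t n, cmp3_fn cmp, void *arg)
{
	for (size_t i = 1; i < n; i++) {
		int key = v[i];
		size_t j = i;

		while (j > 0 && cmp(&v[j - 1], &key, arg) > 0) {
			v[j] = v[j - 1];
			j--;
		}
		v[j] = key;
	}
}

/* Two-argument front end: no function pointer cast required. */
void sort_ints(int *v, size_t n, cmp2_fn cmp)
{
	sort_ints_r(v, n, call_cmp2, &cmp);
}

/* Example comparator with the classic two-argument signature. */
int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}
```

The comparator is passed as a pointer to a function pointer variable, because converting a function pointer directly to `void *` is not guaranteed by ISO C.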
Signed-off-by: Martin Wilck <[email protected]>
Looks good to me.
Actually, I do have some thoughts. First, I'm not sure we need to update NEWS.md for the stable branch releases. You're certainly welcome to, and I can see how it would save distro maintainers some time if they wanted an overview of the changes. But I think simply bumping the version number for each pull request should be all that's really necessary, if you want to skip some extra work. Also, looking at this release makes it obvious that it's not always going to be easy to come up with a good "Fixes" commit trailer for some of the fixes. Without these, it won't be as easy as I hoped for people to be able to check if a fix is applicable to a previous stable branch. Although I expect that if a fix is only applicable in the latest stable branch, then it should be pretty easy to come up
Updating NEWS.md is a minor effort, because I just copy/paste the relevant parts from the major release. The fact that I did it here doesn't mean I am committing myself to do it always in the future, but I think it wouldn't cost me much time. About the "Fixes" trailers, I am not sure I understand your remark. Did you expect me to add "Fixes:" trailers to the bug fix commits that I'd identified? I opted to cherry-pick the 0.11.0 commits unchanged. IMO, if I add "Fixes:" here, I should do it in 0.11.0 as well (meaning that I need to rebase my "queue" branch, but I guess I'll have to do that anyway when I include the currently pending fixes). In the future, we should agree to add "Fixes:" trailers to bug fix patches sent to dm-devel in the first place, but we haven't done so in the past (at least these trailers haven't been mandatory, we've been adding them rather sporadically). Or were you referring to the other commits (like the GitHub workflow commits) that don't fix actual bugs at all?
No. I was expecting just what you did. We shouldn't be unnecessarily mucking with commit messages for the stable branch. I was just noticing that a number of the actual bug fixes didn't include "Fixes" trailers, and for some of them it's not even obvious what you would put there. They are (or at least could be) fixing issues that were introduced in multiple commits or issues that have existed long enough that the code has been refactored multiple times since the bug was introduced. The code may have originally been correct, and then you would need to track down the code change that caused it to stop being the correct thing to do. I was hoping that downstream users who wanted to stick with a stable branch that we are no longer maintaining could just look through all of subsequent stable branches and pull out the fixes applicable to their branch. The easiest way to do that would be to look at the Fixes trailers, and see if their branch includes the referenced commit. But like I said, it's not always obvious what you would put there. I agree that we should probably make a better effort to label bugfixes with a Fixes trailer when they are initially committed, but it can probably be a best effort kind of thing. If the bug has existed for years and the code has been tweaked and refactored multiple times since the initial bug was introduced, I don't think we should require git spelunking to find the initial breaking commit. But in that case, the bug has been around for years so finding the initial commit isn't that important. I suppose we could helpfully add something
To indicate that the bug has been there since at least 0.6.4. For commits that fix bugs introduced in multiple commits, we can probably just get by with a best effort to find the oldest breaking commit and mention that there are others.
Nope. Things like those don't need any trailers.
@bmarzins discussion about future improvements and fixes tags aside, am I assuming correctly that you would ack this PR if I made it on opensvc?
Note that since 0.10.0, I have tried to point out in NEWS.md in which previous version a given bug had been introduced. I did not put the commit IDs there though because NEWS.md should also be readable by non-technical users. I haven't spent this effort for earlier releases, but still the "Bug fixes" section in NEWS.md should help downstream users.
Yeah. My thoughts were just thoughts, not objections to the PR.
c287753 to f23bced
I've pushed the bugfix part of my systemd watchdog patch. I hope this is fine for you. The 0.11.0 solution is not suitable for the stable branch IMO.
WATCHDOG_USEC may be set to 0, which means that the watchdog is disabled in systemd. Fixes: 9366cfb ("multipathd: Implement systemd watchdog integration") Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
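A minimal sketch of treating a WATCHDOG_USEC value of 0 as "watchdog disabled" rather than as an error. This is not the multipathd implementation; `parse_watchdog_usec()` and the `wd_state` enum are hypothetical, and treating an unset or empty variable as "disabled" is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

enum wd_state { WD_INVALID = -1, WD_DISABLED = 0, WD_ENABLED = 1 };

/*
 * Hypothetical helper: interpret the string value of WATCHDOG_USEC.
 * On WD_ENABLED, *usec receives the timeout in microseconds.
 */
enum wd_state parse_watchdog_usec(const char *s, unsigned long long *usec)
{
	char *end;
	unsigned long long v;

	/* assumption: unset or empty means no watchdog was requested */
	if (s == NULL || *s == '\0')
		return WD_DISABLED;
	/* reject signs, whitespace and other non-numeric prefixes */
	if (*s < '0' || *s > '9')
		return WD_INVALID;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno != 0 || *end != '\0')
		return WD_INVALID;

	/* "0" is a valid setting: the watchdog is simply disabled */
	if (v == 0)
		return WD_DISABLED;

	*usec = v;
	return WD_ENABLED;
}
```

A caller would pass `getenv("WATCHDOG_USEC")` and only start sending keep-alive notifications when the result is `WD_ENABLED`.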
f23bced to c435868
On systems with LVM volumes, "multipath -ll" will spit out lots of messages like libmp_mapinfo: map vg-lv0 has multiple targets which is irritating. Reduce the log level of these messages to 3, as they are harmless most of the time. This is a backport of e8949c2 ("libmultipath: reduce log level of libmp_mapinfo() messages") from the master branch. We can't apply exactly the same fix because the stable branch is missing 8c772d3 ("libmultipath: check DM UUID earlier in libmp_mapinfo__"). Signed-off-by: Martin Wilck <[email protected]>
de4efb7 to b2642d2
_find_controllers() needs to free the udev device if it doesn't get added to a path. Otherwise it can leak memory whenever check_foreign() is called, causing multipathd's memory usage to continually grow. Fixes: 7b47762 ("libmultipath: nvme: fix path detection for kernel 4.16") Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]> (cherry picked from commit 5a3d334)
In function io_err_stat_handle_pathfail(), path->io_err_dis_reinstate_time is set to 0 to enqueue path to io error check as soon as possible. But multipathd can not do it within marginal_path_err_recheck_gap_time seconds after power-on, because curr_time is less than marginal_path_err_recheck_gap_time. To handle the early marginal path, we can enqueue path when io_err_dis_reinstate_time is 0. Signed-off-by: chenrenhui <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]> (cherry picked from commit a1e3cf2)
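The timing problem described above can be illustrated with a small predicate. This is a sketch under the assumption that timestamps are seconds on a clock that starts near zero at boot; `recheck_due()` and its parameter names are hypothetical and only mirror the fields mentioned in the commit message.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical predicate: should the path be enqueued for the
 * io error check now?
 *
 * reinstate_time == 0 means "check as soon as possible". Without the
 * explicit test, a path that failed shortly after boot would wait
 * until curr_time exceeds the gap, because curr_time - 0 <= gap.
 */
bool recheck_due(time_t curr_time, time_t reinstate_time, time_t gap)
{
	if (reinstate_time == 0)
		return true;
	return curr_time - reinstate_time > gap;
}
```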
pp->pgindex is set in disassemble_map() when a map is parsed. There are various possibilities for this index to become invalid. pp->pgindex is only used in enable_group() and followover_should_fallback(), and both callers take no action if it is 0, which is the right thing to do if we don't know the path's pathgroup. Make sure pp->pgindex is reset to 0 in various places:
- when it's orphaned,
- before (re)grouping paths,
- when we detect a bad mpp assignment in update_pathvec_from_dm(),
- when a pathgroup is deleted in update_pathvec_from_dm(). In this case, pgindex needs to be invalidated for all paths in all pathgroups after the one that was deleted.
The hunk in group_paths is mostly redundant with the hunk in free_pgvec(), but because we're looping over pg->paths in the former and over pg->pgp in the latter, I think it's better to play it safe. Fixes: 99db1bd ("[multipathd] re-enable disabled PG when at least one path is up") Fixes: opensvc#105 Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]> (cherry picked from commit cd912cf) (cherry picked from commit 714c20b)
The previous algorithm didn't detect the case where cpgp contained a path that was not contained in pgp. Fix this. Cherry-picked from d4b35f6 Fixes: 90773ba ("libmultipath: resolve hash collisions in pgcmp()") Signed-off-by: Martin Wilck <[email protected]>
If map creation succeeds, previously not multipathed devices are now multipathed. udev may not have noticed this yet, thus trigger path uevents to make it aware of the situation. Likewise, if creating a map fails, the paths in question were likely considered multipath members by udev, too. They will now be marked as failed, so trigger an event in this situation as well. Fixes: opensvc#103 Suggested-by: Benjamin Marzinski <[email protected]> Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]> (cherry picked from commit 98b3a7b)
... if a map has been flushed. In this case, we know that the paths haven't been multipathed by coalesce_paths() because of the current configuration (failure to create the map can't be the reason if the map exists in coalesce_maps()). Make sure udev sees the paths which have been released from the map as non-multipath. Note that this is the only case where maps are flushed where it is correct to trigger paths uevents. In other cases, e.g. after a "remove map" CLI command, the configuration is unchanged and if we triggered an uevent, the map would be re-created by multipathd when the uevent arrived. Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]> (cherry picked from commit ad3ea47)
If a multipath device already has need_reload set when a path is adopted, it won't call set_path_max_sectors_kb() because of short-circuit evaluation. This isn't what's intended. Fixes: e5e20c7 ("libmultipath: set max_sectors_kb in adopt_paths()") Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]> (cherry picked from commit f5c0c4b)
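The short-circuit pitfall described above can be reproduced in a few lines. This is an illustrative sketch, not the libmultipath code: `apply_setting()` stands in for a function with a required side effect (such as set_path_max_sectors_kb()), and `adopt_buggy()`, `adopt_fixed()` and `side_effect_calls` are hypothetical names.

```c
#include <stdbool.h>

/* counts how often the side effect actually ran */
int side_effect_calls;

/* Hypothetical stand-in: must always run, returns true if a reload is needed. */
static bool apply_setting(void)
{
	side_effect_calls++;
	return true;
}

/* Buggy: if need_reload is already true, apply_setting() is never called. */
bool adopt_buggy(bool need_reload)
{
	return need_reload || apply_setting();
}

/* Fixed: run the side effect first, then combine the results. */
bool adopt_fixed(bool need_reload)
{
	bool changed = apply_setting();

	return need_reload || changed;
}
```

Both variants return the same boolean; only the fixed one guarantees that the setting is applied when the reload flag was already set.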
This change doesn't actually fix anything. The code was already safe. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]> (cherry picked from commit 85ec51e)
multipath wouldn't autodetect the GROUP_BY_PRIO path grouping policy or allow the GROUP_BY_TPG policy if there was a path that didn't have its prioritizer selected (for instance because multipathd was reconfigured while it was offline). To avoid this, make verify_alua_prio() assume an alua multipath device if all the paths with a prioritizer selected (there must be at least one) use an alua-based prioritizer. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]> (cherry picked from commit b47a577)
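The rule stated above (ignore paths without a selected prioritizer, require at least one path with one, and require all of those to be alua-based) could look roughly like this. It is a sketch, not verify_alua_prio() itself: `looks_like_alua()` and `is_alua_based()` are hypothetical, a NULL entry models "no prioritizer selected", and which prioritizer names count as alua-based ("alua" and "sysfs" here) is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: these prioritizer names are treated as alua-based. */
static bool is_alua_based(const char *prio)
{
	return strcmp(prio, "alua") == 0 || strcmp(prio, "sysfs") == 0;
}

/*
 * prios[i] is the prioritizer name of path i, or NULL if the path
 * has no prioritizer selected (e.g. it was offline).
 */
bool looks_like_alua(const char *const *prios, size_t n)
{
	bool seen = false;

	for (size_t i = 0; i < n; i++) {
		if (prios[i] == NULL)
			continue;	/* unknown: neither confirms nor denies */
		if (!is_alua_based(prios[i]))
			return false;
		seen = true;
	}
	/* at least one path must have confirmed an alua-based prioritizer */
	return seen;
}
```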
This PR is now broken because I've updated the master branch to the upstream master branch. It should be OK once I convert it into a PR to opensvc/stable-0.10.y, though.
Signed-off-by: Martin Wilck <[email protected]>
This is now opensvc#109
multipath-tools 0.10.2, 2025/02
This release contains backported bug fixes from the stable-0.11.y branch.
Bug fixes
if an invalid path device was removed from a map.
Fixes #105.
inconsistent or wrong kernel state (e.g. missing or falsely mapped path
device). Wrongly mapped paths will be unmapped and released to the system.
Fixes another issue reported in #105.
group_by_tpg might be disabled if one or more paths were offline during initial configuration.
max_sectors_kb might not be set on a path device that was being added to a multipath map. This problem was introduced in 0.9.9.
multipath-tools 0.10.1, 2025/01
This is the first bug fix release on the stable-0.10.y branch. It contains bug fixes from 0.11.0, and some CI-related fixes.
Bug fixes
of device mapper devices, in particular devices with multiple DM targets.
The problem was introduced in 0.10.0.
Fixes #102.
activated during boot if a cold plug uevent is processed for a previously
not configured multipath map while this map was suspended. This problem existed
since 0.9.8.
no_path_retry fail and no setting for dev_loss_tmo might get the dev_loss_tmo set to 0, causing the device to be deleted immediately in the event of a transport disruption.
This bug was introduced in 0.9.6.
(failback value > 0 in multipath.conf), some maps might fail back later than configured. The problem existed since 0.9.6.
WATCHDOG_USEC environment variable had the value "0", which means that the watchdog is simply disabled. This (minor) problem existed since 0.4.9.
0.7.8.
the io error check for a recently failed path to be delayed. This bug
existed since 0.7.4.