Fix snapshot automount expiry cancellation deadlock #17941
+71
−48
Motivation and Context
This fixes a deadlock that occurs when snapshot expiry tasks are cancelled while locks are held. The deadlock causes the system to hang with multiple threads blocked indefinitely, requiring a system restart. The issue manifests under heavy snapshot automount load combined with memory pressure triggering ARC pruning.
Description
The deadlock occurs when the snapshot expiry task, ARC memory reclamation, and lock acquisition form a circular dependency. The sequence is:
1. The `snapentry_expire` task spawns an umount process via `call_usermodehelper()` and waits for completion
2. Memory pressure triggers `arc_prune`, which acquires locks (`z_teardown_lock`)
3. `arc_prune` calls `zfs_exit_fs()` → `zfsctl_snapshot_unmount_delay()` to reschedule snapshot expiry
4. `zfsctl_snapshot_unmount_delay()` attempts to cancel the running expiry task with `taskq_cancel_id()`
5. `taskq_cancel_id()` blocks waiting for the task to complete (holds lock while waiting)
6. The umount process cannot finish while `arc_prune` holds its locks, so the expiry task never completes

The fix adds a boolean `wait` parameter to `taskq_cancel_id()`:

- `wait=B_TRUE`: Block until the task completes (the existing behavior, kept for all other callers)
- `wait=B_FALSE`: Return EBUSY immediately if the task is running (non-blocking)

The `zfs_exit_fs()` path now uses non-blocking cancellation (`wait=B_FALSE`), breaking the deadlock by returning immediately when the expiry task is already running. Additional changes include removing the per-entry `se_taskqid_lock` (all taskqid operations now use the global `zfs_snapshot_lock` as WRITER), and adding an `se_in_umount` flag to prevent recursive waits when `zfsctl_destroy()` is called during unmount.

Hung Task Stack Trace:
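Returning to the `wait` parameter: the difference between blocking and non-blocking cancellation can be sketched with a small standalone model. This is only an illustration, not the SPL taskq code. The names `toy_task_t`, `toy_state` and `toy_cancel()` are hypothetical, and the return values other than EBUSY are assumptions made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task states for this sketch; not the SPL taskq internals. */
enum toy_state { TOY_PENDING, TOY_RUNNING, TOY_DONE, TOY_CANCELLED };

typedef struct toy_task {
	_Atomic int state;
} toy_task_t;

/*
 * Toy analogue of a cancel call that takes a wait flag.
 * Assumed return values for this sketch:
 *   0      a pending task was cancelled before it started
 *   EBUSY  the task is running and the caller asked not to wait
 *   ENOENT nothing left to cancel (already finished or cancelled)
 */
static int
toy_cancel(toy_task_t *t, bool wait)
{
	assert(t != NULL);

	/* A task that has not started yet can simply be cancelled. */
	int expected = TOY_PENDING;
	if (atomic_compare_exchange_strong(&t->state, &expected,
	    TOY_CANCELLED))
		return (0);

	if (expected == TOY_RUNNING) {
		/* Non-blocking mode: report the running task and return. */
		if (!wait)
			return (EBUSY);
		/*
		 * Blocking mode: spin until another thread moves the task
		 * out of TOY_RUNNING. If the caller holds a lock that the
		 * running task needs, this loop never ends, which is the
		 * shape of the deadlock described above.
		 */
		while (atomic_load(&t->state) == TOY_RUNNING)
			;
	}
	return (ENOENT);
}
```

In this model, a caller that holds a lock would pass `wait = false` and treat EBUSY as "the task is already running, leave it alone", which mirrors how the `zfs_exit_fs()` path is described as using `wait=B_FALSE`.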
How Has This Been Tested?
Reproduction script:
Results:
Types of changes
Checklist:
Signed-off-by.