dsl_dir: avoid dd_lock during snapshots_changed updates#18472

Open
Gality369 wants to merge 1 commit into openzfs:master from
Gality369:avoid-dd_lock-during-snapshots_changed-updates

Conversation

Contributor

@Gality369 Gality369 commented Apr 28, 2026

Motivation and Context

A temporary snapshot update still holds dd_lock while persisting the
snapshots_changed ZAP entry. Both dsl_dir_zapify() and zap_update() can dirty
buffers and recurse into dsl_dir_willuse_space(), which establishes the
dd_lock -> zap_rwlock lock ordering that lockdep reports.

WARNING: possible circular locking dependency detected
------------------------------------------------------
syz.0.2/479 is trying to acquire lock:
ffff8880178f41f0 (&dd->dd_lock){+.+.}-{4:4}, at: dsl_dir_willuse_space+0xec/0x660 fs/zfs/zfs/dsl_dir.c:1519

but task is already holding lock:
ffff8880185afda8 (&l->l_rwlock){++++}-{4:4}, at: zap_get_leaf_byblk+0x20a/0xca0 fs/zfs/zfs/zap.c:569

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&l->l_rwlock){++++}-{4:4}:
       down_write+0x8f/0x200 kernel/locking/rwsem.c:1590
       zap_open_leaf fs/zfs/zfs/zap.c:496 [inline]
       zap_get_leaf_byblk+0x51b/0xca0 fs/zfs/zfs/zap.c:567
       zap_deref_leaf+0x22f/0x280 fs/zfs/zfs/zap.c:701
       fzap_lookup+0x250/0x510 fs/zfs/zfs/zap.c:891
       zap_lookup_impl+0x122/0x600 fs/zfs/zfs/zap_micro.c:1147
       zap_lookup_norm fs/zfs/zfs/zap_micro.c:1189 [inline]
       zap_lookup+0xeb/0x160 fs/zfs/zfs/zap_micro.c:1130
       spa_dir_prop fs/zfs/zfs/spa.c:3157 [inline]
       spa_ld_trusted_config+0x12b/0x1310 fs/zfs/zfs/spa.c:4913
       spa_ld_mos_with_trusted_config.part.0+0x2d/0x1b0 fs/zfs/zfs/spa.c:5935
       spa_ld_mos_with_trusted_config fs/zfs/zfs/spa.c:3691 [inline]
       spa_load_impl fs/zfs/zfs/spa.c:5982 [inline]
       spa_load+0x42c/0x4580 fs/zfs/zfs/spa.c:3666
       spa_tryimport+0x396/0x9a0 fs/zfs/zfs/spa.c:7574
       zfs_ioc_pool_tryimport+0x14a/0x1f0 fs/zfs/zfs/zfs_ioctl.c:1670
       zfsdev_ioctl_common+0x128c/0x1600 fs/zfs/zfs/zfs_ioctl.c:8239
       zfsdev_ioctl+0x65/0x120 fs/zfs/os/linux/zfs/zfs_ioctl_os.c:145
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:597 [inline]
       __se_sys_ioctl fs/ioctl.c:583 [inline]
       __x64_sys_ioctl+0x197/0x1e0 fs/ioctl.c:583
       x64_sys_call+0x1144/0x26a0 arch/x86/include/generated/asm/syscalls_64.h:17
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x93/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #1 (&zap->zap_rwlock){++++}-{4:4}:
       down_write+0x8f/0x200 kernel/locking/rwsem.c:1590
       mzap_open fs/zfs/zfs/zap_micro.c:530 [inline]
       zap_lockdir_impl+0xb98/0x2b10 fs/zfs/zfs/zap_micro.c:636
       zap_lockdir+0x183/0x1e0 fs/zfs/zfs/zap_micro.c:756
       zap_update+0xdd/0x580 fs/zfs/zfs/zap_micro.c:1632
       dsl_dir_snap_cmtime_update+0x412/0x610 fs/zfs/zfs/dsl_dir.c:2280
       dsl_dataset_snapshot_sync_impl+0x1180/0x1f30 fs/zfs/zfs/dsl_dataset.c:1899
       dsl_dataset_snapshot_tmp_sync+0x104/0x340 fs/zfs/zfs/dsl_dataset.c:2068
       dsl_sync_task_sync+0x24c/0x3f0 fs/zfs/zfs/dsl_synctask.c:256
       dsl_pool_sync+0xadc/0x14f0 fs/zfs/zfs/dsl_pool.c:853
       spa_sync_iterate_to_convergence fs/zfs/zfs/spa.c:10645 [inline]
       spa_sync+0x9bb/0x27a0 fs/zfs/zfs/spa.c:10896
       txg_sync_thread+0x659/0x1290 fs/zfs/zfs/txg.c:602
       thread_generic_wrapper+0x1c8/0x2a0 fs/zfs/os/linux/spl/spl-thread.c:63
       kthread+0x3f0/0x850 kernel/kthread.c:463
       ret_from_fork+0x50f/0x610 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #0 (&dd->dd_lock){+.+.}-{4:4}:
       check_prev_add kernel/locking/lockdep.c:3165 [inline]
       check_prevs_add kernel/locking/lockdep.c:3284 [inline]
       validate_chain kernel/locking/lockdep.c:3908 [inline]
       __lock_acquire+0x14ae/0x21e0 kernel/locking/lockdep.c:5237
       lock_acquire kernel/locking/lockdep.c:5868 [inline]
       lock_acquire+0x169/0x2f0 kernel/locking/lockdep.c:5825
       __mutex_lock_common kernel/locking/mutex.c:598 [inline]
       __mutex_lock+0x1a6/0x1d80 kernel/locking/mutex.c:760
       mutex_lock_nested+0x1b/0x30 kernel/locking/mutex.c:812
       dsl_dir_willuse_space+0xec/0x660 fs/zfs/zfs/dsl_dir.c:1519
       dmu_objset_willuse_space+0xa4/0x100 fs/zfs/zfs/dmu_objset.c:3043
       dbuf_dirty+0x3f2/0x3cc0 fs/zfs/zfs/dbuf.c:2333
       dmu_buf_will_dirty_flags+0x25e/0xcc0 fs/zfs/zfs/dbuf.c:2694
       dmu_buf_will_dirty+0x27/0x40 fs/zfs/zfs/dbuf.c:2700
       zap_get_leaf_byblk+0x294/0xca0 fs/zfs/zfs/zap.c:575
       zap_deref_leaf+0x22f/0x280 fs/zfs/zfs/zap.c:701
       fzap_update+0x268/0x4f0 fs/zfs/zfs/zap.c:992
       zap_update+0x30e/0x580 fs/zfs/zfs/zap_micro.c:1641
       sa_add_layout_entry+0x302/0x840 fs/zfs/zfs/sa.c:435
       sa_find_layout+0x394/0x7b0 fs/zfs/zfs/sa.c:485
       sa_build_layouts+0xd6c/0x19f0 fs/zfs/zfs/sa.c:774
       sa_replace_all_by_template_locked fs/zfs/zfs/sa.c:1893 [inline]
       sa_replace_all_by_template+0x18e/0x640 fs/zfs/zfs/sa.c:1903
       zfs_mknode+0x162f/0x3e20 fs/zfs/os/linux/zfs/zfs_znode_os.c:905
       zfs_create+0xfa2/0x17b0 fs/zfs/os/linux/zfs/zfs_vnops_os.c:757
       zpl_create+0x29a/0x540 fs/zfs/os/linux/zfs/zpl_inode.c:196
       lookup_open.isra.0+0x10a1/0x1460 fs/namei.c:3796
       open_last_lookups fs/namei.c:3895 [inline]
       path_openat+0x11fe/0x2ce0 fs/namei.c:4131
       do_filp_open+0x1f6/0x430 fs/namei.c:4161
       do_sys_openat2+0x117/0x1c0 fs/open.c:1437
       do_sys_open fs/open.c:1452 [inline]
       __do_sys_openat fs/open.c:1468 [inline]
       __se_sys_openat fs/open.c:1463 [inline]
       __x64_sys_openat+0x15b/0x220 fs/open.c:1463
       x64_sys_call+0x161b/0x26a0 arch/x86/include/generated/asm/syscalls_64.h:258
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x93/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

other info that might help us debug this:

Chain exists of:
  &dd->dd_lock --> &zap->zap_rwlock --> &l->l_rwlock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&l->l_rwlock);
                               lock(&zap->zap_rwlock);
                               lock(&l->l_rwlock);
  lock(&dd->dd_lock);

 *** DEADLOCK ***

6 locks held by syz.0.2/479:
 #0: ffff88801783c420 (sb_writers#12){.+.+}-{0:0}, at: open_last_lookups fs/namei.c:3884 [inline]
 #0: ffff88801783c420 (sb_writers#12){.+.+}-{0:0}, at: path_openat+0x1ed8/0x2ce0 fs/namei.c:4131
 #1: ffff88801ea71108 (&type->i_mutex_dir_key#7){+.+.}-{4:4}, at: inode_lock include/linux/fs.h:980 [inline]
 #1: ffff88801ea71108 (&type->i_mutex_dir_key#7){+.+.}-{4:4}, at: open_last_lookups fs/namei.c:3892 [inline]
 #1: ffff88801ea71108 (&type->i_mutex_dir_key#7){+.+.}-{4:4}, at: path_openat+0x1186/0x2ce0 fs/namei.c:4131
 #2: ffff888011684980 (&zh->zh_lock){+.+.}-{4:4}, at: zfs_znode_hold_enter+0x51d/0x950 fs/zfs/os/linux/zfs/zfs_znode_os.c:291
 #3: ffff88801eab6038 (&hdl->sa_lock){+.+.}-{4:4}, at: sa_replace_all_by_template+0x8d/0x640 fs/zfs/zfs/sa.c:1902
 #4: ffff8880153371c8 (&zap->zap_rwlock){++++}-{4:4}, at: zap_lockdir_impl+0x58f/0x2b10 fs/zfs/zfs/zap_micro.c:654
 #5: ffff8880185afda8 (&l->l_rwlock){++++}-{4:4}, at: zap_get_leaf_byblk+0x20a/0xca0 fs/zfs/zfs/zap.c:569

stack backtrace:
Call Trace:
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0xbe/0x130 lib/dump_stack.c:120
 dump_stack+0x15/0x20 lib/dump_stack.c:129
 print_circular_bug+0x285/0x360 kernel/locking/lockdep.c:2043
 check_noncircular+0x14e/0x170 kernel/locking/lockdep.c:2175
 check_prev_add kernel/locking/lockdep.c:3165 [inline]
 check_prevs_add kernel/locking/lockdep.c:3284 [inline]
 validate_chain kernel/locking/lockdep.c:3908 [inline]
 __lock_acquire+0x14ae/0x21e0 kernel/locking/lockdep.c:5237
 lock_acquire kernel/locking/lockdep.c:5868 [inline]
 lock_acquire+0x169/0x2f0 kernel/locking/lockdep.c:5825
 __mutex_lock_common kernel/locking/mutex.c:598 [inline]
 __mutex_lock+0x1a6/0x1d80 kernel/locking/mutex.c:760
 mutex_lock_nested+0x1b/0x30 kernel/locking/mutex.c:812
 dsl_dir_willuse_space+0xec/0x660 fs/zfs/zfs/dsl_dir.c:1519
 dmu_objset_willuse_space+0xa4/0x100 fs/zfs/zfs/dmu_objset.c:3043
 dbuf_dirty+0x3f2/0x3cc0 fs/zfs/zfs/dbuf.c:2333
 dmu_buf_will_dirty_flags+0x25e/0xcc0 fs/zfs/zfs/dbuf.c:2694
 dmu_buf_will_dirty+0x27/0x40 fs/zfs/zfs/dbuf.c:2700
 zap_get_leaf_byblk+0x294/0xca0 fs/zfs/zfs/zap.c:575
 zap_deref_leaf+0x22f/0x280 fs/zfs/zfs/zap.c:701
 fzap_update+0x268/0x4f0 fs/zfs/zfs/zap.c:992
 zap_update+0x30e/0x580 fs/zfs/zfs/zap_micro.c:1641
 sa_add_layout_entry+0x302/0x840 fs/zfs/zfs/sa.c:435
 sa_find_layout+0x394/0x7b0 fs/zfs/zfs/sa.c:485
 sa_build_layouts+0xd6c/0x19f0 fs/zfs/zfs/sa.c:774
 sa_replace_all_by_template_locked fs/zfs/zfs/sa.c:1893 [inline]
 sa_replace_all_by_template+0x18e/0x640 fs/zfs/zfs/sa.c:1903
 zfs_mknode+0x162f/0x3e20 fs/zfs/os/linux/zfs/zfs_znode_os.c:905
 zfs_create+0xfa2/0x17b0 fs/zfs/os/linux/zfs/zfs_vnops_os.c:757
 zpl_create+0x29a/0x540 fs/zfs/os/linux/zfs/zpl_inode.c:196
 lookup_open.isra.0+0x10a1/0x1460 fs/namei.c:3796
 open_last_lookups fs/namei.c:3895 [inline]
 path_openat+0x11fe/0x2ce0 fs/namei.c:4131
 do_filp_open+0x1f6/0x430 fs/namei.c:4161
 do_sys_openat2+0x117/0x1c0 fs/open.c:1437
 do_sys_open fs/open.c:1452 [inline]
 __do_sys_openat fs/open.c:1468 [inline]
 __se_sys_openat fs/open.c:1463 [inline]
 __x64_sys_openat+0x15b/0x220 fs/open.c:1463
 x64_sys_call+0x161b/0x26a0 arch/x86/include/generated/asm/syscalls_64.h:258
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x93/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Description

This change keeps the in-memory dd_snap_cmtime update under dd_lock, but
moves both dsl_dir_zapify() and the DD_FIELD_SNAPSHOTS_CHANGED zap_update()
out from under that lock.

That removes the dd_lock -> zap_rwlock edge involved in the reported
lock-order inversion.

The final version does not use a retry loop. After re-checking the callers,
this path runs in syncing context and the relevant updates are serialized, so
there should not be a concurrent updater for the same dataset that can cause
the on-disk snapshots_changed value to move backwards.

How Has This Been Tested?

Tested on Linux.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)

Checklist:

Copilot AI review requested due to automatic review settings April 28, 2026 02:29

Copilot AI left a comment


Pull request overview

This PR fixes a lock-order inversion involving dd->dd_lock and ZAP updates during snapshot “snapshots_changed” timestamp persistence by moving the zap_update() call out from under dd_lock while preserving correctness of the on-disk value.

Changes:

  • Stop holding dd->dd_lock across zap_update() when persisting DD_FIELD_SNAPSHOTS_CHANGED.
  • Keep the in-memory dd_snap_cmtime update and dsl_dir_zapify() under dd_lock, but perform the ZAP write after unlocking.
  • Add a retry loop to re-check dd_snap_cmtime after the on-disk update and re-write if it changed during the write.


Comment thread module/zfs/dsl_dir.c Outdated
@Gality369 Gality369 force-pushed the avoid-dd_lock-during-snapshots_changed-updates branch from 2911d64 to 297e920 on April 29, 2026 04:19
Comment thread module/zfs/dsl_dir.c Outdated
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label May 5, 2026
Avoid holding dd_lock while updating the on-disk
snapshots_changed timestamp.

Both dsl_dir_zapify() and zap_update() may dirty buffers
and recurse into space accounting, which can take dd_lock.
Holding dd_lock across either operation can therefore
preserve the lock-order inversion reported by lockdep.

Only protect the in-memory dd_snap_cmtime update
with dd_lock. Perform the zapify and ZAP update without
dd_lock held, and retry the on-disk write if another updater
advanced dd_snap_cmtime while the write was in progress.

Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Copilot AI review requested due to automatic review settings May 6, 2026 12:51
@Gality369 Gality369 force-pushed the avoid-dd_lock-during-snapshots_changed-updates branch from 297e920 to ca51bc1 on May 6, 2026 12:51

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread module/zfs/dsl_dir.c
@behlendorf behlendorf requested a review from amotin May 6, 2026 16:22
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels May 7, 2026
Member

@amotin amotin left a comment


I'd personally unwrap the lines now that indentation is reduced, but whatever.

Maybe we could add ASSERT(dsl_pool_sync_context(dp)) here to make it obvious?

Comment thread module/zfs/dsl_dir.c
mutex_exit(&dd->dd_lock);

mos = dd->dd_pool->dp_meta_objset;
ddobj = dd->dd_object;
Contributor


nit: Declaring these variables where they're assigned would be nice too. In fact, there's probably no need for ddobj if you unwrap the lines below; dd->dd_object can be used directly. I would keep mos, though, since it makes it clear zap_update() is operating on the MOS object.

objset_t *mos = dd->dd_pool->dp_meta_objset;


Labels

Status: Accepted Ready to integrate (reviewed, tested)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants