There are two problems here -- a bad dataset on the source, and bad behavior once the source is bad. Unfortunately I don't know how to reproduce the original bad dataset, so I'm only filing a bug about what happens after it's in an invalid state.
System information
| Type | Version/Name |
| --- | --- |
| Distribution Name | Debian Linux |
| Distribution Version | 13 |
| Kernel Version | 6.12.63 |
| Architecture | amd64 |
| OpenZFS Version | zfs-2.3.2-2 zfs-kmod-2.3.2-2 |
Describe the problem you're observing
I'm running the following:
germinate:~$ sudo zfs send -i data/snapshots.daily/tarragon@20250919 data/snapshots.daily/tarragon@20250921 >/tmp/zfs-data
kill:~$ sudo zfs receive -v data/snapshots.daily/tarragon < /tmp/zfs-data
receiving incremental stream of data/snapshots.daily/tarragon@20250921 into data/snapshots.daily/tarragon@20250921
cannot receive incremental stream: invalid backup stream
Note that the dry run version of the receive succeeds, while the non-dry version fails.
I think it should be an error for zfs send to silently produce a stream that zfs receive then rejects with no detailed indication of why. Most likely, zfs send should fail here, and the dry-run receive should fail as well.
zpool scrub reports no problems. It should probably detect this, although I'm not clear whether COW semantics prevent it from being fixed.
I believe both of the above (receive behavior and scrub behavior) are bugs.
Most urgently, there is no recovery path for users in this situation. I expect to get out of it myself with some bit-twiddling on disk, but ideally zpool scrub would report this issue, and there would be a way to zfs send this dataset.
Describe how to reproduce the problem
The root cause of what's going on is an object with bonustype=60. I don't know how it got that way, or how to reproduce this state. Sorry! I notice the bonustype is only 1 bit off? Shouldn't it be checksummed, though?
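The "only 1 bit off" observation can be checked with a little arithmetic. This is a minimal sketch, assuming the intended bonus type was `DMU_OT_SA` (the 20250919 snapshot reports the bonus as "System attributes") and that `DMU_OT_SA` has the value 44 in the `dmu_object_type_t` enum. Both points are assumptions, not something the zdb output states directly.

```python
# Assumption: the intended bonus type is DMU_OT_SA, taken here to be 44
# (value from the dmu_object_type_t enum in OpenZFS's dmu.h, quoted from memory).
EXPECTED_BONUSTYPE = 44
OBSERVED_BONUSTYPE = 60  # value reported in this issue

# XOR leaves a 1 in every bit position where the two values differ.
diff = EXPECTED_BONUSTYPE ^ OBSERVED_BONUSTYPE
flipped_bits = [i for i in range(8) if (diff >> i) & 1]

print(f"expected {EXPECTED_BONUSTYPE:#010b}")
print(f"observed {OBSERVED_BONUSTYPE:#010b}")
print(f"differing bit positions: {flipped_bits}")
```

Under those assumptions exactly one bit position differs, which is consistent with a single-bit flip rather than an arbitrary overwrite.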
zachary@germinate:~$ sudo zdb -dddd data/snapshots.daily/tarragon@20250919 438472
Dataset data/snapshots.daily/tarragon@20250919 [ZPL], ID 630216, cr_txg 43429282, 436G, 4503376 objects, rootbp DVA[0]=<0:3ad2e2193000:3000> DVA[1]=<0:3394b2b52000:3000> [L0 DMU objset] fletcher4 uncompressed unencrypted LE contiguous unique double size=1000L/1000P birth=43429238L/43429238P fill=4503376 cksum=0000001492ed98fc:000039bff6a342a6:00564ee0bd0ed707:5b16a8c385fa872a
Object lvl iblk dblk dsize dnsize lsize %full type
438472 1 128K 2K 8K 512 2K 100.00 ZFS plain file
252 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
dnode maxblkid: 0
uid 2001
gid 2001
atime Sat Aug 28 23:09:31 2021
mtime Fri Jan 10 23:52:47 2020
ctime Sat Aug 28 23:09:31 2021
crtime Sat Aug 28 23:09:31 2021
gen 20054484
mode 100644
size 1592
parent 424857
links 1
pflags 40800000004
SA xattrs: 76 bytes, 1 entries
user.rsync.%stat = 100444 0,0 1000:100
zachary@germinate:~$ sudo zdb -dddd data/snapshots.daily/tarragon@20250921 438472
Dataset data/snapshots.daily/tarragon@20250921 [ZPL], ID 382289, cr_txg 43467660, 436G, 4503427 objects, rootbp DVA[0]=<0:3af5b4c6f000:3000> DVA[1]=<0:338b69586000:3000> [L0 DMU objset] fletcher4 uncompressed unencrypted LE contiguous unique double size=1000L/1000P birth=43467660L/43467660P fill=4503427 cksum=00000014f695affc:00003ad94b6bad6e:0057faa1834c31af:5ce652591a8ef101
Object lvl iblk dblk dsize dnsize lsize %full type
438472 1 128K 2K 8K 512 2K 100.00 ZFS plain file
252 bonus UNKNOWN
dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
dnode maxblkid: 0
UNKNOWN OBJECT TYPE
Include any warning/errors/backtraces from the system logs
sudo strace -f -e trace=ioctl -o /tmp/recv-strace.log zfs receive -v data/snapshots.daily/tarragon < /tmp/zfs-data2
grep -i einval /tmp/recv-strace.log | tail
1738587 ioctl(4, ZFS_IOC_RECV, 0x7ffcb98096f0) = -1 EINVAL (Invalid argument)
za3k@kill:~$ tail -20 /tmp/recv-strace.log
1738587 ioctl(0, TCGETS, 0x7ffcb9817c70) = -1 ENOTTY (Inappropriate ioctl for device)
1738587 ioctl(3, ZFS_IOC_OBJSET_STATS, 0x7ffcb980a720) = 0
1738587 ioctl(3, ZFS_IOC_POOL_STATS, 0x7ffcb9807080) = -1 ENOMEM (Cannot allocate memory)
1738587 ioctl(3, ZFS_IOC_POOL_STATS, 0x7ffcb9807080) = 0
1738587 ioctl(3, ZFS_IOC_OBJSET_STATS, 0x7ffcb980a1e0) = 0
1738587 ioctl(4, ZFS_IOC_OBJSET_STATS, 0x7ffcb9805ed0) = 0
1738587 ioctl(4, ZFS_IOC_RECV, 0x7ffcb98096f0) = -1 EINVAL (Invalid argument)
1738587 +++ exited with 1 +++
`receive_object` returns 22 (`+EINVAL`)
The check that triggers the failure in the receive path is probably `!DMU_OT_IS_VALID(drro->drr_bonustype)`.
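To illustrate why a bonus type of 60 would be rejected, here is a rough Python rendering of the `DMU_OT_IS_VALID` macro. The constant values and the exact shape of the macro are assumptions based on my reading of `include/sys/dmu.h`; they should be checked against the source for the version in use. The function name `dmu_ot_is_valid` is hypothetical.

```python
# Assumed constants, mirroring OpenZFS's include/sys/dmu.h as I understand it.
DMU_OT_NEWTYPE = 0x80        # flag bit marking "new style" object types
DMU_OT_BYTESWAP_MASK = 0x1F  # low bits select the byteswap function
DMU_BSWAP_NUMFUNCS = 10      # number of byteswap functions (assumed)
DMU_OT_NUMTYPES = 54         # number of legacy enum entries (assumed)


def dmu_ot_is_valid(ot: int) -> bool:
    """Hypothetical Python equivalent of the DMU_OT_IS_VALID macro."""
    if ot & DMU_OT_NEWTYPE:
        # New-style types are valid if their byteswap index is in range.
        return (ot & DMU_OT_BYTESWAP_MASK) < DMU_BSWAP_NUMFUNCS
    # Legacy types must be below the end of the enum.
    return ot < DMU_OT_NUMTYPES


print(dmu_ot_is_valid(44))  # bonus type assumed for "System attributes"
print(dmu_ot_is_valid(60))  # bonus type reported in this issue
```

Since 60 does not have the `0x80` flag set, it falls into the legacy branch and fails as long as the enum has 60 or fewer entries, so the conclusion does not depend on the exact value assumed for `DMU_OT_NUMTYPES`.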