invalid bonustype not reported by scrub, cannot be received #18685

@za3k

There are two problems here -- a bad dataset on the source, and bad behavior once the source is bad. Unfortunately I don't know how to reproduce the original bad dataset, so I'm only filing a bug about what happens after it's in an invalid state.

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Debian Linux |
| Distribution Version | 13 |
| Kernel Version | 6.12.63 |
| Architecture | amd64 |
| OpenZFS Version | zfs-2.3.2-2 zfs-kmod-2.3.2-2 |

Describe the problem you're observing

I'm running the following:

germinate:~$ sudo zfs send -i data/snapshots.daily/tarragon@20250919 data/snapshots.daily/tarragon@20250921 >/tmp/zfs-data
kill:~$ sudo zfs receive -v data/snapshots.daily/tarragon < /tmp/zfs-data
receiving incremental stream of data/snapshots.daily/tarragon@20250921 into data/snapshots.daily/tarragon@20250921
cannot receive incremental stream: invalid backup stream

Note that the dry-run version of the receive succeeds, while the real receive fails.
I think it is a bug for zfs send to silently produce a stream that zfs receive then rejects with no detailed indication of why. Most likely zfs send should fail here, and the dry-run receive should fail as well.

zpool scrub reports no problems. It should probably detect this, although I'm not sure whether COW semantics would prevent it from being fixed.

I believe both of the above (receive behavior and scrub behavior) are bugs.

Most urgently, there is no recovery path for users in this situation. I expect to get out of it myself with some bit-twiddling on disk, but ideally zpool scrub would report this issue, and there would be a way to zfs send the affected dataset.

Describe how to reproduce the problem

The root cause is an object with bonustype=60. I don't know how it got that way, or how to reproduce this state. Sorry! I notice the bonustype is only one bit off from a valid value. Shouldn't it be covered by a checksum, though?
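The one-bit observation can be checked with a little arithmetic. This is only a sketch: it assumes the healthy bonus type is `DMU_OT_SA` with value 44, which is my reading of the `dmu_object_type` enum in `include/sys/dmu.h` and should be confirmed against the 2.3.2 source. The older snapshot shows the bonus as "System attributes", which is why I picked that type.

```python
# ASSUMPTION: DMU_OT_SA == 44, taken from my reading of include/sys/dmu.h.
DMU_OT_SA = 44
# Value reported for the broken object in this issue.
observed_bonustype = 60

# XOR leaves a 1 in every bit position where the two values differ.
diff = DMU_OT_SA ^ observed_bonustype

# A non-zero value with exactly one bit set satisfies x & (x - 1) == 0.
is_single_bit = diff != 0 and (diff & (diff - 1)) == 0

# Index of the differing bit, counting from 0 at the least significant bit.
flipped_bit = diff.bit_length() - 1

print(f"expected {DMU_OT_SA:#010b}")
print(f"observed {observed_bonustype:#010b}")
print(f"xor      {diff:#010b}  single bit: {is_single_bit}, bit index: {flipped_bit}")
```

Under that assumption the two values differ only in bit 4 (0x10), which is consistent with a single bit flip rather than a logically written value.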

zachary@germinate:~$ sudo zdb -dddd data/snapshots.daily/tarragon@20250919 438472
Dataset data/snapshots.daily/tarragon@20250919 [ZPL], ID 630216, cr_txg 43429282, 436G, 4503376 objects, rootbp DVA[0]=<0:3ad2e2193000:3000> DVA[1]=<0:3394b2b52000:3000> [L0 DMU objset] fletcher4 uncompressed unencrypted LE contiguous unique double size=1000L/1000P birth=43429238L/43429238P fill=4503376 cksum=0000001492ed98fc:000039bff6a342a6:00564ee0bd0ed707:5b16a8c385fa872a

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
    438472    1   128K     2K     8K     512     2K  100.00  ZFS plain file
                                               252   bonus  System attributes
        dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
        dnode maxblkid: 0
        uid     2001
        gid     2001
        atime   Sat Aug 28 23:09:31 2021
        mtime   Fri Jan 10 23:52:47 2020
        ctime   Sat Aug 28 23:09:31 2021
        crtime  Sat Aug 28 23:09:31 2021
        gen     20054484
        mode    100644
        size    1592
        parent  424857
        links   1
        pflags  40800000004
        SA xattrs: 76 bytes, 1 entries

                user.rsync.%stat = 100444 0,0 1000:100

zachary@germinate:~$ sudo zdb -dddd data/snapshots.daily/tarragon@20250921 438472
Dataset data/snapshots.daily/tarragon@20250921 [ZPL], ID 382289, cr_txg 43467660, 436G, 4503427 objects, rootbp DVA[0]=<0:3af5b4c6f000:3000> DVA[1]=<0:338b69586000:3000> [L0 DMU objset] fletcher4 uncompressed unencrypted LE contiguous unique double size=1000L/1000P birth=43467660L/43467660P fill=4503427 cksum=00000014f695affc:00003ad94b6bad6e:0057faa1834c31af:5ce652591a8ef101

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
    438472    1   128K     2K     8K     512     2K  100.00  ZFS plain file
                                               252   bonus  UNKNOWN
        dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
        dnode maxblkid: 0
        UNKNOWN OBJECT TYPE
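Since scrub does not flag the object, one stopgap for finding other affected objects is to scan `zdb -dddd` output for a bonus type of `UNKNOWN`. The sketch below assumes the text layout shown in this report (an object row followed by a `bonus` row); zdb output is not a stable interface, and the helper name `find_unknown_bonus` is hypothetical.

```python
def find_unknown_bonus(zdb_text):
    """Return object numbers whose bonus row reads UNKNOWN.

    ASSUMPTION: input looks like the `zdb -dddd` output in this report.
    """
    bad = []
    current = None  # object number from the most recent object row
    for line in zdb_text.splitlines():
        tokens = line.split()
        # Bonus row, e.g. "252   bonus  UNKNOWN". Checked first because it
        # also starts with a number.
        if len(tokens) >= 3 and tokens[1] == "bonus":
            if tokens[2] == "UNKNOWN" and current is not None:
                bad.append(current)
        # Object row: number followed by the lvl/iblk/.../type columns.
        elif len(tokens) >= 9 and tokens[0].isdigit():
            current = int(tokens[0])
    return bad


# Samples copied from the two zdb runs in this report.
HEALTHY = """
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
    438472    1   128K     2K     8K     512     2K  100.00  ZFS plain file
                                               252   bonus  System attributes
        dnode maxblkid: 0
        uid     2001
"""
BROKEN = """
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
    438472    1   128K     2K     8K     512     2K  100.00  ZFS plain file
                                               252   bonus  UNKNOWN
        dnode maxblkid: 0
        UNKNOWN OBJECT TYPE
"""

print(find_unknown_bonus(HEALTHY))
print(find_unknown_bonus(BROKEN))
```

In practice the input would come from dumping every object in a snapshot to a file and passing its contents to the function. On a dataset with millions of objects that dump is slow and large.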

Include any warning/errors/backtraces from the system logs

sudo strace -f -e trace=ioctl -o /tmp/recv-strace.log zfs receive -v data/snapshots.daily/tarragon < /tmp/zfs-data2
grep -i einval /tmp/recv-strace.log | tail
1738587 ioctl(4, ZFS_IOC_RECV, 0x7ffcb98096f0) = -1 EINVAL (Invalid argument)
za3k@kill:~$ tail -20 /tmp/recv-strace.log
1738587 ioctl(0, TCGETS, 0x7ffcb9817c70) = -1 ENOTTY (Inappropriate ioctl for device)
1738587 ioctl(3, ZFS_IOC_OBJSET_STATS, 0x7ffcb980a720) = 0
1738587 ioctl(3, ZFS_IOC_POOL_STATS, 0x7ffcb9807080) = -1 ENOMEM (Cannot allocate memory)
1738587 ioctl(3, ZFS_IOC_POOL_STATS, 0x7ffcb9807080) = 0
1738587 ioctl(3, ZFS_IOC_OBJSET_STATS, 0x7ffcb980a1e0) = 0
1738587 ioctl(4, ZFS_IOC_OBJSET_STATS, 0x7ffcb9805ed0) = 0
1738587 ioctl(4, ZFS_IOC_RECV, 0x7ffcb98096f0) = -1 EINVAL (Invalid argument)
1738587 +++ exited with 1 +++

`receive_object` returns 22 (`+EINVAL`)

The check `!DMU_OT_IS_VALID(drro->drr_bonustype)` is probably the line in the receive path that triggers the failure.
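To illustrate why 60 would fail that check, here is a rough Python model of `DMU_OT_IS_VALID` as I understand it from `include/sys/dmu.h`. All four constants are assumptions from my reading of the header and should be verified against the 2.3.2 source. For the two values in this report the result depends only on `DMU_OT_NUMTYPES` lying somewhere between 45 and 60.

```python
# ASSUMPTIONS: all four constants are from my reading of include/sys/dmu.h.
DMU_OT_NEWTYPE = 0x80        # flag bit marking a "new style" object type
DMU_OT_BYTESWAP_MASK = 0x1F  # low bits select the byteswap function
DMU_BSWAP_NUMFUNCS = 10      # number of byteswap functions
DMU_OT_NUMTYPES = 54         # number of legacy enum entries


def dmu_ot_is_valid(ot):
    """Rough model of the DMU_OT_IS_VALID macro (not the real code)."""
    if ot & DMU_OT_NEWTYPE:
        # New-style type: only the byteswap index needs to be in range.
        return (ot & DMU_OT_BYTESWAP_MASK) < DMU_BSWAP_NUMFUNCS
    # Legacy type: must be one of the enumerated values.
    return ot < DMU_OT_NUMTYPES


for bonustype in (44, 60):
    print(bonustype, dmu_ot_is_valid(bonustype))
```

If this model is right, 44 passes and 60 does not, so the receive side rejects the record with `EINVAL`. That matches the strace output, while the send side evidently emits the value without a comparable check.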
