ext4: support bigger blkdev block size #241

mcimerman · 2025-01-25T20:52:17Z

Allows ext4 to be used with block devices that have block size > EXT4_SUPERBLOCK_SIZE (1K).

Currently, ext4 "supports" only block devices whose block size is smaller than or equal to the superblock size. ("Supports" is in quotes because with a larger block size it gets EPERM when ~~writing~~ reading past the allocated buffer.)

This currently fails (blkdev bsize is set to 4K with -b to file_bd):

# mkfile -s 100M /tmp/file1

# /srv/bd/file_bd -b 4096 /tmp/file1 disk1

# mkext4 disk1

However, the block size of the block device must still be at most the block size of the filesystem; the current maximum is 4096, which is also the default.

This will be useful when, for example, NVMe support is implemented in the future, or for anything else that uses bigger block sizes.
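To illustrate the idea (this is not the code from the patch), here is a minimal sketch of reading the 1 KiB superblock from a device whose physical block size may be larger or smaller than 1 KiB. `fake_bd_t`, `fake_bd_read()` and `read_superblock()` are hypothetical names invented for the example; the only assumptions taken from ext4 itself are that the superblock is 1024 bytes long and starts 1024 bytes into the device.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SB_OFFSET 1024u /* the superblock starts 1024 bytes into the device */
#define SB_SIZE   1024u /* EXT4_SUPERBLOCK_SIZE from the description above */

/* Hypothetical stand-in for a block device: whole-block reads only. */
typedef struct {
	const uint8_t *data; /* backing storage */
	size_t bsize;        /* physical block size in bytes */
	size_t nblocks;      /* number of blocks on the device */
} fake_bd_t;

/* Read cnt whole blocks starting at block ba into buf. */
static int fake_bd_read(const fake_bd_t *bd, size_t ba, size_t cnt, void *buf)
{
	if (ba > bd->nblocks || cnt > bd->nblocks - ba)
		return -1;
	memcpy(buf, bd->data + ba * bd->bsize, cnt * bd->bsize);
	return 0;
}

/*
 * Copy the 1 KiB superblock into sb for any physical block size.
 * The bounce buffer covers every device block the superblock touches,
 * so a whole-block read never runs past the end of a 1 KiB buffer.
 */
static int read_superblock(const fake_bd_t *bd, uint8_t sb[SB_SIZE])
{
	assert(bd->bsize > 0);

	size_t first = SB_OFFSET / bd->bsize;
	size_t last = (SB_OFFSET + SB_SIZE - 1) / bd->bsize;
	size_t cnt = last - first + 1;

	uint8_t *bounce = malloc(cnt * bd->bsize);
	if (bounce == NULL)
		return -1;

	int rc = fake_bd_read(bd, first, cnt, bounce);
	if (rc == 0) {
		/* Offset of the superblock inside the bounce buffer. */
		size_t skip = SB_OFFSET - first * bd->bsize;
		memcpy(sb, bounce + skip, SB_SIZE);
	}

	free(bounce);
	return rc;
}
```

With a 4096-byte block size the superblock sits at offset 1024 inside block 0; with 512-byte blocks it spans blocks 2 and 3. The same function handles both cases.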

le-jzr · 2025-01-25T22:11:58Z

Nice. Out of curiosity, do you think there's any value in device blocks even being exposed in the interface like that?
IMO it would be simpler and more efficient to present block devices like normal random access files and let the driver take care of the gritty details.

mcimerman · 2025-01-25T22:53:37Z

do you think there's any value in device blocks even being exposed in the interface like that?

Do you mean /srv/bd/file_bd -b 4096 /tmp/file1 disk1? I think the file backed block device is great for debugging.

Edit: Sorry, you probably mean all of the block devices. Got to think about that :-D

le-jzr · 2025-01-25T23:13:03Z

I mean the API interface between a file system and a block device. Like, why should ext4 driver even care what the physical block size is?

Sorry I was being ambiguous. :)

mcimerman · 2025-01-25T23:14:49Z

Yes, I get you.

That would indeed make a lot of things a lot simpler. And it doesn't actually sound that hard to realize.

Just rewrite libblock (and convert everything) to use the classic file API. Or is there some pitfall I don't see?

Edit: Well, probably the block caching will have to be hidden inside the file API, which might not be very easy to do, and it's quite nice that filesystems can manipulate the blocks on a lower level, set some flags or mark them dirty, etc.

le-jzr · 2025-01-25T23:25:27Z

Or is there some pitfall I don't see?

I can't think of anything, but you seem to be more intimately familiar with the code right now, so thought I'd ask you. 😄

Anyway, right now I'm trying to finish implementing a completely new way to do IPC, and if that works out like I want, then we'll have the opportunity to rework some IPC protocols as they are being migrated to the new shiny thing. Good reason to think about how the seams between layers should work and where there are opportunities for improvement.

le-jzr · 2025-01-25T23:33:44Z

Edit: Well, probably the block caching will have to be hidden inside the file API, which might not be very easy to do, and it's quite nice that filesystems can manipulate the blocks on a lower level, set some flags or mark them dirty, etc.

The comparison to files was just illustrative. Obviously it would still be its own protocol with its own nuances, but ability to read/write with arbitrary granularity (and size) like files do would be nice.

Though you make a good point. I'm not actually familiar with the extent of operations that can be done on blocks. But now that I think about it, "trim" would be a good example? Since that can only work on whole blocks, and definitely sounds like something we want to support (eventually).

EDIT: So maybe the right interface would have flexible reads/writes, but intrinsically blocky operations would still have a block-based interface that the file system can exploit if it wants to.
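A rough sketch of what such a split could look like, using a memory-backed device. All names here (`mem_dev_t`, `dev_read()`, `dev_write()`, `dev_trim()`, `dev_block_size()`) are hypothetical and do not come from libblock; trim is modelled as zeroing the blocks, which is an assumption made only so the effect is observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memory-backed device used only for illustration. */
typedef struct {
	uint8_t *data; /* backing storage */
	size_t size;   /* total size in bytes */
	size_t bsize;  /* physical block size; only block operations need it */
} mem_dev_t;

/* Byte-granular read: any offset, any length within the device. */
static int dev_read(const mem_dev_t *d, size_t off, size_t len, void *buf)
{
	if (off > d->size || len > d->size - off)
		return -1;
	memcpy(buf, d->data + off, len);
	return 0;
}

/* Byte-granular write: the device side would hide any alignment work. */
static int dev_write(mem_dev_t *d, size_t off, size_t len, const void *buf)
{
	if (off > d->size || len > d->size - off)
		return -1;
	memcpy(d->data + off, buf, len);
	return 0;
}

/* Exposed so a file system can still align its own structures if it wants. */
static size_t dev_block_size(const mem_dev_t *d)
{
	return d->bsize;
}

/*
 * Block-granular operation: discard whole blocks [first, first + cnt).
 * Modelled as zeroing; what a real device returns after trim may differ.
 */
static int dev_trim(mem_dev_t *d, size_t first, size_t cnt)
{
	assert(d->bsize > 0);

	size_t nblocks = d->size / d->bsize;
	if (first > nblocks || cnt > nblocks - first)
		return -1;
	memset(d->data + first * d->bsize, 0, cnt * d->bsize);
	return 0;
}
```

Reads and writes take byte offsets, while `dev_trim()` takes block numbers, so the block size only leaks into the one operation that cannot work without it.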

mcimerman · 2025-01-25T23:36:05Z

seem to be more intimately familiar with the code right now, so thought I'd ask you

I've been working on HelenRAID for a few months now, but there I use direct reads/writes of blocks, just forwarding the final writes or reads that come to the array. I have only just started to look at filesystems, because I would like to do some performance evaluation on a "real" workload, with files and whatnot. The filesystems use libblock more extensively, with caching and everything, which I don't fully understand yet, and the same goes for the filesystems themselves.

new way to do IPC

Sounds great, also the "non-redundant" copying would be great to have, at least for block devices and big IO. I am interested in the new IPC you are working on, do you have any notes or something I can read to try to understand your new proposal? I am not very experienced in that area, but there is a chance I can help :-D

le-jzr · 2025-01-26T00:02:20Z

No formal proposal as of yet, since I'm still figuring out how to implement what I want and details keep changing to accommodate. :)

But currently, the broad strokes look like this: Task creates an IPC queue, then it creates a uniquely identified IPC endpoint on that queue (possibly one for each unique resource handled by the server, e.g. each open file). The endpoint makes its way to a client via IPC (similar to what IPC_CONNECT__ calls do now, but with more control and granularity). The client makes calls on the endpoint, providing its own endpoint as a return address.

The main feature is that endpoints represent individual resources managed by servers and can be transferred between tasks arbitrarily. They basically act like references to OOP objects that live in another task.

Anyway, the messages sent to endpoint are fixed size, so to send a random bucket of data, you explicitly write the data to a kernel buffer (immutable once created), then send a reference to that buffer to the other party, who can then read chunks out of it. The IPC forward call function is replaced simply by being able to pass the received buffer reference to another task without touching it. Possibly more than once for different parts.
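Purely as an illustration of the data model described above, and not of any actual kernel interface, the pieces could be modelled in-process like this. Every name (`kbuf_t`, `msg_t`, `msg_forward()`, the field layout, the reference counting) is an assumption made up for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer: contents are copied in once and never modified. */
typedef struct {
	size_t size;
	unsigned refs;
	uint8_t data[];
} kbuf_t;

static kbuf_t *kbuf_create(const void *src, size_t size)
{
	kbuf_t *b = malloc(sizeof(*b) + size);
	if (b == NULL)
		return NULL;
	b->size = size;
	b->refs = 1;
	memcpy(b->data, src, size);
	return b;
}

/* Read-only chunk access; there is deliberately no write counterpart. */
static int kbuf_read(const kbuf_t *b, size_t off, size_t len, void *dst)
{
	if (off > b->size || len > b->size - off)
		return -1;
	memcpy(dst, b->data + off, len);
	return 0;
}

static kbuf_t *kbuf_ref(kbuf_t *b)
{
	b->refs++;
	return b;
}

static void kbuf_unref(kbuf_t *b)
{
	if (--b->refs == 0)
		free(b);
}

/* Hypothetical fixed-size message: bulk data travels only by reference. */
typedef struct {
	uint64_t target;   /* endpoint the call is made on */
	uint64_t reply_to; /* caller's own endpoint, used as return address */
	uint32_t method;
	kbuf_t *buf;       /* referenced buffer */
	size_t off;        /* start of the part this message refers to */
	size_t len;        /* length of that part */
} msg_t;

/*
 * Forward part of a received buffer to another endpoint without copying
 * or touching the payload; off is relative to the received part.
 */
static msg_t msg_forward(const msg_t *in, uint64_t new_target,
    size_t off, size_t len)
{
	assert(off <= in->len && len <= in->len - off);

	msg_t out = *in;
	out.target = new_target;
	out.buf = kbuf_ref(in->buf);
	out.off = in->off + off;
	out.len = len;
	return out;
}
```

Calling `msg_forward()` several times with different `off`/`len` pairs corresponds to passing on different parts of the same buffer, each time sharing the one immutable copy.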

mcimerman · 2025-01-27T19:06:32Z

Obviously it would still be its own protocol with its own nuances, but ability to read/write with arbitrary granularity (and size) like files do would be nice.

Check block_read_bytes_direct(). Writes could be done the same way, but unaligned writes would then have to be handled like the bug above: the affected block(s) would have to be read first.
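A minimal sketch of that read-modify-write step, assuming a device that only accepts whole-block transfers. `fake_bd_t`, `bd_read_blocks()`, `bd_write_blocks()` and `write_bytes()` are hypothetical names for the example and are not libblock functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical block device that only transfers whole blocks. */
typedef struct {
	uint8_t *data;  /* backing storage */
	size_t bsize;   /* physical block size in bytes */
	size_t nblocks; /* number of blocks on the device */
} fake_bd_t;

static int bd_read_blocks(const fake_bd_t *bd, size_t ba, size_t cnt, void *buf)
{
	if (ba > bd->nblocks || cnt > bd->nblocks - ba)
		return -1;
	memcpy(buf, bd->data + ba * bd->bsize, cnt * bd->bsize);
	return 0;
}

static int bd_write_blocks(fake_bd_t *bd, size_t ba, size_t cnt, const void *buf)
{
	if (ba > bd->nblocks || cnt > bd->nblocks - ba)
		return -1;
	memcpy(bd->data + ba * bd->bsize, buf, cnt * bd->bsize);
	return 0;
}

/* Write len bytes at byte offset off, regardless of block alignment. */
static int write_bytes(fake_bd_t *bd, size_t off, size_t len, const void *src)
{
	assert(bd->bsize > 0);

	if (len == 0)
		return 0;

	size_t total = bd->bsize * bd->nblocks;
	if (off > total || len > total - off)
		return -1;

	size_t first = off / bd->bsize;
	size_t last = (off + len - 1) / bd->bsize;
	size_t cnt = last - first + 1;

	uint8_t *bounce = malloc(cnt * bd->bsize);
	if (bounce == NULL)
		return -1;

	/*
	 * Read first, so bytes outside [off, off + len) survive the write.
	 * For simplicity all touched blocks are read, although only a
	 * partially covered first or last block strictly needs it.
	 */
	int rc = bd_read_blocks(bd, first, cnt, bounce);
	if (rc == 0) {
		memcpy(bounce + (off - first * bd->bsize), src, len);
		rc = bd_write_blocks(bd, first, cnt, bounce);
	}

	free(bounce);
	return rc;
}
```

For example, with 4-byte blocks a 3-byte write at offset 3 touches blocks 0 and 1; both are read, patched in the bounce buffer, and written back, leaving the remaining bytes of those blocks unchanged.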

le-jzr reviewed Jan 27, 2025

uspace/lib/ext4/src/superblock.c

ext4: support bigger blkdev block size

cb747b3

Allows ext4 to be used with block devices that have block size > EXT4_SUPERBLOCK_SIZE (1K).

mcimerman force-pushed the ext4-big-blkdev-bsize branch from afe8ff5 to cb747b3 Compare January 27, 2025 15:11

le-jzr reviewed Jan 28, 2025

uspace/lib/ext4/src/superblock.c

le-jzr added 2 commits January 29, 2025 14:51

A few adjustments

ea50e67

One more tweak

4a3a5a0
