Wire O_DIRECT also to Uncached I/O #17218
Conversation
The change is quite invasive on the DMU/DBUF APIs. In some places it was not obvious what is better or where to stop, so I am open to comments on whether we'd like to change more, while already there, or less, to keep some parts of compatibility, if that is even possible at all with such a big change.
Would it make sense to activate this when the user does a
@tonyhutter I am not familiar with that fcntl, but sounds possible as a next step. I was thinking about
Ah yes,
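The specific interface being referred to above is lost from the quoted text, but for context, one concrete possibility on Linux is toggling O_DIRECT on an already-open descriptor with fcntl(F_SETFL). This is a minimal userspace sketch, not part of the PR:

```c
/* Minimal sketch: flip O_DIRECT on an open fd via fcntl(F_SETFL) (Linux). */
#define _GNU_SOURCE		/* O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int
set_direct(int fd, int enable)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return (-1);
	if (enable)
		flags |= O_DIRECT;
	else
		flags &= ~O_DIRECT;
	return (fcntl(fd, F_SETFL, flags));
}

int
main(int argc, char **argv)
{
	int fd = open(argc > 1 ? argv[1] : "testfile", O_RDWR | O_CREAT, 0644);

	if (fd == -1 || set_direct(fd, 1) == -1) {
		perror("O_DIRECT toggle");
		return (1);
	}
	/* Subsequent reads/writes on fd now carry O_DIRECT semantics. */
	close(fd);
	return (0);
}
```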
Force-pushed from 7579173 to ef0c865
So, I am not particularly in favor of the strict check being disabled by default. I think we already muddy the waters enough when a write I/O is misaligned. My concern is that people are expecting one thing and getting another transparently. This leads to confusion and makes ZFS seem like it is not doing what they strictly asked it to do (do not make any copies of my data). I also don't like the idea that prefetching can happen if a user explicitly requested O_DIRECT.

I think if we decided to add another dataset property setting for this, we don't need to complicate things more, in my opinion. This should be an opt-in feature, if anything, with maybe a new dataset value for direct.
My patch actually improves this case, evicting the data from the DBUF and ARC caches as soon as possible, reducing cache effects as the user asked.
I don't think it has to be strict other than for software testing. The man page on Linux says: "The handling of misaligned O_DIRECT I/Os also varies; they can either fail with EINVAL or fall back to buffered I/O.", so relaxed behavior is not a violation.
We've already found several examples of software that has no idea about alignment but uses O_DIRECT, including things as general as systemd extensions. Considering they are "broken", I bet most typical Linux file systems do not enforce alignment, and for us being strict just increases suffering. I am not interested in fixing all software in the world, but I am happy to provide a testing tool in the shape of a tunable (a properly aligned request is sketched below for reference).
The data prefetch will only activate on misaligned I/Os, so obviously the user already does not know what they are doing. It is trivial to disable it now, if you insist. But I considered that somebody might use O_DIRECT with Direct I/O disabled via the module parameter, just to reduce cache thrashing on some parts of a workload, and prefetch really gives a performance improvement in many cases, even with NVMe pools, if the application queue depth is insufficient.
I was actually thinking about additional
It is likely the only way to use Direct I/O with SMB and NFS too, since they have no concept of alignment in the protocol. So it seems the world is somewhat less "perfect" than we'd like. ;)
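For reference, this is the alignment pattern the strict check expects from applications: buffer, offset, and length all page-aligned. A minimal userspace sketch (file name is an arbitrary placeholder):

```c
/* Sketch of a properly aligned O_DIRECT read. */
#define _GNU_SOURCE		/* O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	size_t align = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = align * 4;		/* multiple of the page size */
	void *buf = NULL;
	ssize_t n;
	int fd;

	if (posix_memalign(&buf, align, len) != 0)
		return (1);

	fd = open(argc > 1 ? argv[1] : "testfile", O_RDONLY | O_DIRECT);
	if (fd == -1) {
		perror("open(O_DIRECT)");
		return (1);
	}

	/*
	 * Offset 0 is aligned; a misaligned buffer, offset, or length is
	 * what either fails with EINVAL or falls back to buffered I/O,
	 * depending on the file system.
	 */
	n = pread(fd, buf, len, 0);
	if (n == -1)
		perror("pread");

	close(fd);
	free(buf);
	return (n == -1);
}
```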
Quick question:
Do you intend to keep this wired to the primarycache property as well? (My reading of the code changes suggests not, but my reading of the discussion suggests yes, so I'm left uncertain!)
Yes. Exactly the same effect is expected from either of the two.
Just fixed the comment typo and rebased.
I was thinking about this today and then realised I’d misunderstood exactly what
So it seems there are two variables here:
If they were both options,
So if I’m understanding this PR, the new

With my sysadmin hat on, I think I prefer
It feels like there are two things going on here. One is that the user didn't ask for prefetch. The other is about "the only requirement we request": until we added support for O_DIRECT, we didn't request anything at all. It seems unsurprising that there is software out there using O_DIRECT without meeting the alignment requirements. So it seems like "work, but slower" is a kinder thing to do. Not to mention that that's historically what OpenZFS has always done with O_DIRECT.

I don't know if we actually want all the property values as described above (if we even could change the property values we currently have without causing problems, though I have an idea for that). But at least, I think switching standard over to "relaxed" is probably the right thing to do.

If we do want more visibility, we could have a kstat counter noting the number of times a direct I/O request was redirected to the ARC for service due to misalignment. It's not super obvious, of course, but combined with a system call trace on the process it at least gives the operator the ability to see what's going on.
Before Direct I/O was implemented, I had implemented a lighter version I called Uncached I/O. It uses the normal DMU/ARC data path with some optimizations, but evicts data from the caches as soon as possible and reasonable. Originally I wired it only to the primarycache property, but this change completes the integration all the way up to the VFS.
While Direct I/O has the lowest possible memory bandwidth usage, it also has a significant number of limitations: it requires I/Os to be page-aligned, does not allow speculative prefetch, etc. Uncached I/O does not have those limitations, but instead requires an additional memory copy, though still one less than regular cached I/O. As such, it should fill the gap in between. Considering this, I've disabled the annoying EINVAL errors on misaligned requests, adding a tunable for those who want to test their applications.
To pass this information between the layers I had to change a number of APIs. As a side effect, upper layers can now control not only the caching, but also speculative prefetch. I haven't wired that part to the VFS yet, since it requires looking at some OS specifics. But while there, I've implemented speculative prefetch of indirect blocks for Direct I/O, controllable via all the same mechanisms.
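To make the "upper layers can control caching and prefetch" idea concrete, here is a rough, self-contained sketch. The demo_* names and flag bits are invented for illustration only and are not the DMU/DBUF API actually changed by this PR:

```c
/* Illustration only: demo_* names are invented, not the real DMU API. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef enum {
	DEMO_PREFETCH = 1 << 0,	/* allow speculative prefetch */
	DEMO_UNCACHED = 1 << 1,	/* use the cache, but evict ASAP */
	DEMO_DIRECT   = 1 << 2,	/* bypass the cache entirely */
} demo_io_flags_t;

/* Stand-in for a flags-aware read entry point an upper layer might call. */
static int
demo_dmu_read(uint64_t offset, uint64_t size, void *buf, demo_io_flags_t flags)
{
	(void) offset;
	memset(buf, 0, size);
	if (flags & DEMO_DIRECT)
		printf("direct: page-aligned, no cache copy, no data prefetch\n");
	else if (flags & DEMO_UNCACHED)
		printf("uncached: one copy through the cache, evicted early\n");
	else
		printf("cached: normal cached path\n");
	return (0);
}

int
main(void)
{
	char buf[4096];

	/*
	 * A misaligned O_DIRECT request could be served this way: uncached,
	 * but still allowed to prefetch if the caller asks for it.
	 */
	return (demo_dmu_read(0, sizeof (buf), buf,
	    DEMO_UNCACHED | DEMO_PREFETCH));
}
```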
Fixes #17027
How Has This Been Tested?
Basic read/write tests with and without O_DIRECT, observing proper cache behavior.
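As an illustration of such a test (not the actual test code used), a minimal userspace sketch that writes a pattern with O_DIRECT and reads it back through the normal cached path; the dataset path is a hypothetical placeholder:

```c
/* Illustration only: direct write, cached read-back, compare. */
#define _GNU_SOURCE		/* O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	const char *path = "/testpool/fs/dio_check";	/* hypothetical path */
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *wbuf = NULL, *rbuf = NULL;
	int wfd, rfd, rc = 1;

	/* O_DIRECT wants the buffer, offset, and length aligned. */
	if (posix_memalign(&wbuf, len, len) != 0 || (rbuf = malloc(len)) == NULL)
		return (1);
	memset(wbuf, 0xab, len);

	wfd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
	rfd = open(path, O_RDONLY);
	if (wfd == -1 || rfd == -1) {
		perror("open");
		return (1);
	}

	/* Write through the direct path, read back through the cached path. */
	if (pwrite(wfd, wbuf, len, 0) == (ssize_t)len &&
	    pread(rfd, rbuf, len, 0) == (ssize_t)len &&
	    memcmp(wbuf, rbuf, len) == 0)
		rc = 0;

	close(wfd);
	close(rfd);
	free(wbuf);
	free(rbuf);
	return (rc);
}
```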