Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype a new Buffer trait. #1290

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Prototype a new Buffer trait. #1290

wants to merge 3 commits into from

Conversation

sunfishcode
Copy link
Member

I'm experimenting with a Buffer trait similar to #908, however I've run into a few problems. See the questions in examples/new_read.rs for details.

@notgull @SUPERCILEX

Copy link
Contributor

@SUPERCILEX SUPERCILEX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with this approach, I feel like there's a pretty good chance compiler folks will improve the error messages for us if we file bug reports. The story around passing in a mutable array but needing a slice instead has annoyed me for quite some time (I usually just as_mut_slice everything). It was pretty confusing to figure out initially, so I'm sure better error messages would be welcome.

struct Wrapper<'a>(&'a mut [u8]);
impl<'a> Wrapper<'a> {
fn read(&mut self) {
let _x: usize = read(stdin(), self.0).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is due to rust-lang/rust#35919. You need to reborrow manually: &mut *self.0.

One option is to make the Buffer trait impl owned types so the read method borrows it. Unfortunately I believe this requires rust 1.65. Also it seems kinda ugly so maybe not worth it.

pub fn read<Fd: AsFd, Buf: Buffer<u8> + ?Sized>(
    fd: Fd,
    buf: &mut Buf,
) -> io::Result<Buf::Result<'_>> {
    let len = backend::io::syscalls::read(fd.as_fd(), buf.as_maybe_uninitialized())?;
    // SAFETY: `read` works.
    unsafe { Ok(buf.finish(len)) }
}

pub trait Buffer<T> {
    /// The result of the process operation.
    type Result<'a>
    where
        T: 'a,
        Self: 'a;

    /// Convert this buffer into a maybe-unitiailized view.
    fn as_maybe_uninitialized(&mut self) -> &mut [MaybeUninit<T>];

    /// Convert a finished buffer pointer into its result.
    ///
    /// # Safety
    ///
    /// At least `len` bytes of the buffer must now be initialized.
    unsafe fn finish(&mut self, len: usize) -> Self::Result<'_>;
}

/// Implements [`Buffer`] around the a slice of bytes.
///
/// `Result` is a `usize` indicating how many bytes were written.
impl<T> Buffer<T> for [T] {
    type Result<'a>
        = usize
    where
        T: 'a;

    #[inline]
    fn as_maybe_uninitialized(&mut self) -> &mut [MaybeUninit<T>] {
        // SAFETY: This just casts away the knowledge that the elements are
        // initialized.
        unsafe { core::mem::transmute::<&mut [T], &mut [MaybeUninit<T>]>(self) }
    }

    #[inline]
    unsafe fn finish(&mut self, len: usize) -> usize {
        len
    }
}

/// Implements [`Buffer`] around the a slice of uninitialized bytes.
///
/// `Result` is a pair of slices giving the initialized and uninitialized
/// subslices after the new data is written.
impl<T> Buffer<T> for [MaybeUninit<T>] {
    type Result<'a>
        = (&'a mut [T], &'a mut [MaybeUninit<T>])
    where
        T: 'a;

    #[inline]
    fn as_maybe_uninitialized(&mut self) -> &mut [MaybeUninit<T>] {
        self
    }

    #[inline]
    unsafe fn finish(&mut self, len: usize) -> Self::Result<'_> {
        let (init, uninit) = self.split_at_mut(len);

        // SAFETY: The user asserts that the slice is now initialized.
        let init = slice::from_raw_parts_mut(init.as_mut_ptr().cast::<T>(), init.len());

        (init, uninit)
    }
}

/// Implements [`Buffer`] around the `Vec` type.
///
/// This implementation fills the buffer, overwriting any previous data, with
/// the new data data and sets the length.
#[cfg(feature = "alloc")]
impl<T> Buffer<T> for Vec<T> {
    type Result<'a>
        = usize
    where
        T: 'a;

    #[inline]
    fn as_maybe_uninitialized(&mut self) -> &mut [MaybeUninit<T>] {
        self.clear();
        self.spare_capacity_mut()
    }

    #[inline]
    unsafe fn finish(&mut self, len: usize) -> usize {
        self.set_len(len);
        len
    }
}


// Why does this get two error messages?
let mut buf = [0, 0, 0];
let _x = read(stdin(), buf).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably because they're both valid. There's a hint assuming the type is an array and then the message that lists out all the valid types you can use.

src/buffer.rs Outdated
use core::mem::MaybeUninit;
use core::slice;

/// A memory buffer that may be uninitialized.
pub trait Buffer<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how much it's worth discussing the implementation yet, but there are a few concerns:

  • I don't think this trait should be public (yet)
  • as_maybe_uninitialized is unsound in general, but not for our purposes. The conversion from u8 -> MaybeUninit<u8> is only valid if you never uninitialize the u8. We don't do that since we're always passing this data to a syscall, but for a public trait (and probably for ourselves just in case), the conversion to maybeuninit needs to be unsafe.
  • I don't think we should have an implementation for arrays ([T; N]). Maybe there's value I'm missing, but it seems simple enough to require you to pass in a slice instead.
  • Same argument for Vec<T>, you should be required to pass in &mut Vec<T> and it should use spare_capacity_mut (but not clear) plus set_len. Though maybe there's an argument for having the owned and borrowed versions with the owned one doing the clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how much it's worth discussing the implementation yet, but there are a few concerns:

* I don't think this trait should be public (yet)

I've now made it `Sealed.

* `as_maybe_uninitialized` is unsound in general, but not for our purposes. The conversion from `u8 -> MaybeUninit<u8>` is only valid if you never uninitialize the `u8`. We don't do that since we're always passing this data to a syscall, but for a public trait (and probably for ourselves just in case), the conversion to maybeuninit needs to be unsafe.

Good point. I switched it back to a pointer+length now.

* I don't think we should have an implementation for arrays (`[T; N]`). Maybe there's value I'm missing, but it seems simple enough to require you to pass in a slice instead.

Agreed; I just added that as an example to see what the error looked like.

* Same argument for `Vec<T>`, you should be required to pass in `&mut Vec<T>` and it should use `spare_capacity_mut` (but not `clear`) plus `set_len`. Though maybe there's an argument for having the owned and borrowed versions with the owned one doing the `clear`.

Yeah this is subtle because Vec<T> has a DerefMut to &mut [T] so it'd mean that adding a * would switch whether we clear+set_len or not. So it seems to make sense to have the owned version to do the clear+set_len and then have &mut Vec<T> behave like &mut [T].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And even with &mut Vec<T> behaving like &mut [T] at use cases for this, I'm still not super comfortable with how subtle this is. Passing a Vec does one thing, while a &mut Vec does another.

Another option would be to omit the Vec<T> impl for this. But if we do that, then we're no longer encapsulating the set_len part of updating a Vec, and without that, I don't know if this whole Buffer trait is worth it.

Copy link
Contributor

@SUPERCILEX SUPERCILEX Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to read your comment in full, but yeah my proposal was to remove the impl for Vec and skip the clear in the &mut Vec.

Actually let's make it even simpler. Get rid of Vec. Giving us an &mut Vec results in the full clear + set_len sequence. If you didn't want the clear, then all you have to do is use spare_capacity_mut. If we really think people don't want the clear, we could literally just add a wrapper type called VecNoClear or something and implement it with only the set_len. That's way easier to use then know the difference between taking a reference and not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading your comment properly now.

I switched it back to a pointer+length now.

Yup, seems like the right move.

So it seems to make sense to have the owned version to do the clear+set_len and then have &mut Vec behave like &mut [T].

I see, yeah it doesn't seem pretty bad if &mut Vec<T> behaves subtly different from &mut *Vec<T>. What if we required all Vecs to go through nutypes? How about having a struct ClearableBuffer<T>(pub T) and struct PreparedBuffer<T>(pub T) (naming tbd). Then we implement ClearableBuffer<&mut Vec<T>> and the same for PreparedBuffer. Both call set_len but only ClearableBuffer calls clear at the beginning.

match read_uninit(self.fd.as_fd(), self.buf).map(|(init, _)| init.len()) {
todo!("FIXME: see \"Why doesn't this work?\" in examples/new_read.rs");
/*
match read(self.fd.as_fd(), self.buf).map(|(init, _)| init.len()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&mut *self.buf or the GAT version of the Buffer trait fixes this, as per my other comment.

@sunfishcode sunfishcode mentioned this pull request Jan 27, 2025
21 tasks
@sunfishcode sunfishcode added the semver bump Issues that will require a semver-incompatible fix label Jan 30, 2025
///
/// `Result` is a `usize` indicating how many bytes were written.
impl<T, const N: usize> Buffer<T> for &mut [T; N] {
impl<T, const N: usize> private::Sealed<T> for &mut [T; N] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure we can still implement the buffer trait here so we don't have to specify the impls twice.

src/buffer.rs Outdated
use core::mem::MaybeUninit;
use core::slice;

/// A memory buffer that may be uninitialized.
pub trait Buffer<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading your comment properly now.

I switched it back to a pointer+length now.

Yup, seems like the right move.

So it seems to make sense to have the owned version to do the clear+set_len and then have &mut Vec behave like &mut [T].

I see, yeah it doesn't seem pretty bad if &mut Vec<T> behaves subtly different from &mut *Vec<T>. What if we required all Vecs to go through nutypes? How about having a struct ClearableBuffer<T>(pub T) and struct PreparedBuffer<T>(pub T) (naming tbd). Then we implement ClearableBuffer<&mut Vec<T>> and the same for PreparedBuffer. Both call set_len but only ClearableBuffer calls clear at the beginning.

I'm experimenting with a `Buffer` trait similar to #908, however I've
run into a few problems. See the questions in examples/new_read.rs for
details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver bump Issues that will require a semver-incompatible fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants