-
Notifications
You must be signed in to change notification settings - Fork 14
feat: Read and seek #119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Read and seek #119
Conversation
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh-blobs/pr/119/docs/iroh_blobs/ Last updated: 2025-07-22T14:58:02Z |
@@ -647,6 +658,12 @@ impl<'a> AddProgress<'a> { | |||
} | |||
} | |||
|
|||
/// Options for an async reader for blobs that supports AsyncRead and AsyncSeek. | |||
#[derive(Debug, Clone, Serialize, Deserialize)] | |||
pub struct ReaderOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might think that this options object is a bit much, but who knows, maybe we want to add more options in the future.
One option would be the "magic wait until download" option, e.g. you seek to somewhere where the blob isn't downloaded yet, it waits instead of failing.
Or even, you seek to somewhere where the blob isn't downloaded yet, it triggers a download and waits instead of failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe mark it as non_exhaustive in this case?
still in draft mode? |
Cross test fails with "memory allocation of 8388608 bytes failed". I think this is unrelated to the changes. |
let mut position1 = None; | ||
loop { | ||
let guard = &mut this.state; | ||
match std::mem::take(guard) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi, I had this pattern show up being extremly slow in another project :'( with the rust compiler allocating over and over again, making reads/writes
you can see how I fixed/worked around it here: https://github.com/rpgp/rpgp/pull/579/files#diff-70038cda95dcb4bc54877b1a3eb9f111ed226310dad30909b6d3cb88af6cd2f5
not saying this is currently slow, just warning you this can be very slow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is unfortunate. I like this pattern.
I think it should be fine as of now. The state is just 32 bytes, so it is very unlikely to matter compared with all the async stuff going on. I could rewrite it to just match on the state in place, but then there is always the chance that you get in some weird state instead of poisoned if you forget to handle an odd error.
}; | ||
Ok(()) | ||
} | ||
ReaderState::Reading { .. } => Err(io::Error::other("Can't seek while reading")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit unfortunate, not necessary for the PR but I have had instances in the past where I wanted to say read the first 100 bytes, seek 200 bytes, and then continue reading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do that, as long as you don't drop futures.
read(...).await?
seek(...).await?
read(...).await?
is fine. the only thing that would break it would be something like this:
read(...).timeout(Duration::from_millis(1)).await?; // read future is dropped.
seek(...).await;
I mean, I would love to support this, but the docs is uniquely unhelpful about what the behaviour should be.
"The position returned by calling this method can only be relied on right after start_seek. If you have changed the position by e.g. reading or writing since calling start_seek, then it is unspecified whether the returned position takes that position change into account. Similarly, if start_seek has never been called, then it is unspecified whether poll_complete returns the actual position or some other placeholder value (such as 0)."
I guess you could make it so a seek from an absolute position (start and later end) recovers the state you get into when a read future is dropped, since at this point the internal state is deterministic again... 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah right, I missunderstood
"The position returned by calling this method can only be relied on right after start_seek. If you have changed the position by e.g. reading or writing since calling start_seek, then it is unspecified whether the returned position takes that position change into account. Similarly, if start_seek has never been called, then it is unspecified whether poll_complete returns the actual position or some other placeholder value (such as 0)."
uff...
Maybe it helps with the memory allocation failures
Description
Add an api for reading and seeking a blob using tokio::io::AsyncRead and tokio::io::AsyncSeek
Breaking Changes
Changes api::blobs::ExportRangesProgress::stream to be non-async - it does not have to be!
Notes & open questions
Note: the entire machinery of seek and read is horribly undocumented regarding the non happy case. So I went with the following approach: if you ever drop a future of either seek or read before completing, the reader becomes poisioned and will never work again, you have to grab a new one, which is cheap.
This bypasses the whole question about what to do if somebody calls x without having completed y, and in normal usage you should never notice this.
Change checklist