Skip to content

feat: Read and seek #119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 22, 2025
Merged

feat: Read and seek #119

merged 7 commits into from
Jul 22, 2025

Conversation

rklaehn
Copy link
Collaborator

@rklaehn rklaehn commented Jul 22, 2025

Description

Add an api for reading and seeking a blob using tokio::io::AsyncRead and tokio::io::AsyncSeek

Breaking Changes

Changes api::blobs::ExportRangesProgress::stream to be non-async - it does not have to be!

Notes & open questions

Note: the entire machinery of seek and read is horribly undocumented regarding the non happy case. So I went with the following approach: if you ever drop a future of either seek or read before completing, the reader becomes poisioned and will never work again, you have to grab a new one, which is cheap.

This bypasses the whole question about what to do if somebody calls x without having completed y, and in normal usage you should never notice this.

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.

Copy link

github-actions bot commented Jul 22, 2025

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh-blobs/pr/119/docs/iroh_blobs/

Last updated: 2025-07-22T14:58:02Z

@@ -647,6 +658,12 @@ impl<'a> AddProgress<'a> {
}
}

/// Options for an async reader for blobs that supports AsyncRead and AsyncSeek.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ReaderOptions {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might think that this options object is a bit much, but who knows, maybe we want to add more options in the future.

One option would be the "magic wait until download" option, e.g. you seek to somewhere where the blob isn't downloaded yet, it waits instead of failing.

Or even, you seek to somewhere where the blob isn't downloaded yet, it triggers a download and waits instead of failing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe mark it as non_exhaustive in this case?

@rklaehn rklaehn requested a review from dignifiedquire July 22, 2025 13:00
@dignifiedquire
Copy link
Collaborator

still in draft mode?

@n0bot n0bot bot added this to iroh Jul 22, 2025
@github-project-automation github-project-automation bot moved this to 🏗 In progress in iroh Jul 22, 2025
@rklaehn rklaehn marked this pull request as ready for review July 22, 2025 13:41
@rklaehn
Copy link
Collaborator Author

rklaehn commented Jul 22, 2025

Cross test fails with "memory allocation of 8388608 bytes failed". I think this is unrelated to the changes.

let mut position1 = None;
loop {
let guard = &mut this.state;
match std::mem::take(guard) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, I had this pattern show up being extremly slow in another project :'( with the rust compiler allocating over and over again, making reads/writes

you can see how I fixed/worked around it here: https://github.com/rpgp/rpgp/pull/579/files#diff-70038cda95dcb4bc54877b1a3eb9f111ed226310dad30909b6d3cb88af6cd2f5

not saying this is currently slow, just warning you this can be very slow

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is unfortunate. I like this pattern.

I think it should be fine as of now. The state is just 32 bytes, so it is very unlikely to matter compared with all the async stuff going on. I could rewrite it to just match on the state in place, but then there is always the chance that you get in some weird state instead of poisoned if you forget to handle an odd error.

};
Ok(())
}
ReaderState::Reading { .. } => Err(io::Error::other("Can't seek while reading")),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit unfortunate, not necessary for the PR but I have had instances in the past where I wanted to say read the first 100 bytes, seek 200 bytes, and then continue reading

Copy link
Collaborator Author

@rklaehn rklaehn Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do that, as long as you don't drop futures.

read(...).await?
seek(...).await?
read(...).await?

is fine. the only thing that would break it would be something like this:

read(...).timeout(Duration::from_millis(1)).await?; // read future is dropped.
seek(...).await;

I mean, I would love to support this, but the docs is uniquely unhelpful about what the behaviour should be.

"The position returned by calling this method can only be relied on right after start_seek. If you have changed the position by e.g. reading or writing since calling start_seek, then it is unspecified whether the returned position takes that position change into account. Similarly, if start_seek has never been called, then it is unspecified whether poll_complete returns the actual position or some other placeholder value (such as 0)."

I guess you could make it so a seek from an absolute position (start and later end) recovers the state you get into when a read future is dropped, since at this point the internal state is deterministic again... 🤷

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, I missunderstood

"The position returned by calling this method can only be relied on right after start_seek. If you have changed the position by e.g. reading or writing since calling start_seek, then it is unspecified whether the returned position takes that position change into account. Similarly, if start_seek has never been called, then it is unspecified whether poll_complete returns the actual position or some other placeholder value (such as 0)."

uff...

Maybe it helps with the memory allocation failures
@rklaehn rklaehn merged commit 8eb5733 into main Jul 22, 2025
24 checks passed
@rklaehn rklaehn deleted the read-and-seek branch July 22, 2025 15:06
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in iroh Jul 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

2 participants