Skip to content

Add Body::poll_progress #90

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

sfackler
Copy link

As described in hyperium/hyper#3121, this allows server implementations to abort body writes even if the client is not reading data.

@LucioFranco
Copy link
Member

@sfackler would it be possible to include a timeout example with this? Or maybe add that example in the doc comment I am curious to see how this would be used.

@sfackler
Copy link
Author

We could add a TimeoutBody wrapper to http-body-util, though that wold require making tokio an optional dependency. I can update the PR tonight.

@LucioFranco
Copy link
Member

Maybe even at least an example so tokio doesn't need to be a public dep.

@sfackler
Copy link
Author

sfackler commented Mar 14, 2023

Here's a TimeoutBody implementation (untested):

#[pin_project]
pub struct TimeoutBody<B> {
    #[pin]
    inner: B,
    #[pin]
    timer: Sleep,
    timeout: Duration,
    waiting: bool,
}

impl<B> Body for TimeoutBody<B>
where
    B: Body,
{
    type Data = B::Data;
    type Error = TimeoutErro<B::Error>;

    fn poll_frame(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Result<Frame<Self::Data>, Self::Error>>> {
        if let Poll::Ready(o) = self.as_mut().project().inner.poll_frame(cx) {
            *this.waiting = false;
            return Poll::Ready(o.map(|r| r.map_err(TimeoutError::Inner)));
        }

        self.is_healthy(cx)?;
        Poll::Pending
    }

    fn is_healthy(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Result<(), Self::Error> {
        let this = self.project();

        if !*this.waiting {
            this.timer.reset(Instant::now() + *this.timeout);
            *this.waiting = true;
        }

        if this.timer.poll(cx).is_ready() {
            return Err(TimeoutError::TimedOut);
        }

        Ok(())
    }

    fn is_end_stream(&self) -> bool {
        self.inner.is_end_stream()
    }

    fn size_hint(&self) -> SizeHint {
        self.inner.size_hint()
    }
}

pub enum TimeoutError<E> {
    Inner(E),
    TimedOut,
}

@sfackler
Copy link
Author

@seanmonstar thoughts on this?

/// `poll_frame` calls and report an error from `poll_healthy` when time expires.
///
/// The default implementation returns `Ok(())`.
fn poll_healthy(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Result<(), Self::Error> {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted this to be more poll-like we could have a return value of Poll<Result<Void, Self::Error>>>.

@sfackler sfackler changed the title Add Body::poll_healthy Add Body::poll_progress Jan 23, 2024
@sfackler
Copy link
Author

I've updated this to move from poll_healthy to a more general poll_progress, like what boats described for the Stream trait. In addition to the error reporting case I opened this to handle, the same motivations apply here as well.

olix0r added a commit to olix0r/http-body that referenced this pull request Mar 8, 2024
olix0r added a commit to linkerd/linkerd2-proxy that referenced this pull request Apr 11, 2024
hyperium/http-body#90 proposes adding a `Body::poll_progress` method to the
`Body` trait. This PR uses a fork of hyper that uses this proposed API when
awaiting stream send capacity. This supports implementing timeouts on streams
and connections in an unhealthy state to defend servers against resource
exhaustion.
olix0r added a commit to linkerd/linkerd2-proxy that referenced this pull request Apr 12, 2024
hyperium/http-body#90 proposes adding a `Body::poll_progress` method to the
`Body` trait. This PR uses a fork of hyper that uses this proposed API when
awaiting stream send capacity. This supports implementing timeouts on streams
and connections in an unhealthy state to defend servers against resource
exhaustion.
@olix0r
Copy link

olix0r commented Apr 18, 2024

We (Linkerd) are interested in moving this proposal forward.

I've been testing this with patched versions of http-body 0.3 and Hyper 0.14 and the results are promising.

I'm using this to implement a middleware that enforces a progress timeout to cancel stuck streams.

What does the process look like for finalizing this proposal? Is there anything I can do to help?

@seanmonstar
Copy link
Member

Thanks for trying it out, @olix0r! I'm glad to have at least 2 use cases for something so fundamental. We can move this forward.

In my prep for doing so, I went back and read the previous conversations, and also the poll_progress post. I think the problem and solution that withoutboats describes is similar, but different enough that perhaps we shouldn't use the same name/mechanism. That's because, this feature isn't describing making progress on the body separate from producing a frame. It's rather to propagate cancelation while waiting on backpressure to clear up.

It feels closer to oneshot::Sender::poll_closed(). Should we change the name here to poll_closed() or poll_canceled()?

At the same time, I'm writing up a longer blog post about this feature, I'll share a draft with you soon.

@sfackler
Copy link
Author

sfackler commented May 6, 2024

That seems like a reasonable-enough name to me, though it might be a bit strange to have poll_closed return Result<(), Error>? That's why the original name in the PR was poll_healthy.

@seanmonstar
Copy link
Member

Hm, thanks for pointing that out. It made me think through the problem a bit more I at first figured we could just make it poll_closed() -> Poll<()>, like the oneshot sender. But that has some things to work out:

  • A default implementation of the method needs to work, and can't check any state.
    • Well, actually, could it check Self::is_end_stream()? Maybe that could work like Send::poll_closed() and is_closed().
    • But even the default implementation of is_end_stream() will always be false, so something waiting on poll_closed() then would wait forever.
  • Should it indicate a bad condition? Or just closure?

@sfackler
Copy link
Author

sfackler commented May 6, 2024

A default implementation that just returns Poll::Ready(Ok(())) is correct, and equivalent to the current state of the world.

I don't think it would indicate closure, just drive any background IO (like poll_progress does). For convenience, it allows errors to be returned directly, but an error out of poll_healthy or whatever we call it would be equivalent to that error being returned from poll_frame.

@sfackler
Copy link
Author

sfackler commented May 6, 2024

In that sense, it is really pretty similar to Boats's poll_progress, just fallible.

We could make it infallible and force the error to come out of poll_frame but that just seems like it'd make the implementation more annoying for no real benefit.

@seanmonstar
Copy link
Member

So you find that it was needed to make background progress, too? I didn't think that was a goal of the method. Just to detect closure while a frame isn't needed, such as by polling a Sleep.

@sfackler
Copy link
Author

sfackler commented May 6, 2024

Polling a sleep is background progress IMO! :)

My initial use case was purely around detecting disconnects/timeouts and things like that, but Boats's post on poll_progress made me feel like there's no reason you couldn't have some other body implementation that wanted to do some background IO. For example, if you're proxying you may want to internally pull data from the upstream connection while the downstream connection is idle.

@seanmonstar
Copy link
Member

That's a fair point. Perhaps the naming is fine then. Or at least, worth considering if poll_progress is better than the alternatives.

Now, I think one more remaining question is about return type. I do think a poll method should return Poll<T>. But what does each value mean? Specifically, what's the difference between Ready(Ok(())) and Pending? It does feel like their different. The default will just return Ok(()). And if it were checking a Sleep, I assume it'd return Pending. What should the caller (such as inside hyper) do with that information? Whatever we determine for that should end up documented on the method.

@sfackler
Copy link
Author

sfackler commented May 6, 2024

I think that Ready(Ok(()) would mean "I'm done making progress in the background". If we don't require poll_progress to be fused, the caller would need to remember that and not call it again which seems pretty annoying TBH. Pending means that there's more background progress work to be done later, the same as any other future.

In practice, the only thing that callers would actually look for is the Ready(Err) case (see hyperium/hyper#3169).

EDIT: Actually, I think we have to require that it's fused to be able to use it properly.

@sfackler
Copy link
Author

sfackler commented May 7, 2024

Added a few bits of docs on return values, and poll_progress implementations to http-body-util combinators.

@olix0r
Copy link

olix0r commented Jul 29, 2024

@seanmonstar Are you comfortable with this PR? Is there anything else to consider?

@seanmonstar
Copy link
Member

I believe generally yes. I got part way through a write-up to publish as a blog post, to explore the area more and get more feedback, since I think it's a sufficiently interesting change to some fundamental crates. I still plan to finish that up, I just had to pause as I dealt with some other contracting work.

@olix0r
Copy link

olix0r commented Dec 4, 2024

@seanmonstar 👋 Is there anything we can do to help move this PR forward?

@seanmonstar
Copy link
Member

Thanks for the reminder! I've added it to my active list, I think I can get back on the write-up. Could I send a draft for review, say, next week?

@olix0r
Copy link

olix0r commented Dec 13, 2024

@seanmonstar I'll be camping next week but I should have some time to review after Dec 19th. Thanks!

@sfackler
Copy link
Author

I'm also happy to take a look.

@olix0r
Copy link

olix0r commented Mar 31, 2025

@seanmonstar Hey there. Just nudging this along. We've finally crossed the bridge to hyper 1.0, so now I'm extra eager to get this through :)

@seanmonstar
Copy link
Member

Yep, adding to my plate for this week.

@seanmonstar
Copy link
Member

@sfackler @olix0r I've written up a draft blog post that I'd like to publish, since I feel like this change is fairly fundamental and could use as much refinement as necessary. If you'd like to review and suggest any edits, I'd appreciate it! If not, I can publish as-is: https://hackmd.io/@seanmonstar/HkHNiA5Akx

@sfackler
Copy link
Author

Looks good to me! The one thing that might be worth mentioning is that in addition to being useful to propagate cancellation, it can also be used to allow the logic to do some work while Hyper's waiting to write the previous block out. For example, you could be gzip-compressing the next block.

@seanmonstar
Copy link
Member

it can also be used to allow the logic to do some work while Hyper's waiting to write the previous block out. For example, you could be gzip-compressing the next block.

Hm, I'm hesitant to recommend that. As I think about it, I think that could be an anti-pattern. If the write-side is applying backpressure, then you probably don't want to do extra work producing a new frame, it might not ever be needed. Well, I suppose you could chose to do so opportunistically, with a preference for producing faster at the cost of possible extra wasted work...


(Also, for anyone else looking, that draft link supports inline comments and edit suggestions as long as you're logged in. If there's no other feedback, I could publish early next week.)

@sfackler
Copy link
Author

Yeah, you'd need to make sure you put a bound on the amount of background work. Definitely not necessary to talk about though!

@seanmonstar
Copy link
Member

Alright, I did another edit pass (mostly just addressing grammar, repeat words, and slight styling). Unless anyone wants to discuss further, I'll be posting either Tuesday or Wednesday.

Copy link
Member

@cratelyn cratelyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am excited to see this proposal moving forward.

i have one small question, about whether Body implementors can ever explicitly depend on poll_progress() being called.

@seanmonstar
Copy link
Member

Published: https://seanmonstar.com/blog/body-poll-progress/

I'll allow for some time to receive any other feedback.

@Darksonn
Copy link

It sounds like we could just return Poll<E> with the default implementation being Poll::Pending. What is the advantage of a separate Poll::Ready(Ok(())) return value possibility?

@sfackler
Copy link
Author

It's done purely so you can ?-propagate errors in the implementation. Maybe Poll<Result<!, E>> would be better?

@seanmonstar
Copy link
Member

FWIW, that would also mean that if you did do body.progress().await, by default that would wait forever.

@Darksonn
Copy link

True, but you have to handle the waits forever case anyway in case a body isn't using the default implementation.

@coolreader18
Copy link

coolreader18 commented Apr 22, 2025

Hm, I'm hesitant to recommend that. As I think about it, I think that could be an anti-pattern. If the write-side is applying backpressure, then you probably don't want to do extra work producing a new frame, it might not ever be needed. Well, I suppose you could chose to do so opportunistically, with a preference for producing faster at the cost of possible extra wasted work...

I agree with your point about backpressure - if it's not a good idea to actually make any real progress in the function, poll_progress might not be the best name; perhaps poll_cancelled()? Though I suppose it's not a huge deal if some amount of work is done, and that can probably be decided case-by-case, depending on the Body type. But if that's the case, I do think it should be mentioned as a possibility, even if it's not being encouraged.

@wyfo
Copy link

wyfo commented Apr 23, 2025

Coming from Reddit, I had to read @seanmonstar blog post, and withoutboats' one to really understand what poll_progress means here, and I still don't find it very intuitive. Especially, if poll_progress returns an error, it doesn't tell me directly whether this is an error of the "progression" that would be recoverable, or an unrecoverable error telling me I should not poll anymore, or a normal termination signal.

On the other hand, I directly understand what poll_closed does without having to read any blog post. I mean I would rather have this kind of code:

    select! {
        dst.write(frame) => {
            // continue
        },
        reason = body.closed() => {
            tracing::debug!(?reason, "closed");
            dst.abort();
            return;
        }
    }

than the one presented in the blog post, as I find it more intuitive.
It's true however that the default implementation returning Poll::Pending might be confusing, as I would expect poll_closed to return whenever the stream is actually closed, but it might just be a temporary issue. Indeed, as soon as the new method is released, I would expect the ecosystem to adapt and external implementors to release a new implementation matching the desired semantic. All in all, the default implementation would be the kind of legacy you have to introduce sometimes, but resulting in a better overall semantic.

Also, regarding withoutboats proposal, the signature was fn poll_progress(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()>. Here, I find it intuitive, as I understand that poll_progress is a way to make thing progress in background, hence the return type Poll<()>, which means you will have to retrieve the progression result sooner or later with the dedicated method. So I would let poll_progress name to this concept.
The issue here, at least as described in seanmonstar blog post, is about "backpressured cancelation", so I would advocate to name the new method poll_cancelled/poll_closed.

@Darksonn
Copy link

It's true however that the default implementation returning Poll::Pending might be confusing, as I would expect poll_closed to return whenever the stream is actually closed.

Hmm ... but the stream is not closed until you get an empty frame from poll_frame, right? And once you do get that, it doesn't make sense to call poll_closed.

@wyfo
Copy link

wyfo commented Apr 23, 2025

I understood the counter argument of the blog post as:

  • you spawn a task waiting for body.closed()
  • your main task read the body, and ends reading the last frame of it, closing it de facto
  • the task waiting for body.closed() is still pending, while the body is closed --> counterintuitive

However, I'm not an expert about hyper, so I don't know if user may write this kind of side-task or not, but I would say they may not, so the counterintuitive aspect is quite negligible indeed.

@farnz
Copy link

farnz commented Apr 25, 2025

TL;DR I'd like to bikeshed the name in the direction of poll_frame_error (not a good name, either, so needs more bikeshed), and I have a couple of questions around guarantees that are similar to the fused stream guarantees.

I've rewritten this comment a few times, and I'm still not happy that it's fully expressing my views, but I'm submitting anyway because I don't think I'll get clearer.

The concern I have with calling this method poll_progress is that you're not asking the source to make progress; rather, you're asking it if poll_frame is definitely going to return Poll::Ready(Err(_)) on some future call. Answering that question can involve making progress, but does not have to; for example, you might take the error from a socket rather than attempting to read from the socket. And outside of the context of the Body trait, there exist streams (like FuturesOrdered, for example), for which the concept of "make progress if possible" makes sense, but the concept of "will you return an error the next time you're called" only makes sense if the contained future's Output type is a Result or Option; it'd be good to avoid taking the name that's "obviously" correct for the case of "make some progress if possible, don't pull in from the rest of the world".

I'd thus prefer to call it something like poll_frame_error or poll_failure instead of poll_progress, especially since you really don't want people to get the impression that they must buffer indefinitely to let poll_progress keep making progress,

Separately, is there a guarantee that if poll_progress returns an error, it will always return Poll::Ready(Err(_))? Or can it return Poll::Pending after returning an error? If it can return Poll::Pending after an error, is it guaranteed to be stuck there, or can it alternate between Poll::Pending and Poll::Ready(Err(_))? I'm assuming from the discussion above (including the suggestion that it should maybe return Result<!, E>) that it will never return Poll::Ready(Ok(())).

Similarly, is it guaranteed that if I called poll_progress and it returned Poll::Ready(Err(SomeError)), then when I next call poll_frame, I'll get either Poll::Ready(Some(Err(SomeError))) or Poll::Ready(None) back? Or could I call poll_progress, get an error, and then get Poll::Ready(Some(Ok(data))), or a different error to the one poll_progress gave me, from poll_frame?

@sfackler
Copy link
Author

The concern I have with calling this method poll_progress is that you're not asking the source to make progress; rather, you're asking it if poll_frame is definitely going to return Poll::Ready(Err(_)) on some future call.

That's not quite accurate - one kind of "progress" could be checking for a pending error, but it could also be used to do some amount of work in a pipelined fashion: #90 (comment).

Separately, is there a guarantee that if poll_progress returns an error, it will always return Poll::Ready(Err(_))?

I think that'd be unspecified in the same way it is for e.g. fallible streams. The expectation is that if it returns an error you're going to propagate that and stop polling the body.

Similarly, is it guaranteed that if I called poll_progress and it returned Poll::Ready(Err(SomeError)), then when I next call poll_frame, I'll get either Poll::Ready(Some(Err(SomeError))) or Poll::Ready(None) back?

Like above, that seems like it'd be constraining the implementation in a direction that shouldn't really matter in practice.

@farnz
Copy link

farnz commented Apr 25, 2025

The concern I have with calling this method poll_progress is that you're not asking the source to make progress; rather, you're asking it if poll_frame is definitely going to return Poll::Ready(Err(_)) on some future call.

That's not quite accurate - one kind of "progress" could be checking for a pending error, but it could also be used to do some amount of work in a pipelined fashion: #90 (comment).

This then feels like you're tying two separate things into one function:

  1. Tell me if the next poll_frame will fail.
  2. Do some work, so that the next poll_frame is quicker to return.

And I'm not sure that I like having both of those bundled together into a single function; I'm on a single-core Cortex-A8 platform, and often short on CPU. Being able to abort a processing pipeline early because the thing POSTing to me isn't going to finish sending chunks of data is useful, because it saves CPU, but bringing work forward is not, since I've got other uses for my CPU core if that chunk's not needed right now.

Separately, is there a guarantee that if poll_progress returns an error, it will always return Poll::Ready(Err(_))?

I think that'd be unspecified in the same way it is for e.g. fallible streams. The expectation is that if it returns an error you're going to propagate that and stop polling the body.

Like above, that seems like it'd be constraining the implementation in a direction that shouldn't really matter in practice.

For these, I mostly want the guarantees, or lack thereof, to be very clearly documented, the way unfused streams are very clearly documented, so that there's documentation to point to if someone write a TOCTOU mistake. Something like:

"""
You must treat poll_progress returning an error as equivalent to poll_frame returning an error (and vice-versa). The error is not guaranteed (but is allowed) to persist between calls, and thus it is expected that with some implementations, a call to poll_progress will return an error, but the next poll_frame does not, or a call to poll_frame will return an error, but the next call to poll_progress does not.

Further, note that both poll_frame and poll_progress have unspecified behaviour if called after poll_frame returns Poll::Ready(None); they can panic, block forever, or cause other kinds of problems; the Body trait places no requirements on the effects of such a call. However, as neither the poll_progress nor the poll_frame methods are marked unsafe, Rust’s usual rules apply: calls must never cause undefined behaviour (memory corruption, incorrect use of unsafe functions, or the like), regardless of the Body’s state.
"""

Basically, enough text that someone who's making bad assumptions about what they can expect of an implementation gets an up-front warning that they're holding it wrong.

@kytans
Copy link

kytans commented Apr 27, 2025

Giving feedback based on https://seanmonstar.com/blog/body-poll-progress/, this change makes no sense at all.

The operation conceptually copies data from a source to a sink until an error or EOF on the source happens. Hence, once you read some data from the source, then you MUST wait for the sink and write it to the sink, otherwise you are incorrectly losing data. Whether the source will return an error after returning data has no bearing whatsoever on the fact that that data MUST be sent to the sink, and thus you must wait for sink.

There are however there reasonable proper ways to change the operation:

  1. You can change the operation so that no data is sent to the sink on error. This requires to buffer the whole data in memory or disk and then send it all to the sink only once you are sure there is no error

  2. You can change the sink so that writes time out after some time, by wrapping it into a suitable adapter that adds the timeout

  3. You can add a way to cancel the whole operation, so that it can be immediately stopped at any arbitrary moment. The proper way to do this is to add a some sort of "cancellation token" parameter. It must not be a function on the source since this is completely unrelated to the source.

It looks like the change you are looking for is the third one.

Hence, you should not add this absurd interface and instead add a cancellation token to the source-to-sink copy operation and change the code so that both await operations terminate if the token is signalled.

To reiterate, this sort of cancellation has nothing whatsoever to do with the data source, and should thus not be a method on the data source, but rather a cancellation token parameter passed to the copy operation.

@sfackler
Copy link
Author

sfackler commented Apr 27, 2025

You can add a way to cancel the whole operation, so that it can be immediately stopped at any arbitrary moment. The proper way to do this is to add a some sort of "cancellation token" parameter. It must not be a function on the source since this is completely unrelated to the source.

What would the TimeoutBody implementation look like with that approach? #90 (comment)

I'm not sure how that proposal would change any of the MUSTs you're mentioning above. It's just a different API to perform the same thing (cancelling the body write).

@kytans
Copy link

kytans commented Apr 27, 2025

Assuming the goal of TimeoutBody is to cancel returning an HTTP response to the client after a certain amount of time elapsed regardless of anything else, there would be no TimeoutBody, since such a behavior has nothing to do with the response body.

Instead, it should be configured globally for the HTTP server, or if per-route configuration is desired, then the route handler should return a structure containing the body and per-route server response configuration parameters such as a timeout (or perhaps a more general abstract "responder" that would take care of the operation of sending the response).

Obviously however using a fixed timeout, especially if on by default, seems a really poor design since it will make it impossible to return large responses that could take hours or days to download. Instead, a more reasonable design would be to either have an timeout on each write, or a required minimum transfer bytes/second rate and a time interval to start enforcing it after.

@sfackler
Copy link
Author

Assuming the goal of TimeoutBody is to cancel returning an HTTP response to the client after a certain amount of time elapsed regardless of anything else, there would be no TimeoutBody, since such a behavior has nothing to do with the response body.

That is not the goal of the TimeoutBody.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.