Slow reading of small chunks #2135

Open
dneuhaeuser-zalando opened this issue Feb 19, 2020 · 9 comments
Labels
A-http1 Area: HTTP/1 specific. C-performance Category: performance. This is making existing behavior go faster.

@dneuhaeuser-zalando

While dealing with a performance problem in requests (yes, the Python library) I stumbled across psf/requests#2371. I wanted to quickly evaluate whether switching to Rust would make sense for my problem, so I created a small benchmark in Rust matching the ones in https://github.com/alex/http-client-bench:


use hyper::body::HttpBody as _;
use hyper::Client;
use tokio::io::{stdout, AsyncWriteExt as _};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let client = Client::new();
    let uri = "http://localhost:8080".parse()?;
    let mut resp = client.get(uri).await?;
    let mut handle = stdout();
    while let Some(chunk) = resp.body_mut().data().await {
        handle.write_all(&chunk?).await?;
    }
    Ok(())
}

To my surprise this is a lot slower than the Python clients on my machine:

$ ./run.sh 
Python 3.7.4
go version go1.13.8 darwin/amd64
BENCH HTTPLIB:
8.00GiB 0:00:25 [ 317MiB/s] [================================>] 100%            
BENCH URLLIB3:
8.00GiB 0:00:33 [ 244MiB/s] [================================>] 100%            
BENCH REQUESTS
8.00GiB 0:00:36 [ 222MiB/s] [================================>] 100%            
BENCH GO HTTP
8.00GiB 0:00:23 [ 351MiB/s] [================================>] 100%            
signal: broken pipe
BENCH RUST HYPER
8.00GiB 0:00:59 [ 136MiB/s] [================================>] 100%            
Error: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

I've built it with --release; timings vary between runs, but it's always in the same ballpark. At first I thought that maybe it's just writing to stdout that's slow, but even if I modify the benchmark to remove the stdout writes, it gets faster yet remains slower than the competition:

extern crate hyper;
use hyper::Client;
use hyper::body::HttpBody as _;
// use tokio::io::{stdout, AsyncWriteExt as _};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let client = Client::new();
    let uri = "http://localhost:8080".parse()?;
    let mut resp = client.get(uri).await?;
//     let mut handle = stdout();
    let bench_size: usize = 8 * 1024 * 1024 * 1024;
    let mut bytes_seen: usize = 0;
    while let Some(chunk) = resp.body_mut().data().await {
        bytes_seen += chunk?.len();
        if bytes_seen >= bench_size { break; }
//        handle.write_all(&chunk?).await?;
    }
    Ok(())
}

$ time ./target/release/http-client-bench 

real	0m47.529s
user	0m31.867s
sys	0m34.785s

I've tried searching the documentation for any configuration changes I might be able to make, but nothing looked relevant as far as I could see. So... what's going on?

@seanmonstar
Member

What versions are you using (specifically of hyper and tokio)?

@dneuhaeuser-zalando
Author

dneuhaeuser-zalando commented Feb 19, 2020

hyper 0.13.2 and tokio 0.2.11

@seanmonstar
Member

Hm, interesting! I assume the server is just sending a never-ending chunked stream? What size are the chunks?

@dneuhaeuser-zalando
Author

It's the server from the benchmark repo; it sends the data in 1kb chunks.

@seanmonstar
Member

Ah ok, so my guess is that the slowdown is due to the size of the chunks. hyper is yielding each 1kb chunk individually, even though it has read multiple of them from the socket. I suspect that if you were to increase the server's chunk size to something bigger like 8 or 16kb (or even bigger), the performance should even out.
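
(Editorial note: a minimal sketch of a client-side workaround, assuming the per-chunk overhead downstream of hyper is part of the cost: coalesce the small Bytes chunks into a larger buffer before processing them. The 64kb threshold is an arbitrary choice for illustration, and this only amortizes the consumer's per-chunk work; hyper itself still yields each small chunk.)

use bytes::{Bytes, BytesMut};
use hyper::body::HttpBody as _;
use hyper::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let client = Client::new();
    let uri = "http://localhost:8080".parse()?;
    let mut resp = client.get(uri).await?;

    // Accumulate small chunks and only hand off a block once it crosses
    // the threshold, so downstream work runs per block, not per chunk.
    const COALESCE: usize = 64 * 1024; // arbitrary threshold for illustration
    let mut buf = BytesMut::with_capacity(COALESCE);
    let mut bytes_seen: usize = 0;

    while let Some(chunk) = resp.body_mut().data().await {
        buf.extend_from_slice(&chunk?);
        if buf.len() >= COALESCE {
            let block: Bytes = buf.split().freeze();
            bytes_seen += block.len();
            // ...process `block` here...
        }
    }
    bytes_seen += buf.len(); // whatever remained below the threshold
    println!("read {} bytes", bytes_seen);
    Ok(())
}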

@seanmonstar seanmonstar changed the title Slow parsing of large responses Slow reading of small chunks Feb 19, 2020
@dneuhaeuser-zalando
Author

You're right. Hyper performs a lot better if the server sends larger chunks. The other clients also improve, but Hyper does catch up to them.

8kb

$ ./run.sh 
Python 3.7.4
go version go1.13.8 darwin/amd64
BENCH HTTPLIB:
8.00GiB 0:00:14 [ 561MiB/s] [===============================================>] 100%            
BENCH URLLIB3:
8.00GiB 0:00:20 [ 392MiB/s] [===============================================>] 100%            
BENCH REQUESTS
8.00GiB 0:00:19 [ 414MiB/s] [===============================================>] 100%            
BENCH GO HTTP
8.00GiB 0:00:09 [ 869MiB/s] [===============================================>] 100%            
signal: broken pipe
BENCH RUST HYPER
8.00GiB 0:00:18 [ 443MiB/s] [===============================================>] 100%            
Error: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

16kb

$ ./run.sh 
Python 3.7.4
go version go1.13.8 darwin/amd64
BENCH HTTPLIB:
8.00GiB 0:00:11 [ 728MiB/s] [===============================================>] 100%            
BENCH URLLIB3:
8.00GiB 0:00:16 [ 484MiB/s] [===============================================>] 100%            
BENCH REQUESTS
8.00GiB 0:00:09 [ 837MiB/s] [===============================================>] 100%            
BENCH GO HTTP
8.00GiB 0:00:08 [ 919MiB/s] [===============================================>] 100%            
signal: broken pipe
BENCH RUST HYPER
8.00GiB 0:00:10 [ 787MiB/s] [===============================================>] 100%            
Error: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

32kb

$ ./run.sh 
Python 3.7.4
go version go1.13.8 darwin/amd64
BENCH HTTPLIB:
8.00GiB 0:00:09 [ 873MiB/s] [===============================================>] 100%            
BENCH URLLIB3:
8.00GiB 0:00:14 [ 549MiB/s] [===============================================>] 100%            
BENCH REQUESTS
8.00GiB 0:00:08 [ 915MiB/s] [===============================================>] 100%            
BENCH GO HTTP
8.00GiB 0:00:05 [1.48GiB/s] [===============================================>] 100%            
signal: broken pipe
BENCH RUST HYPER
8.00GiB 0:00:09 [ 859MiB/s] [===============================================>] 100%            
Error: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

64kb

$ ./run.sh 
Python 3.7.4
go version go1.13.8 darwin/amd64
BENCH HTTPLIB:
8.00GiB 0:00:08 [1005MiB/s] [===============================================>] 100%            
BENCH URLLIB3:
8.00GiB 0:00:13 [ 615MiB/s] [===============================================>] 100%            
BENCH REQUESTS
8.00GiB 0:00:08 [ 981MiB/s] [===============================================>] 100%            
BENCH GO HTTP
8.00GiB 0:00:03 [2.12GiB/s] [===============================================>] 100%            
signal: broken pipe
BENCH RUST HYPER
8.00GiB 0:00:08 [ 927MiB/s] [===============================================>] 100%            
Error: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

@sfackler
Contributor

Definitely worth profiling to find where the hot spots are, but I ran into something similar with tokio-postgres and found that it was much more efficient to send "macro blocks" of multiple messages through the channel from the connection to the request future, and have them split apart in the response stream itself: sfackler/rust-postgres#452
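
(Editorial note: a rough sketch of that "macro block" idea, assuming a tokio mpsc channel between a connection task and a consumer; the Message type and function names are illustrative, not tokio-postgres's actual API. The point is that the channel is crossed once per batch instead of once per message, and splitting the batch apart happens cheaply on the consumer side.)

use tokio::sync::mpsc;

// Hypothetical message type standing in for a parsed protocol frame.
type Message = Vec<u8>;

// Connection side: drain everything parsed so far into one batch and
// send that, rather than sending each message individually.
async fn send_batched(tx: &mpsc::Sender<Vec<Message>>, parsed: &mut Vec<Message>) {
    if !parsed.is_empty() {
        let batch = std::mem::take(parsed);
        let _ = tx.send(batch).await; // one channel send per batch
    }
}

// Consumer side: receive a macro block and split it apart locally.
async fn consume(mut rx: mpsc::Receiver<Vec<Message>>) {
    while let Some(batch) = rx.recv().await {
        for msg in batch {
            // ...handle each message...
            let _ = msg;
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel::<Vec<Message>>(8);
    let consumer = tokio::spawn(consume(rx));

    // Pretend the connection parsed three messages out of one socket read.
    let mut parsed = vec![b"a".to_vec(), b"b".to_vec(), b"c".to_vec()];
    send_batched(&tx, &mut parsed).await;

    drop(tx); // close the channel so the consumer loop ends
    consumer.await.unwrap();
}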

@seanmonstar
Member

There may be two steps here:

  1. Reduce the overhead of parsing a chunked header. This probably isn't the biggest factor, but worth checking.
  2. If we could already read and parse several chunks, it'd be great to yield them at once. We can't really do that at the moment since we yield Bytes, a contiguous buffer, but a possibility could be to change to yielding some impl Buf with the chunks grouped together.

Go will try to read multiple chunks at a time into a single user buffer, whereas Netty yields each chunk as well. So now I wonder if (1) is significant.
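
(Editorial note: a minimal sketch of what (2) could look like, using the bytes crate's 1.x trait method names: several already-parsed chunks grouped behind one impl Buf, so they can be yielded together without copying into a contiguous buffer. This is only an illustration of the idea, not hyper's actual API.)

use std::collections::VecDeque;
use bytes::{Buf, Bytes};

// Several non-contiguous chunks exposed as a single Buf.
struct GroupedChunks {
    chunks: VecDeque<Bytes>,
}

impl Buf for GroupedChunks {
    fn remaining(&self) -> usize {
        self.chunks.iter().map(|c| c.len()).sum()
    }

    // Expose the front chunk; callers walk the rest via advance(),
    // so no copying is needed to present the group as one buffer.
    fn chunk(&self) -> &[u8] {
        self.chunks.front().map(|c| &c[..]).unwrap_or(&[])
    }

    fn advance(&mut self, mut cnt: usize) {
        while cnt > 0 {
            let front = self.chunks.front_mut().expect("advanced past end");
            if cnt < front.len() {
                front.advance(cnt);
                return;
            }
            cnt -= front.len();
            self.chunks.pop_front();
        }
    }
}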

@seanmonstar seanmonstar added A-http1 Area: HTTP/1 specific. C-performance Category: performance. This is making existing behavior go faster. labels Feb 19, 2020
@tqwewe

tqwewe commented May 13, 2021

Also experiencing this issue. I'm getting double the time of my Python script for a simple GET request.
