Hi, below are results from a "Hello World" micro-benchmark comparing hyper 1.0 (52f1925) and uWebSocket both running on a current_thread event loop.
# uWebSocket
divy@mini ~> wrk -d 10s --latency http://127.0.0.1:3000/
Running 10s test @ http://127.0.0.1:3000/
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 42.75us 69.88us 5.83ms 99.54%
Req/Sec 109.29k 6.66k 121.53k 92.08%
Latency Distribution
50% 38.00us
75% 50.00us
90% 61.00us
99% 89.00us
2197037 requests in 10.10s, 142.48MB read
Requests/sec: 217523.38
Transfer/sec: 14.11MB
# Hyper
divy@mini ~> wrk -d 10s --latency http://127.0.0.1:3000/
Running 10s test @ http://127.0.0.1:3000/
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 90.14us 608.17us 21.18ms 99.25%
Req/Sec 86.52k 9.89k 102.72k 85.15%
Latency Distribution
50% 49.00us
75% 60.00us
90% 73.00us
99% 246.00us
1738450 requests in 10.10s, 145.90MB read
Requests/sec: 172106.55
Transfer/sec: 14.44MB
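This does not represent real-world performance, but it does indicate some overhead in the Hyper server machinery: Hyper serves roughly 21% fewer requests per second (about 172k vs. 218k), and its 99th-percentile latency is about 2.8x higher (246us vs. 89us).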
Profile for the hyper run: https://share.firefox.dev/3NqxfKC
Profile for the uWebSocket run: https://share.firefox.dev/3pppj4p
I noticed that, because of the API, Request is owned, so headers have to be copied over to a HeaderMap and deallocated after every request. This can be seen in the profiles above too. Do you think it's possible to improve this?
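To make the cost I mean concrete, here is a minimal std-only sketch. It is not hyper's actual parsing code: `parse_owned` and `parse_borrowed` are hypothetical names, and a plain `HashMap<String, String>` stands in for `HeaderMap`. The owned variant copies every header name and value into its own heap allocation, while the borrowed variant only keeps slices into the read buffer.

```rust
use std::collections::HashMap;

/// Hypothetical owned parse: each name and value is copied into a new
/// `String`, so every request pays for those allocations and frees them
/// again when the map is dropped.
fn parse_owned(raw: &str) -> HashMap<String, String> {
    raw.lines()
        .filter_map(|line| line.split_once(':'))
        .map(|(name, value)| (name.trim().to_ascii_lowercase(), value.trim().to_owned()))
        .collect()
}

/// Hypothetical borrowed parse: names and values are slices of `raw`,
/// so only the `Vec` itself allocates. The result cannot outlive the buffer.
fn parse_borrowed(raw: &str) -> Vec<(&str, &str)> {
    raw.lines()
        .filter_map(|line| line.split_once(':'))
        .map(|(name, value)| (name.trim(), value.trim()))
        .collect()
}
```

Calling both on a buffer such as `"Host: 127.0.0.1:3000\r\nAccept: */*\r\n"` yields the same two headers. The difference is that the borrowed result is tied to the lifetime of the buffer, which is presumably why an owned Request cannot use it directly.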