-
Operating system: Windows 10 x64, WSL2

Roughly speaking, the test script selects num-prompts prompts of no more than 1024 tokens each from ShareGPT_V3_unfiltered_cleaned_split.json and asks the API to continue each one with text of the same length. Results:
As the three tests above show, sglang's request throughput does not exceed 2.5 requests/s, while vLLM reaches around 3.5 requests/s, and there is also a large gap in output token throughput. I adjusted the max-prefill-tokens parameter and re-ran sglang, but the results did not change significantly.
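For reference, here is a minimal sketch of the kind of client logic the test script describes. It assumes an OpenAI-compatible /v1/completions endpoint; the endpoint URL, model name, and concurrency are placeholders rather than the actual script's values, and prompt length is measured with the serving model's tokenizer:

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor

import requests
from transformers import AutoTokenizer

BASE_URL = "http://localhost:30000/v1/completions"  # assumed OpenAI-compatible endpoint
MODEL = "meta-llama/Llama-2-7b-chat-hf"             # placeholder model name
NUM_PROMPTS = 200                                    # corresponds to num-prompts
MAX_PROMPT_TOKENS = 1024
CONCURRENCY = 32                                     # assumed request concurrency

tokenizer = AutoTokenizer.from_pretrained(MODEL)

# Take the first human turn of each ShareGPT conversation as the prompt,
# keeping only prompts of at most 1024 tokens.
with open("ShareGPT_V3_unfiltered_cleaned_split.json") as f:
    dataset = json.load(f)

prompts = []
for conv in dataset:
    turns = conv.get("conversations", [])
    if not turns:
        continue
    text = turns[0]["value"]
    n_tokens = len(tokenizer(text).input_ids)
    if 0 < n_tokens <= MAX_PROMPT_TOKENS:
        prompts.append((text, n_tokens))
    if len(prompts) >= NUM_PROMPTS:
        break

def complete(item):
    """Ask the server to continue with roughly as many tokens as the prompt has."""
    text, n_tokens = item
    resp = requests.post(
        BASE_URL,
        json={"model": MODEL, "prompt": text, "max_tokens": n_tokens},
    ).json()
    return len(tokenizer(resp["choices"][0]["text"]).input_ids)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    output_tokens = sum(pool.map(complete, prompts))
elapsed = time.perf_counter() - start

print(f"request throughput: {len(prompts) / elapsed:.2f} req/s")
print(f"output token throughput: {output_tokens / elapsed:.1f} tok/s")
```

The same client can be pointed at either server's OpenAI-compatible endpoint, so the measured request and output-token throughput are directly comparable.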
-
When I use sglang's own benchmark program, the results are even worse.
-
The 2080 Ti is sm75. We haven't tested or optimized on this architecture, and recommend using data-center GPUs such as the A100 or H100.