I am using the litellm client to benchmark a HuggingFace TGI server.
In token_benchmark_ray.py, req_launcher.get_next_ready() is called periodically to fetch pending results, with the block parameter set to False.
However, the call actually blocks until all pending requests are complete, which can take very long when I set a high number of concurrent requests (typically 128).
The result is that instead of continuously injecting new requests as old ones complete, the benchmark script sends a batch of max_concurrent_requests requests, waits for them all to finish, then sends another batch.
Is this the expected behaviour? I double-checked why the call blocks, and from the code in the request launcher this seems to be the normal behaviour, because it only checks whether there are still requests in the Ray actor pool.
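To make the batching effect concrete, here is a minimal sketch of the pattern described above. This is a hypothetical stand-in, not the actual llmperf source: BlockingLauncher, run_loop, and max_concurrent are all invented names, and the launcher deliberately ignores the block flag the way the real one appears to.

```python
# Minimal reproduction of the observed batching behaviour. Hypothetical
# stand-in for the request launcher; all names here are assumptions.
class BlockingLauncher:
    """Collects launched requests; get_next_ready drains *everything*
    regardless of the block flag, mirroring the reported behaviour."""
    def __init__(self):
        self._pending = []

    def launch(self, fn):
        self._pending.append(fn)

    def get_next_ready(self, block=False):
        # The reported bug: the block flag is effectively ignored and
        # all in-flight work is waited on before returning.
        results = [fn() for fn in self._pending]
        self._pending.clear()
        return results


def run_loop(launcher, work_items, max_concurrent=2):
    """Drive the launcher the way the benchmark loop does: top up to
    max_concurrent in-flight requests, then poll for finished ones."""
    completed, pending = [], list(work_items)
    while pending or launcher._pending:
        while pending and len(launcher._pending) < max_concurrent:
            launcher.launch(pending.pop(0))
        # Intended to be non-blocking, but since get_next_ready waits
        # for the whole in-flight batch, the loop degrades into
        # batch-by-batch execution instead of continuous injection.
        completed.extend(launcher.get_next_ready(block=False))
    return completed
```

With this launcher, every pass through the loop fills up to the concurrency limit and then stalls until that entire batch returns, which is exactly the send-batch/wait/send-batch pattern described above.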
I ran into the same issue (#56).
I think the get_next_ready function should return results as soon as each request finishes when called in non-blocking mode.
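A sketch of what that semantics could look like. This is not the actual llmperf implementation: for a self-contained example it uses concurrent.futures instead of Ray actors (with Ray, the analogous primitive would be ray.wait with a timeout), and the function name and signature are only assumed to mirror get_next_ready.

```python
import concurrent.futures


def get_next_ready(futures, block=False):
    """Return the results of finished requests and remove them from
    `futures`. Hypothetical reimplementation of get_next_ready,
    illustrated with concurrent.futures rather than Ray actors.

    block=False: return immediately with whatever has finished
    (possibly an empty list), never waiting on in-flight requests.
    block=True: if nothing has finished yet, wait for at least one.
    """
    done = [f for f in futures if f.done()]
    if block and not done and futures:
        # Wait only until the *first* request completes, not all of them.
        finished, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED
        )
        done = list(finished)
    results = [f.result() for f in done]
    for f in done:
        futures.remove(f)
    return results
```

With this shape, the benchmark loop can keep injecting a new request each time one completes, instead of stalling until the whole batch drains.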