Closed
Description
What happened?
I have OpenAI tier 5 usage, which should give me 30,000 RPM (= 500 RPS) with "gpt-4o-mini". However, I struggle to get past 50 RPS.
Minimal reproduction:
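import asyncio
from litellm import acompletion

async def main():
    # The report does not show how the tasks were awaited; gather is one way to run them.
    tasks = [acompletion(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You're an agent who answers yes or no"},
            {"role": "user", "content": "Is the sky blue?"},
        ],
    ) for _ in range(2000)]
    await asyncio.gather(*tasks)

asyncio.run(main())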
I only get 50 items/second as opposed to ~500 items/second when sending raw HTTP requests.
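To compare the two code paths on equal terms, a small timing harness along these lines may help. It is only a sketch: `measure_throughput` and `fake_request` are hypothetical names, not part of litellm, and the sleep-based stand-in would need to be swapped for the real `acompletion` call or the raw HTTP call being compared.

```python
import asyncio
import time


async def measure_throughput(make_request, total, concurrency):
    """Run `total` calls of `make_request` with at most `concurrency` in flight.

    Returns completed requests per second (hypothetical helper, not a litellm API).
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def one_call():
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            await make_request()

    start = time.perf_counter()
    await asyncio.gather(*(one_call() for _ in range(total)))
    elapsed = time.perf_counter() - start
    return total / elapsed


async def fake_request():
    # Stand-in for a network call; replace with the litellm or raw HTTP request.
    await asyncio.sleep(0.01)


if __name__ == "__main__":
    rps = asyncio.run(measure_throughput(fake_request, total=200, concurrency=50))
    print(f"{rps:.0f} requests/second")
```

Running the same harness with both request functions and the same `concurrency` value should show whether the gap comes from the client library or from how many requests are in flight.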
Relevant log output
16%|█████████████████████▌ | 320/2000 [00:09<00:40, 41.49it/s]