How to reduce latency of litellm proxy #4298
Unanswered
ManivannanGuru asked this question in Q&A
Replies: 3 comments 2 replies
-
Hi @ManivannanGuru - when running our load tests we see a latency of
-
I'm investigating HF TGI endpoints right now
-
This is what I see with litellm
This takes 1.273 seconds
-
Hi,
We have a chatbot where customers can ask questions; behind the scenes we use LLMs (Llama 3 in this case) hosted via TGI inference, with a Java backend.
As of now the Java code talks directly to the LLM (via the TGI inference endpoints), but we are planning to add LLM observability, so we were thinking of using the LiteLLM proxy since it integrates with many observability tools such as Langfuse.
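For reference, this is roughly the integration we are after; a minimal sketch using the litellm Python SDK with its Langfuse success callback (the proxy wires up the same callback through its config file). The model string, endpoint, and keys below are placeholders, not working values:

```python
# Sketch only: Langfuse logging via the litellm Python SDK
# (requires `pip install litellm langfuse`). The LiteLLM proxy enables the
# same callback from its config; this just illustrates the mechanism.
import os
import litellm

os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-..."  # placeholder
os.environ["LANGFUSE_SECRET_KEY"] = "sk-..."  # placeholder

litellm.success_callback = ["langfuse"]  # log every successful completion to Langfuse

response = litellm.completion(
    model="huggingface/llama3",                # same model string as in the proxy config
    api_base="https://models.yourdomain.com",  # the TGI endpoint
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```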
But the catch is that the moment we introduce LiteLLM into the picture we see a huge delay in the response time; for example, a direct API call without LiteLLM has a response time of ~2 seconds, but with LiteLLM it is ~4.5 seconds.
We have tested the LiteLLM proxy on its own (with and without the Postgres DB); in both cases the issue remains the same.
FYI:
litellm.proxy.com -> where we have hosted LiteLLM
models.yourdomain.com -> where we have hosted the LLMs
client(java) --> litellm --> llm : ~4.3s
The above call is sent from the Java client to LiteLLM, and we get the response in ~4.3s.
client --> llm : ~1.6s
The above call is sent from the Java client directly to the LLM, and we get the response in ~1.6s, so you can see the proxy adds a delay of roughly 3 seconds.
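For anyone who wants to reproduce the comparison, a rough timing sketch in Python (assuming the hostnames above, the classic TGI /generate route on the model server, and the OpenAI-compatible /v1/chat/completions route on the proxy; the model name and API key are placeholders that have to match the proxy config):

```python
# Rough latency comparison: direct TGI call vs. the same prompt through the
# LiteLLM proxy. Hostnames, model name, and API key are placeholders.
import time
import requests

PROMPT = "What is the capital of France?"

def timed_post(label, url, headers, payload):
    start = time.perf_counter()
    resp = requests.post(url, headers=headers, json=payload, timeout=60)
    print(f"{label}: {time.perf_counter() - start:.2f}s (HTTP {resp.status_code})")

# 1) Direct call to the TGI server (classic /generate API).
timed_post(
    "direct TGI",
    "https://models.yourdomain.com/generate",
    {"Content-Type": "application/json"},
    {"inputs": PROMPT, "parameters": {"max_new_tokens": 128}},
)

# 2) Same prompt through the LiteLLM proxy (OpenAI-compatible route).
timed_post(
    "via LiteLLM proxy",
    "https://litellm.proxy.com/v1/chat/completions",
    {"Content-Type": "application/json", "Authorization": "Bearer sk-1234"},  # key only needed if a master_key is set
    {
        "model": "llama3",  # must match the model_name registered in the proxy config
        "messages": [{"role": "user", "content": PROMPT}],
        "max_tokens": 128,
    },
)
```

Running each call a few times and averaging gives a cleaner picture of the per-request overhead than a single request.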
We have also configured LiteLLM on a local machine, and below are the results:
local client --> local LiteLLM (but the models are hosted on the server, not locally) : ~4s
```yaml
model_list:
  - litellm_params:
      model: huggingface/llama3
      api_base: https://models.yourdomain.com
```
LiteLLM: Version = 1.40.17
It would be really great if anyone could help us with this.