How to send custom params like "truncate_prompt_tokens" to vllm embedding #25077
Replies: 2 comments
I could not find any way to set it without changing the code, so I made a pull request: #25127
Hey, I've been digging into the LiteLLM proxy setup for a while now, and I can see why you're hitting a wall with passing custom params like `truncate_prompt_tokens`. Can you confirm whether you've tried setting this under `litellm_params`:

```yaml
litellm_params:
  model: hosted_vllm/qwen3-embedding-8b
  api_key: INTERNAL_OPENAI_API_88090
  api_base: http://llm-qwen3-embedding-8b.llms.svc.cluster.local:8200/v1
  drop_params: false
  extra_body:
    sampling_params:
      truncate_prompt_tokens: 16000
```

Also, if this doesn't work, I'd suggest logging the outgoing request payload from LiteLLM to vLLM; there's a debug mode in LiteLLM you can enable to inspect what's actually being sent. We've used this approach when debugging param passthrough issues with custom endpoints handling over 500 concurrent requests. If the param still doesn't make it through, it might be worth raising an issue on the LiteLLM repo with your logs, as it could be a gap in their vLLM integration for embedding models. Let me know how this pans out or if you've got other workarounds!
I use LiteLLM as a proxy and want to send a custom param "truncate_prompt_tokens" to the hosted_vllm instance.
https://docs.vllm.ai/en/v0.6.4/dev/sampling_params.html
I tried many ways to configure it, but it never gets sent to vLLM. Is there any way to send this param with all calls?
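For context, the calls in question look roughly like this (a sketch of my own; the proxy URL, key, and model alias are placeholders, and whether LiteLLM actually forwards the extra field to the hosted_vllm backend is exactly what this discussion is trying to resolve):

```python
# Sketch, not a confirmed working setup: proxy URL, key, and model alias are
# placeholders. The open question is whether LiteLLM forwards the extra_body
# field below to the hosted_vllm embedding backend.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000/v1",  # LiteLLM proxy (placeholder)
    api_key="sk-litellm-proxy-key",       # placeholder
)

resp = client.embeddings.create(
    model="qwen3-embedding-8b",  # model alias configured in the proxy
    input=["a very long document ..."],
    extra_body={"truncate_prompt_tokens": 16000},
)
print(len(resp.data[0].embedding))
```

Ideally, though, there would be a proxy-side setting that injects the field into every embedding call, rather than requiring each client to pass it.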