How to send custom params like "truncate_prompt_tokens" to vllm embedding #25077
Replies: 2 comments
I could not find any way to set it without changing the code, so I made a pull request: #25127
Hey, I've been digging into the LiteLLM proxy setup for a while now, and I can see why you're hitting a wall with passing custom params like `truncate_prompt_tokens`. Can you confirm whether you've tried setting this under `litellm_params`:

```yaml
litellm_params:
  model: hosted_vllm/qwen3-embedding-8b
  api_key: INTERNAL_OPENAI_API_88090
  api_base: http://llm-qwen3-embedding-8b.llms.svc.cluster.local:8200/v1
  drop_params: false
  extra_body:
    sampling_params:
      truncate_prompt_tokens: 16000
```

Also, if this doesn't work, I'd suggest logging the outgoing request payload from LiteLLM to vLLM; there's a debug mode in LiteLLM you can enable to inspect what's actually being sent. We've used this approach when debugging param passthrough issues with custom endpoints handling over 500 concurrent requests. If the param still doesn't make it through, it might be worth raising an issue on the LiteLLM repo with your logs, as it could be a gap in their vLLM integration for embedding models. Let me know how this pans out or if you've got other workarounds!
I use LiteLLM as a proxy and want to send a custom param "truncate_prompt_tokens" to the hosted_vllm instance.
https://docs.vllm.ai/en/v0.6.4/dev/sampling_params.html
I tried many ways to configure it, but it never gets sent to vLLM. Is there any way to send this param with all calls?
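For context, the calls in question look roughly like this (a sketch of my own; the proxy URL, key, and model alias are placeholders, and whether LiteLLM actually forwards the extra field to the hosted_vllm backend is exactly what this discussion is trying to resolve):

```python
# Sketch, not a confirmed working setup: proxy URL, key, and model alias are
# placeholders. The open question is whether LiteLLM forwards the extra_body
# field below to the hosted_vllm embedding backend.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000/v1",  # LiteLLM proxy (placeholder)
    api_key="sk-litellm-proxy-key",       # placeholder
)

resp = client.embeddings.create(
    model="qwen3-embedding-8b",  # model alias configured in the proxy
    input=["a very long document ..."],
    extra_body={"truncate_prompt_tokens": 16000},
)
print(len(resp.data[0].embedding))
```

Ideally, though, there would be a proxy-side setting that injects the field into every embedding call, rather than requiring each client to pass it.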