
Temperature setting not applied correctly when using OpenAI Compatible Provider with LocalLLM URL #7187

@christopherowen

Description

App Version

3.25.17

API Provider

OpenAI Compatible

Model Used

N/A

Roo Code Task Links (Optional)

When configuring RooCode with an OpenAI Compatible Provider pointing at a LocalLLM (LiteLLM → vLLM stack), the behavior of the “Use custom temperature” option in the Provider → Advanced settings does not match expectations.

  • If the box is unchecked, all requests are sent with temperature: 0.0 (greedy decoding).
  • If the box is checked, RooCode uses the UI-selected value (e.g. 0.7).

This effectively means that leaving the box unchecked forces temperature=0.0, instead of passing through the model/provider’s default temperature (e.g. LiteLLM’s configured defaults).
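For context, here is a minimal sketch of the difference (using the official openai npm client; this is illustrative only and not RooCode's actual request code — the endpoint, key, and model alias are taken from the curl example further below):

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ai.stackq.com/v1", // LiteLLM endpoint from this setup
  apiKey: "sk-xyz",
});

// What the unchecked box currently amounts to: an explicit temperature field,
// even at 0.0, overrides whatever default LiteLLM/vLLM are configured with.
await client.chat.completions.create({
  model: "coder_api",
  messages: [{ role: "user", content: "Say hi" }],
  temperature: 0.0,
});

// What an unchecked box would ideally amount to: no temperature field at all,
// so the backend default (temperature=0.3 from the LiteLLM alias) applies.
await client.chat.completions.create({
  model: "coder_api",
  messages: [{ role: "user", content: "Say hi" }],
});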

🔁 Steps to Reproduce

  1. Set up RooCode with:
  • Provider: OpenAI Compatible
  • LocalLLM URL: pointing to LiteLLM, which forwards to vLLM with Qwen3-Coder.
  • LiteLLM alias configured with a default temperature: 0.3.
  2. In RooCode, leave “Use custom temperature” unchecked.
  • Send a request.
  • Observe vLLM logs show temperature=0.0.
    vllm logs:
params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=18577, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, prompt_embeds shape: None, lora_request: None.
  3. In RooCode, check “Use custom temperature” and set the slider to 0.7.
  • Send a request.
  • Observe vLLM logs show temperature=0.7.
    vllm logs:
params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.7, top_p=0.95, top_k=20, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=17875, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, prompt_embeds shape: None, lora_request: None.
  4. Send a request directly to LiteLLM with curl (bypassing RooCode).
  • Observe vLLM logs show the correct default temperature=0.3.
❯ curl https://ai.stackq.com/v1/chat/completions   -H "Content-Type: application/json"   -H "Authorization: Bearer sk-xyz"   -d '{
    "model":"coder_api",
    "messages":[{"role":"user","content":"Say hi"}]
  }'

vllm logs:

params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.3, top_p=0.95, top_k=20, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=32758, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, prompt_embeds shape: None, lora_request: None.

💥 Outcome Summary

Expected Behavior

  • When “Use custom temperature” is unchecked, RooCode should not send any temperature field in the request payload.
  • This would allow the backend (LiteLLM, vLLM, or model defaults) to control temperature.

Actual Behavior

  • RooCode always sends temperature: 0.0 when the box is unchecked, which overrides backend defaults and forces greedy decoding.

Impact

  • Causes confusion when backend defaults (LiteLLM/vLLM) are configured, since RooCode silently overrides them with 0.0.
  • Forces greedy decoding, so responses are more deterministic (and often more brittle) than the backend’s configured sampling would produce.
  • Users who expect model default sampling settings never see them unless they manually enable “Use custom temperature.”

Proposed Fix

  • When “Use custom temperature” is unchecked, RooCode should omit the temperature parameter in API requests.
  • Only include temperature when the option is explicitly checked and set by the user (a sketch of this conditional follows below).
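
A minimal sketch of that conditional, assuming hypothetical setting names (useCustomTemperature, customTemperature) rather than RooCode's actual fields:

// Hypothetical provider settings shape; field names are assumptions for illustration.
interface ProviderSettings {
  useCustomTemperature: boolean;
  customTemperature?: number;
}

// Returns either an object containing a temperature, or an empty object,
// so that spreading it into the request body omits the key entirely.
function buildSamplingParams(settings: ProviderSettings): { temperature?: number } {
  return settings.useCustomTemperature && settings.customTemperature !== undefined
    ? { temperature: settings.customTemperature }
    : {};
}

// Usage: unchecked box → no temperature key in the payload at all,
// so LiteLLM's alias default (0.3 in this setup) is what vLLM sees.
const body = {
  model: "coder_api",
  messages: [{ role: "user", content: "Say hi" }],
  ...buildSamplingParams({ useCustomTemperature: false }),
};

The point is that the key is absent from the serialized JSON entirely, which is what lets the backend default take effect.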

📄 Relevant Logs or Errors (Optional)

.env:

MODEL="cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit"
MODEL_ALIAS="coder"
MODEL_CONTEXT_SIZE="32K"
MODEL_MAX_REQUESTS="2"
MODEL_MAX_TOKENS="3K"
MODEL_KVCACHE_TYPE="auto"
MODEL_GPU_UTIL="0.94"
MODEL_TOOL_TYPE="qwen3_coder"


litellm config:

model_list:
  - model_name: ${MODEL_ALIAS}
    litellm_params:
      model: openai/${MODEL}
      api_base: http://vllm:8000/v1
      api_key: dummy

  - model_name: ${MODEL_ALIAS}_api
    litellm_params:
      model: openai/${MODEL}
      api_base: http://vllm:8000/v1
      api_key: dummy
      temperature: 0.3
      top_p: 0.95
      repetition_penalty: 1.05

general_settings:
  user_header_name: X-OpenWebUI-User-Email


vllm startup:

      --model ${MODEL}
      --host 0.0.0.0
      --port 8000
      --download-dir /models
      --gpu-memory-utilization ${MODEL_GPU_UTIL}
      --tensor-parallel-size 1
      --dtype float16
      --kv-cache-dtype ${MODEL_KVCACHE_TYPE}
      --enable-chunked-prefill
      --max-model-len ${MODEL_CONTEXT_SIZE}
      --max-num-seqs ${MODEL_MAX_REQUESTS}
      --max-num-batched-tokens ${MODEL_MAX_TOKENS}
      --trust-remote-code
      --disable-uvicorn-access-log
      --uvicorn-log-level warning
      --enable-auto-tool-choice
      --tool-call-parser ${MODEL_TOOL_TYPE}
