Description
App Version
3.25.17
API Provider
OpenAI Compatible
Model Used
N/A
Roo Code Task Links (Optional)
When configuring RooCode with an OpenAI Compatible provider pointing at a local LLM served through a LiteLLM → vLLM stack, the behavior of the “Use custom temperature” option in the Provider → Advanced settings does not match expectations.
- If the box is unchecked, all requests are sent with temperature: 0.0 (greedy decoding).
- If the box is checked, RooCode uses the UI-selected value (e.g. 0.7).
This effectively means that leaving the box unchecked forces temperature=0.0, instead of passing through the model/provider’s default temperature (e.g. LiteLLM’s configured defaults).
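For illustration, a minimal TypeScript sketch of the two request shapes (illustrative only, not Roo Code's actual client code; the model name is the LiteLLM alias from the setup below):

// Illustrative request bodies only; not Roo Code's actual client code.
const messages = [{ role: "user", content: "Say hi" }];

// Observed today with the box unchecked: an explicit 0.0 reaches vLLM and
// forces greedy decoding, overriding LiteLLM's alias default of 0.3.
const uncheckedToday = { model: "coder_api", messages, temperature: 0 };

// Expected pass-through: no temperature key at all, so LiteLLM/vLLM/model
// defaults apply (0.3 in this setup).
const expectedPassThrough = { model: "coder_api", messages };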
🔁 Steps to Reproduce
- Set up RooCode with:
- Provider: OpenAI Compatible
- LocalLLM URL: pointing to LiteLLM, which forwards to vLLM with Qwen3-Coder.
- LiteLLM alias configured with a default temperature: 0.3.
- In RooCode, leave “Use custom temperature” unchecked.
- Send a request.
- Observe vLLM logs show temperature=0.0.
vllm logs:
params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=18577, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, prompt_embeds shape: None, lora_request: None.
- In RooCode, check “Use custom temperature” and set slider to 0.7.
- Send a request.
- Observe vLLM logs show temperature=0.7.
vllm logs:
params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.7, top_p=0.95, top_k=20, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=17875, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, prompt_embeds shape: None, lora_request: None.
- Send a request directly to LiteLLM with curl (bypassing RooCode).
- Observe vLLM logs show the correct default temperature=0.3.
❯ curl https://ai.stackq.com/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer sk-xyz" -d '{
"model":"coder_api",
"messages":[{"role":"user","content":"Say hi"}]
}'
vllm logs:
params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.3, top_p=0.95, top_k=20, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=32758, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, prompt_embeds shape: None, lora_request: None.
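As an additional hypothetical check (not part of the original repro), sending the same request with an explicit temperature of 0 should make vLLM log temperature=0.0 again, confirming that the override comes from the client-supplied field rather than from LiteLLM or vLLM. A TypeScript sketch, reusing the placeholder endpoint and key from the curl example above:

// Hypothetical verification step, not part of the original repro: the same
// request as the curl above, but with an explicit temperature of 0. vLLM
// would then be expected to log temperature=0.0, matching what RooCode
// produces when "Use custom temperature" is unchecked.
async function checkExplicitZero(): Promise<void> {
  const res = await fetch("https://ai.stackq.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer sk-xyz", // placeholder key, as in the curl example
    },
    body: JSON.stringify({
      model: "coder_api",
      messages: [{ role: "user", content: "Say hi" }],
      temperature: 0, // explicit zero, as RooCode currently sends it
    }),
  });
  console.log(await res.json());
}

checkExplicitZero().catch(console.error);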
💥 Outcome Summary
Expected Behavior
- When “Use custom temperature” is unchecked, RooCode should not send any temperature field in the request payload.
- This would allow the backend (LiteLLM, vLLM, or model defaults) to control temperature.
Actual Behavior
- RooCode always sends temperature: 0.0 when the box is unchecked, which overrides backend defaults and forces greedy decoding.
Impact
- Causes confusion when backend defaults (LiteLLM/vLLM) are configured, since RooCode silently overrides them with 0.0.
- Responses become fully deterministic (greedy decoding), which is often more repetitive and brittle than the sampling behavior the backend was configured for.
- Users who expect model default sampling settings never see them unless they manually enable “Use custom temperature.”
Proposed Fix
- When “Use custom temperature” is unchecked, RooCode should omit the temperature parameter in API requests.
- Only include temperature when the option is explicitly checked and set by the user (a minimal sketch of this behavior follows below).
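A minimal sketch of the proposed behavior in TypeScript. The names here (buildRequestBody, ProviderSettings, ChatMessage) are hypothetical and not taken from Roo Code's actual OpenAI-compatible handler; the point is simply that the temperature key is added conditionally rather than always set:

// Hypothetical sketch of the proposed fix; names are illustrative and not
// taken from RooCode's actual OpenAI-compatible handler.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ProviderSettings {
  useCustomTemperature: boolean; // state of the "Use custom temperature" checkbox
  customTemperature?: number;    // slider value, e.g. 0.7
}

// Build the chat-completion request body. The temperature key is only present
// when the user explicitly enabled the option, so an unchecked box leaves the
// field out and lets LiteLLM/vLLM/model defaults apply.
function buildRequestBody(settings: ProviderSettings, messages: ChatMessage[]) {
  return {
    model: "coder_api", // LiteLLM alias from the repro setup
    messages,
    ...(settings.useCustomTemperature && settings.customTemperature !== undefined
      ? { temperature: settings.customTemperature }
      : {}),
  };
}

With useCustomTemperature set to false, the serialized body contains no temperature field at all, which is the pass-through behavior requested above.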
📄 Relevant Logs or Errors (Optional)
.env:
MODEL="cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit"
MODEL_ALIAS="coder"
MODEL_CONTEXT_SIZE="32K"
MODEL_MAX_REQUESTS="2"
MODEL_MAX_TOKENS="3K"
MODEL_KVCACHE_TYPE="auto"
MODEL_GPU_UTIL="0.94"
MODEL_TOOL_TYPE="qwen3_coder"
litellm config:
model_list:
  - model_name: ${MODEL_ALIAS}
    litellm_params:
      model: openai/${MODEL}
      api_base: http://vllm:8000/v1
      api_key: dummy
  - model_name: ${MODEL_ALIAS}_api
    litellm_params:
      model: openai/${MODEL}
      api_base: http://vllm:8000/v1
      api_key: dummy
      temperature: 0.3
      top_p: 0.95
      repetition_penalty: 1.05

general_settings:
  user_header_name: X-OpenWebUI-User-Email
vllm startup:
--model ${MODEL}
--host 0.0.0.0
--port 8000
--download-dir /models
--gpu-memory-utilization ${MODEL_GPU_UTIL}
--tensor-parallel-size 1
--dtype float16
--kv-cache-dtype ${MODEL_KVCACHE_TYPE}
--enable-chunked-prefill
--max-model-len ${MODEL_CONTEXT_SIZE}
--max-num-seqs ${MODEL_MAX_REQUESTS}
--max-num-batched-tokens ${MODEL_MAX_TOKENS}
--trust-remote-code
--disable-uvicorn-access-log
--uvicorn-log-level warning
--enable-auto-tool-choice
--tool-call-parser ${MODEL_TOOL_TYPE}