Name and Version
$ llama-server --version
[...]
version: 7435 (79dbae034)
built with GNU 14.2.0 for Linux x86_64
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server --host 127.0.0.1 --port 8000 --jinja --prio 2 --no-mmap --threads -1 -np 3 -fa on --models-preset models.ini
With the following models.ini:
version = 1
[Qwen3-Next-80B-A3A-Thinking-Q4KXL]
model = /path/to/Qwen3-Next-80B-A3B-Thinking-UD-Q4_K_XL.gguf
c = 86016
temp = 0.6
top-p = 0.95
min-p = 0
top-k = 20
cpu-moe = true
presence-penalty = 1.0
[GPT-OSS-20B-F16]
model = /path/to/gpt-oss-20b-F16.gguf
c = 98304
temp = 1.0
top-p = 1.00
min-p = 0
top-k = 100
Problem description & steps to reproduce
I'm running llama-server in router mode with a model preset file containing the recommended sampling parameters for each model. However, the web frontend has its own default sampling parameters, and these are not updated when I switch models. The spawned model process does receive the sampling parameters configured in the preset file as command-line arguments, but they are then overridden by the web frontend's defaults, which are sent along with every /v1/chat/completions request. Clicking "Reset to default" in the web frontend's sampling parameter settings also does not apply the current model's sampling parameters.
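For comparison, a request that omits sampling parameters entirely should fall back to the values from the preset file (rough sketch; the prompt is a placeholder, and I'm assuming the preset section name doubles as the model name in router mode):
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3-Next-80B-A3A-Thinking-Q4KXL",
        "messages": [{"role": "user", "content": "Hello"}]
      }'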
The web frontend's request to /v1/chat/completions, however, sends the following sampling parameters:
temperature: 0.8
top-k: 40
top-p: 0.95
min-p: 0.05
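In other words, the request the web UI issues is roughly equivalent to the following (illustrative sketch, other fields omitted), so the preset values never take effect:
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3-Next-80B-A3A-Thinking-Q4KXL",
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.8,
        "top_k": 40,
        "top_p": 0.95,
        "min_p": 0.05
      }'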
First Bad Commit
This is the first time I have used the model preset feature, so I assume it has behaved like this ever since the feature was merged.