
Misc. bug: Web-Frontend overrides sampling parameters from models preset in router mode #18129

@rtpt-erikgeiser

Description


Name and Version

$ llama-server --version
[...]
version: 7435 (79dbae034)
built with GNU 14.2.0 for Linux x86_64

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server --host 127.0.0.1 --port 8000 --jinja --prio 2 --no-mmap --threads -1 -np 3 -fa on --models-preset models.ini

With the following models.ini:


version = 1

[Qwen3-Next-80B-A3B-Thinking-Q4KXL]
model = /path/to/Qwen3-Next-80B-A3B-Thinking-UD-Q4_K_XL.gguf
c = 86016
temp = 0.6
top-p = 0.95
min-p = 0
top-k = 20
cpu-moe = true
presence-penalty = 1.0

[GPT-OSS-20B-F16]
model = /path/to/gpt-oss-20b-F16.gguf
c = 98304
temp = 1.0
top-p = 1.00
min-p = 0
top-k = 100
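
The expectation is that these per-model values act as server-side sampling defaults, so a request that omits the sampling fields picks them up. A rough sketch of such a request (assuming router mode matches the model field against the preset section name, and using the host/port from the command line above):

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "GPT-OSS-20B-F16",
        "messages": [{"role": "user", "content": "Hello"}]
      }'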

Problem description & steps to reproduce

I'm running llama-server in router mode with a model preset file that contains the recommended sampling parameters for each model. However, the web frontend has its own default sampling parameters, and these are not updated when I switch models. The model's process is launched with the sampling parameters configured in the preset file, but they are then overridden by the web frontend's defaults in the corresponding /v1/chat/completions request. Clicking "Reset to default" in the web frontend's sampling parameter settings also does not apply the current model's sampling parameters.

The request to /v1/chat/completions sends the following sampling parameters:

temperature: 0.8
top_k: 40
top_p: 0.95
min_p: 0.05
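
As a rough sketch (field names taken from the OpenAI-compatible API; the web UI's actual payload contains additional fields, and the model name here is just the preset section name from above), the request body looks like this regardless of which model is selected:

{
  "model": "Qwen3-Next-80B-A3B-Thinking-Q4KXL",
  "messages": [
    { "role": "user", "content": "Hello" }
  ],
  "temperature": 0.8,
  "top_k": 40,
  "top_p": 0.95,
  "min_p": 0.05
}

These values come from the frontend's defaults rather than from the preset of the selected model.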

First Bad Commit

This is the first time I have used the model preset feature, so I assume it has behaved like this ever since the feature was merged.
