Name and Version
version: 6906 (0de0a01)
built with cc (GCC) 15.2.1 20250813 for x86_64-pc-linux-gnu
Operating systems
Linux
GGML backends
HIP
Hardware
probably not relevant
Models
Qwen3VL
Problem description & steps to reproduce
Not sure if this is a bug or if I'm holding it wrong.
Apparently Qwen3VL uses n_embd strangely in llama.cpp. Computing a control vector the same way I do for other models produces per-layer directions of size 5120, but I cannot load the resulting file.
Although hidden_size is 5120 in transformers, llama.cpp asserts unless each per-layer vector has 20480 elements. Padding the remaining values, or commenting out the assert, only leads to a different failure: GGML_ASSERT(ggml_can_repeat(b, a)).
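For reference, the padding attempt looked roughly like this (a minimal sketch assuming the usual control-vector GGUF layout with direction.N tensors, written with the Python gguf package; the file name, model_hint, layer count and random directions are placeholders, not my real extraction output):

```python
# Sketch of the padding workaround: zero-pad each 5120-wide direction to the
# 20480 elements llama.cpp asserts on for Qwen3VL, then write a control-vector
# GGUF containing one direction.N tensor per layer.
import numpy as np
import gguf

hidden_size = 5120      # hidden_size reported by transformers
target_n_embd = 20480   # size llama.cpp expects for Qwen3VL (4 * hidden_size)

# Placeholder per-layer directions standing in for the real extracted ones.
rng = np.random.default_rng(0)
directions = {layer: rng.standard_normal(hidden_size).astype(np.float32)
              for layer in range(1, 5)}

arch = "controlvector"
writer = gguf.GGUFWriter("qwen3vl-cvec-padded.gguf", arch)
writer.add_string(f"{arch}.model_hint", "qwen3vl")   # hint string is a guess
writer.add_uint32(f"{arch}.layer_count", len(directions))

for layer, vec in directions.items():
    padded = np.zeros(target_n_embd, dtype=np.float32)
    padded[:hidden_size] = vec   # remaining 15360 values padded with zeros
    writer.add_tensor(f"direction.{layer}", padded)

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```

Loading that padded file with --control-vector is what produces the ggml_can_repeat assertion instead of the n_embd mismatch.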
Reading the merge PR for Qwen3VL, it sounds like n_embd is extended to handle some recurrent stuff. Is a different approach to calculating control vectors necessary here? How should this work for Qwen3VL?
First Bad Commit
n/a
Relevant log output
apply: control vector n_embd does not match model
srv load_model: failed to load model, '.../path/to/model.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error