fix(llama): adds support for non-default head_dim by suniastar · Pull Request #3593 · huggingface/candle

suniastar · 2026-06-09T11:18:50Z

Adds support for non-standard/non-fallback head_dim values.

head_dim (int, optional) — The attention head dimension. If None, it will default to hidden_size // num_attention_heads
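The fallback rule quoted above can be sketched in Rust as follows. This is only an illustration, not the code from this PR: the struct `LlamaConfigSketch` and the method `resolved_head_dim` are hypothetical names, and the fields simply mirror the `config.json` keys mentioned in the quote.

```rust
/// Hypothetical, trimmed-down config used only for this sketch.
/// Field names mirror the `config.json` keys; this is not candle's actual struct.
#[derive(Debug, Clone)]
pub struct LlamaConfigSketch {
    pub hidden_size: usize,
    pub num_attention_heads: usize,
    /// Optional explicit head dimension; `None` means "use the fallback".
    pub head_dim: Option<usize>,
}

impl LlamaConfigSketch {
    /// Returns the explicit `head_dim` if present, otherwise
    /// `hidden_size / num_attention_heads` (integer division).
    pub fn resolved_head_dim(&self) -> usize {
        self.head_dim
            .unwrap_or_else(|| self.hidden_size / self.num_attention_heads)
    }
}
```

With illustrative values `hidden_size = 5120` and `num_attention_heads = 32`, the fallback yields 160, while an explicit `head_dim = Some(128)` yields 128. A model whose explicit value differs from the fallback is exactly the case that breaks when `head_dim` is ignored.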

This PR adds support for this optional value as well as its GGUF metadata counterparts

llama.attention.key_length
llama.attention.value_length

for the quantized llama model.
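A minimal sketch of how the two metadata keys above could be resolved with a fallback. It assumes a plain `HashMap<String, u32>` as a stand-in for the GGUF metadata table; the function name `resolve_kv_lengths` and the parameters `embedding_length` and `head_count` are hypothetical and are not taken from candle's quantized llama code.

```rust
use std::collections::HashMap;

/// Looks up the per-head key and value lengths, falling back to
/// `embedding_length / head_count` when a key is absent.
/// The `HashMap` is an assumption standing in for real GGUF metadata.
pub fn resolve_kv_lengths(
    metadata: &HashMap<String, u32>,
    embedding_length: usize,
    head_count: usize,
) -> (usize, usize) {
    // Computed lazily so a zero `head_count` only matters if a key is missing.
    let fallback = || embedding_length / head_count;
    let lookup = |key: &str| metadata.get(key).map(|v| *v as usize);

    let key_length = lookup("llama.attention.key_length").unwrap_or_else(fallback);
    let value_length = lookup("llama.attention.value_length").unwrap_or_else(fallback);
    (key_length, value_length)
}
```

Each key is resolved independently, so a file that sets only one of them still gets the fallback for the other.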

As my goal was to get MN Violet Lotus running with candle, I was only able to test the quantized implementation due to a lack of (V)RAM.

adds support for llamas head_dim config

7cd07d6
