
fix(llama): adds support for non-default head_dim #3593

Open
suniastar wants to merge 1 commit into huggingface:main from suniastar:feat/llama-non-standard-head-dim


@suniastar


Adds support for non-standard/non-fallback head_dim values.

In the Hugging Face documentation, head_dim is defined as:

head_dim (int, optional) — The attention head dimension. If None, it will default to hidden_size // num_attention_heads

This PR adds support for explicitly set head_dim values, as well as the GGUF metadata counterparts

llama.attention.key_length
llama.attention.value_length

for the quantized llama model.
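
The fallback rule described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names `resolve_head_dim` and `resolve_gguf_head_dims` are hypothetical, and the `HashMap` is an assumed stand-in for GGUF metadata rather than candle's actual metadata type. Only the two key names come from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical helper: an explicit `head_dim` wins; otherwise fall back to
/// `hidden_size / num_attention_heads`, as the Hugging Face docs describe.
fn resolve_head_dim(
    head_dim: Option<usize>,
    hidden_size: usize,
    num_attention_heads: usize,
) -> usize {
    head_dim.unwrap_or(hidden_size / num_attention_heads)
}

/// Hypothetical helper for the quantized path. `metadata` is a plain map
/// standing in for GGUF metadata (an assumption made for this sketch).
/// Returns `(key_length, value_length)`, each falling back independently to
/// `embedding_length / head_count` when its key is absent.
fn resolve_gguf_head_dims(
    metadata: &HashMap<String, usize>,
    embedding_length: usize,
    head_count: usize,
) -> (usize, usize) {
    let fallback = embedding_length / head_count;
    let key_length = metadata
        .get("llama.attention.key_length")
        .copied()
        .unwrap_or(fallback);
    let value_length = metadata
        .get("llama.attention.value_length")
        .copied()
        .unwrap_or(fallback);
    (key_length, value_length)
}
```

With illustrative numbers, a hidden size of 5120 and 32 heads falls back to 160, while an explicit value of 128 overrides that. A model whose real head dimension differs from the fallback would get mis-shaped attention tensors without the override. Both helpers assume a non-zero head count.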

My goal was to get MN Violet Lotus running with candle. Due to a lack of (V)RAM, I was only able to test the quantized implementation.
