convert : support non-mxfp4 HF model #15153
Conversation
Got this error:
Tried with this script:
I deleted this section:
The script ran without it; I'm uploading it here and testing it.
looking good
Trying to quantize down to MXFP4 prints out a ton of stuff and then fails
Tried your solution @gabriellarson but it only seems to produce a 2GB file using q8_0, so I think there's an issue somewhere.
@gabriellarson thanks for testing, please retry to see if
// TODO: temporary sanity check that the F16 -> MXFP4 is lossless
-#if 1
+#if 0
For vis @ggerganov, I disabled this check because most users will now be using this code branch to convert fine-tuned models to MXFP4, which will no longer be lossless.
That said, I'm a bit doubtful whether fine-tuned models like the abliterated version should be quantized to something other than MXFP4 or not.
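As a rough illustration of why the round trip stops being lossless, here is a minimal Python sketch (not the llama.cpp code path; the helper name mxfp4_roundtrip and the scale/rounding choices are my assumptions): MXFP4 stores each block of 32 weights as a shared power-of-two scale plus one 4-bit E2M1 value per weight, so arbitrary BF16/F16 fine-tuned weights generally do not land exactly on that grid, whereas weights that originated as MXFP4 do.

import numpy as np

# The 8 non-negative values representable in FP4 (E2M1); each MXFP4 element is one of these, signed.
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([FP4_VALUES, -FP4_VALUES])

def mxfp4_roundtrip(block: np.ndarray) -> np.ndarray:
    """Quantize one block of 32 floats to MXFP4-style values and dequantize again (illustrative only)."""
    amax = np.abs(block).max()
    if amax == 0.0:
        return np.zeros_like(block)
    # Shared power-of-two scale chosen so the largest magnitude lands near the FP4 maximum of 6.0.
    scale = 2.0 ** (int(np.floor(np.log2(amax))) - 2)
    # Snap each scaled value to the nearest representable FP4 value.
    idx = np.abs(block[:, None] / scale - FP4_GRID[None, :]).argmin(axis=1)
    return FP4_GRID[idx] * scale

block = np.random.randn(32).astype(np.float32)      # stand-in for fine-tuned BF16/F16 weights
error = np.abs(block - mxfp4_roundtrip(block)).max()
print(error)  # non-zero: a "lossless" sanity check would trip on such weights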
@gabriellarson Could you also try converting it to Q4_K_M to see if it impacts the quality?
Ah nevermind, it's not possible to quantize to Q4_K since the tensor shape is not divisible by 256
Quantizing works now. Q4_K_M and MXFP4 both create decent output; Q4_K_M has lower perplexity.

llama.cpp/build/bin/llama-perplexity -m model.gguf -f wikitext-2-raw/wiki.test.raw -ngl 99

MXFP4:
Q4_K_M:
Yes, that's expected, because the big FFN tensors cannot be quantized to anything other than Q8_0 or MXFP4. For Q4_K_M, these tensors fall back to Q8_0.
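A quick back-of-the-envelope check (the 2880 row size is my assumption for the gpt-oss expert FFN tensors): Q4_K packs weights into 256-element super-blocks, while Q8_0 and MXFP4 use 32-element blocks, so only the latter two divide such rows evenly.

row = 2880        # assumed per-row element count of the expert FFN tensors
print(row % 256)  # 64 -> Q4_K (256-weight super-blocks) can't be used, hence the fallback
print(row % 32)   # 0  -> Q8_0 and MXFP4 (32-weight blocks) fit exactly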
I guess we need a new quantization scheme like "Q4_K_FX" or something that uses MXFP4 as the fallback.
The goal is to fix conversion for https://huggingface.co/huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated
This partially reverts e2c1beb
Closes #15146