Add basic support for MXFP6_MOE quantization #16777
base: master
Conversation
This happens because you have not installed/inited

Thanks for your advice.

Please restore formatting to
What is the motivation for this type? Are there any models being natively distributed in MXFP6, or does it perform better than other quantizations?

Probably Blackwell support.
Currently, there are no models natively distributed in MXFP6, but I think MXFP6 may offer a good balance between model quality and performance in the future :) NVIDIA's Blackwell architecture is expected to support MXFP6, and AMD's MI355X also includes MXFP6 support. Additionally, while MXFP4 has shown promising results with QAT, some papers (e.g. Tables 2 and 3 in this paper) report that MXFP4 may not perform as well under direct quantization, which is one of the common use cases of llama.cpp. In contrast, MXFP6 appears to be more robust in such settings.
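For context, MXFP6 follows the OCP Microscaling (MX) convention: blocks of 32 elements share one power-of-two (E8M0-style) scale, and each element is stored as a 6-bit float (E2M3 or E3M2). The sketch below only illustrates the E2M3 variant under those assumptions; the struct layout, helper names (`MxBlock`, `quantize_block`, `float_to_e2m3`), and the brute-force rounding are hypothetical and are not taken from this PR or from ggml.

```cpp
// Hypothetical sketch of MXFP6 (E2M3) block quantization -- not the PR's code.
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr int MX_BLOCK = 32; // OCP MX convention: 32 elements share one scale

// Decode a 6-bit E2M3 code: 1 sign bit, 2 exponent bits, 3 mantissa bits, bias 1.
static float e2m3_to_float(uint8_t code) {
    const int sign = (code >> 5) & 1;
    const int exp  = (code >> 3) & 3;
    const int man  =  code       & 7;
    const float v = (exp == 0)
        ? man / 8.0f                                      // subnormal: step 0.125
        : (1.0f + man / 8.0f) * std::ldexp(1.0f, exp - 1); // normal: up to 7.5
    return sign ? -v : v;
}

// Round-to-nearest E2M3 by scanning all 64 codes (fine for a sketch).
// Values above the largest magnitude (7.5) simply saturate to it.
static uint8_t float_to_e2m3(float x) {
    uint8_t best     = 0;
    float   best_err = INFINITY;
    for (int c = 0; c < 64; ++c) {
        const float err = std::fabs(e2m3_to_float((uint8_t) c) - x);
        if (err < best_err) { best_err = err; best = (uint8_t) c; }
    }
    return best;
}

// One quantized block: a shared power-of-two exponent plus 32 six-bit codes
// (stored unpacked here for clarity).
struct MxBlock { int8_t shared_exp; uint8_t q[MX_BLOCK]; };

static MxBlock quantize_block(const float * x) {
    float amax = 0.0f;
    for (int i = 0; i < MX_BLOCK; ++i) amax = std::fmax(amax, std::fabs(x[i]));
    // Align the block maximum with the largest E2M3 magnitude (7.5, exponent 2).
    const int shared_exp = (amax > 0.0f) ? (int) std::floor(std::log2(amax)) - 2 : 0;
    MxBlock b;
    b.shared_exp = (int8_t) shared_exp;
    const float inv_scale = std::ldexp(1.0f, -shared_exp);
    for (int i = 0; i < MX_BLOCK; ++i) b.q[i] = float_to_e2m3(x[i] * inv_scale);
    return b;
}

int main() {
    std::vector<float> x(MX_BLOCK);
    for (int i = 0; i < MX_BLOCK; ++i) x[i] = 0.01f * (i - 16);
    const MxBlock b = quantize_block(x.data());
    for (int i = 0; i < MX_BLOCK; ++i) {
        const float deq = e2m3_to_float(b.q[i]) * std::ldexp(1.0f, b.shared_exp);
        std::printf("%8.4f -> %8.4f\n", x[i], deq);
    }
    return 0;
}
```

A real kernel would round via bit manipulation rather than scanning all 64 codes and would pack the 6-bit codes tightly; the scan is only the simplest way to show the representable value set and the shared-scale idea.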
Make sure to read the contributing guidelines before submitting a PR
`test-quantize-*` passed in local CI. `test-tokenizer-ggml-vocabs` reports a failure, but I don't think it's caused by this PR (as this PR does not change the GGUF parser).