Sorry about the disruption. I found an issue with the GPTQ method used in oQ enhanced quantization that was causing degraded quality on MoE models. All affected models have been taken down and will be re-uploaded with the fix.

The corrected quantization will ship in oMLX v0.3.1. I have already uploaded the updated Qwen3.5-35B-A3B-oQ4e model for testing.

Benchmark comparison (Qwen3.5-35B-A3B)

| Benchmark | uniform 4-bit | oQ4e (old) | oQ4e (new) |
| --- | --- | --- | --- |
| MMLU | 81.1% | 81.3% | 82.8% |
| WINOGRANDE | 75.1% | 73.3% | 76.4% |
| HUMANEVAL | 89.0% | 87.8% | 89.6% |
| MBPP | 74.0% | 74.0% | 76.0% |

The old oQ4e was actually worse than uniform 4-bit on WINOGRANDE (73.3% vs 75.1%). The new version now beats uniform 4-bit across all fou…
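
For anyone who wants to check the gaps against uniform 4-bit, here is a minimal sketch in Python. The scores are copied from the benchmark table in this post; the dictionary layout and the helper name `delta_vs_uniform` are made up for illustration and are not part of oMLX.

```python
# Scores (percent) copied from the benchmark table in this post.
# The keys "uniform", "old", "new" are illustrative labels, not oMLX names.
scores = {
    "MMLU":       {"uniform": 81.1, "old": 81.3, "new": 82.8},
    "WINOGRANDE": {"uniform": 75.1, "old": 73.3, "new": 76.4},
    "HUMANEVAL":  {"uniform": 89.0, "old": 87.8, "new": 89.6},
    "MBPP":       {"uniform": 74.0, "old": 74.0, "new": 76.0},
}


def delta_vs_uniform(variant):
    """Return the per-benchmark gap (in percentage points) to uniform 4-bit."""
    return {
        bench: round(row[variant] - row["uniform"], 1)
        for bench, row in scores.items()
    }


old_delta = delta_vs_uniform("old")
new_delta = delta_vs_uniform("new")

# Print one line per benchmark with signed gaps for both oQ4e variants.
for bench in scores:
    print(f"{bench:<11} old {old_delta[bench]:+.1f}  new {new_delta[bench]:+.1f}")
```

By the table's numbers, the old oQ4e trails uniform 4-bit by 1.8 points on WINOGRANDE, while the new oQ4e is ahead on every listed benchmark, by 0.6 to 2.0 points.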
