-
Just now, my download of Jundot Qwen3.5-35B-A3B-oQ4e failed. It had downloaded 15 GB of 21 GB very quickly when an error suddenly occurred; the file couldn't be found and had been deleted.
Replies: 7 comments
-
Sorry about the disruption. I found an issue with the GPTQ method used in oQ enhanced quantization that was causing degraded quality on MoE models. All affected models have been taken down and will be re-uploaded with the fix. The corrected quantization will ship in oMLX v0.3.1. I have already uploaded the updated Qwen3.5-35B-A3B-oQ4e model for testing.

Benchmark comparison (Qwen3.5-35B-A3B):

The old oQ4e was actually worse than uniform 4-bit on WINOGRANDE (73.3% vs 75.1%). The new version now beats uniform 4-bit across all four benchmarks. Remaining models will be re-quantized and uploaded over the next few days.
-
Can you share a step-by-step procedure for how you created the oQ model using the oMLX oQ feature? E.g., what is the input model, and how is it converted? I'd like to try it out on my specific Mac mini hardware and reduce the final model size even more. Thanks!
-
Aha, I was trying to compare my quantized Qwen3.5-35B-A3B-oQ4e model with the Brooooooklyn/Qwen3.5-35B-A3B-UD-Q4_K_XL-mlx model today when I saw this discussion. Luckily, I hadn't tweeted yet! Otherwise it might have caused a misunderstanding. 🤣
-
OK, I finally understand the oQ quantization feature and tried downsizing the Qwen/Qwen3.5-9B source model to 2-bit as an experiment. I used an existing Qwen3.5-4B-MLX-4bit model as the sensitivity model and turned on the advanced 'enhanced' flag on the admin UI page. It did eventually create a Qwen3.5-9B-oQ2e model (3.81 GB).

I loaded it in the oMLX server and used opencode to connect to it, but it only outputs endless "!!!!!!" characters, e.g. for the prompt "hello". When I choose other models like Qwen3.5-4B-MLX-4bit, opencode works fine. oMLX version 0.3.0. What could be the problem? Appreciate any help, thanks.
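For background on why 2-bit is so much harder than 4-bit, here's a toy round-trip sketch in plain Python. This is simple uniform affine quantization, not oMLX's actual oQ/GPTQ pipeline, and the weights are made up; it only illustrates how reconstruction error grows as the bit-width drops:

```python
def quantize_roundtrip(values, bits):
    """Uniform affine quantize to 2**bits levels, then dequantize."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0  # guard against a zero range
    return [round((v - lo) / scale) * scale + lo for v in values]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Toy "weights" spread over roughly (-1, 1)
weights = [(-1) ** i * (i * 0.137 % 1.0) for i in range(64)]

err2 = mse(weights, quantize_roundtrip(weights, 2))  # 2-bit: only 4 levels
err4 = mse(weights, quantize_roundtrip(weights, 4))  # 4-bit: 16 levels
print(f"2-bit MSE: {err2:.4f}  4-bit MSE: {err4:.4f}")
```

Ordinary quantization loss usually degrades coherence gradually, though; endless repeated tokens more often indicate broken or misapplied weights, so the oQ issue mentioned earlier in the thread might also be relevant here.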
-
Hope this information helps! :D
-
Created a Qwen3.5-4B-oQ3 model that works! Tested in bash on a Mac mini M4 (16 GB): TG 24+ tok/s, consistently, even with a lot of other apps running.