-
Just now, my download of Jundot Qwen3.5-35B-A3B-oQ4e failed. It had downloaded 15 GB of 21 GB very quickly when an error suddenly occurred; the file couldn't be found and had been deleted.
Replies: 7 comments
-
Sorry about the disruption. I found an issue with the GPTQ method used in oQ enhanced quantization that was causing degraded quality on MoE models. All affected models have been taken down and will be re-uploaded with the fix. The corrected quantization will ship in oMLX v0.3.1. I have already uploaded the updated Qwen3.5-35B-A3B-oQ4e model for testing.

Benchmark comparison (Qwen3.5-35B-A3B):

The old oQ4e was actually worse than uniform 4-bit on WINOGRANDE (73.3% vs 75.1%). The new version now beats uniform 4-bit across all four benchmarks. Remaining models will be re-quantized and uploaded over the next few days.
-
Can you share a step-by-step procedure for how you created the oQ model using the oMLX oQ feature? E.g., what is the input model, and how is it converted? I'd like to try it out on my specific Mac mini hardware and reduce the final model size even more. Thanks!
-
Aha, I was trying to compare my quantized Qwen3.5-35B-A3B-oQ4e model with the Brooooooklyn/Qwen3.5-35B-A3B-UD-Q4_K_XL-mlx model today when I saw this discussion. Luckily, I hadn't tweeted yet! Otherwise it might have caused a misunderstanding. 🤣
-
OK, I finally understand the oQ quantization feature and tried downsizing the Qwen/Qwen3.5-9B source model to 2-bit as an experiment. I used an existing Qwen3.5-4B-MLX-4bit model as the sensitivity model and turned on the advanced 'enhanced' flag on the admin UI page. It did eventually create a Qwen3.5-9B-oQ2e model (3.81 GB).

I loaded it in the oMLX server and used opencode to connect to it, but it only outputs endless "!!!!!!" characters, e.g. for the prompt "hello". When I choose other models like Qwen3.5-4B-MLX-4bit, opencode works fine. oMLX version 0.3.0. What could be the problem? Appreciate any help, thanks.
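For background on why 2-bit is so much harder than 4-bit, here's a toy round-trip sketch in plain Python. This is simple uniform affine quantization, not oMLX's actual oQ/GPTQ pipeline, and the weights are made up; it only illustrates how reconstruction error grows as the bit-width drops:

```python
def quantize_roundtrip(values, bits):
    """Uniform affine quantize to 2**bits levels, then dequantize."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0  # guard against a zero range
    return [round((v - lo) / scale) * scale + lo for v in values]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Toy "weights" spread over roughly (-1, 1)
weights = [(-1) ** i * (i * 0.137 % 1.0) for i in range(64)]

err2 = mse(weights, quantize_roundtrip(weights, 2))  # 2-bit: only 4 levels
err4 = mse(weights, quantize_roundtrip(weights, 4))  # 4-bit: 16 levels
print(f"2-bit MSE: {err2:.4f}  4-bit MSE: {err4:.4f}")
```

Ordinary quantization loss usually degrades coherence gradually, though; endless repeated tokens more often indicate broken or misapplied weights, so the oQ issue mentioned earlier in the thread might also be relevant here.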
-
Hope this information helps! :D
-
Created a Qwen3.5-4B-oQ3 model that works! Tested in bash on a Mac mini M4 (16 GB): TG 24+ tok/s, consistently, even with a lot of other apps running.