Sorry for the silence, catching up this week #731

jundot (Maintainer) · 2026-04-12T08:18:54Z

Hey everyone, I owe you all an apology. I had personal things to take care of this weekend and couldn't find any time at all to look at issues or PRs. I know a lot of you have been waiting and I'm really sorry about that.

I'll start going through everything on Tuesday night. It might take a few days to get to all of them, but I'll do my best to respond to each one. Thanks for your patience, and sorry again for the delay.

Tagging recent PR contributors so you know I haven't forgotten about your work:
@jroth1111 @beantownbytes @kyr0 @thornad @Landon-Molt @applesauce49 @RepublicOfKorokke @jaredlockhart @0xClandestine @jnchaba @Chedrian07 @Bahtya @yizhang

jundot (Maintainer, Author) · 2026-04-12T08:20:00Z

If any of you have something critical or blocking, feel free to leave a comment here and I'll prioritize it.

1 reply

blightbow Apr 12, 2026

I have a request for you to take all the time you need in order to maintain your work-life balance.

Critical priority please, hop to it. 👉

Landon-Molt · 2026-04-12T17:59:42Z


First and foremost: Thank you for opening up this incredible project. You don't have to apologize for your work-life balance choices! You deserve all the credit, and of course, all the time you need. In this AI era, increased productivity should alleviate the pressure on the working class and give us more time and focus for our personal lives and families.

Peak memory / TurboQuant: I'm currently working on some performance issues, specifically peak memory consumption during prefill with TurboQuant enabled. The root cause is upstream in mlx-vlm: prefill_attention() silently falls back to decompressing the entire KV cache to fp16 after PR #909 dropped ProdCodec. I've filed the finding upstream at Blaizzy/mlx-vlm#1016 and am working on a fused Metal kernel fix. This could end up being a tradeoff between memory and prefill speed; I'm trying to make it work without that compromise.

SSD write volume: People have raised concerns about high SSD writes from the caching system. We all love this feature! However, at the current write rate (17 TB on my fresh MacBook Pro M5 Max in less than a month), I think at least a toggle should be provided. PR #701 by @RepublicOfKorokke has already addressed this with a hot-cache-only mode, and I'm working on top of that branch as well. I'd like to explore a better cache strategy in the future to reduce SSD pressure, but can't promise a timeline on that yet.
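
For scale, the write volume quoted above can be turned into a rough daily rate. This is only a back-of-envelope sketch: the 17 TB figure comes from the comment, while the 30-day window is an assumption standing in for "less than a month", so the result is a lower bound on the average.

```python
# Back-of-envelope write rate from the figures quoted in the comment above.
TOTAL_WRITTEN_TB = 17.0  # reported in the comment
MAX_DAYS = 30            # assumption: "less than a month" means at most 30 days


def min_daily_write_tb(total_tb: float, max_days: int) -> float:
    """Lower bound on the average daily write volume in TB."""
    return total_tb / max_days


daily = min_daily_write_tb(TOTAL_WRITTEN_TB, MAX_DAYS)
yearly = daily * 365  # naive extrapolation, assuming the rate stays constant

print(f"at least {daily:.2f} TB/day")           # at least 0.57 TB/day
print(f"about {yearly:.0f} TB/year if sustained")  # about 207 TB/year if sustained
```

Whether that rate matters in practice depends on the drive's rated endurance, which is not stated in this thread.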

1 reply

napyang Apr 13, 2026

Sorry for the rookie question. I run qwen3.5-35b-a3b-oQ4 text-only (19GB), and according to the previous release it needs 1GB~1.5GB for a 128k context length with TurboQuant 4-bit turned on. My computer is a Mac Studio M2 with 32GB, and I have already increased the VRAM available to the GPU to 26GB, but when I run a prompt of over 90k tokens, the log shows “prefill interrupted at xxxxx/xxxxx tokens: 1 request(s) aborted”. Could this be the same peak memory issue? I monitored the GPU’s memory: it goes up to 25.xxGB and then the request is aborted by the system. I don’t know whether that is normal or whether 26GB of VRAM simply can’t handle the oQ4 (19GB) model. Do I have to downsize to oQ3.5?
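
The numbers in this reply can be lined up in a quick headroom check. The sketch below uses only the figures reported above; treating "25.xx" as 25.0 GB is an assumption (a lower bound), and the function names are illustrative, not part of any tool mentioned in the thread.

```python
# Figures reported in the reply above, all in GB.
MODEL_WEIGHTS_GB = 19.0
GPU_LIMIT_GB = 26.0
EXPECTED_KV_GB = 1.5     # upper end of the 1 to 1.5 GB quoted for 128k context
OBSERVED_PEAK_GB = 25.0  # assumption: lower bound of the "25.xx" reading


def headroom_gb(limit: float, weights: float) -> float:
    """Memory left for cache and activations once the weights are loaded."""
    return limit - weights


def unexplained_gb(peak: float, weights: float, expected_kv: float) -> float:
    """Observed usage beyond the weights plus the expected compressed cache."""
    return peak - weights - expected_kv


headroom = headroom_gb(GPU_LIMIT_GB, MODEL_WEIGHTS_GB)
unexplained = unexplained_gb(OBSERVED_PEAK_GB, MODEL_WEIGHTS_GB, EXPECTED_KV_GB)

print(f"headroom after weights: {headroom:.1f} GB")                    # 7.0 GB
print(f"usage beyond weights + expected cache: {unexplained:.1f} GB")  # 4.5 GB
```

A gap of several GB beyond the expected compressed cache is consistent with the transient fp16 decompression during prefill described earlier in the thread, though these numbers alone do not confirm it.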
