Sorry for the silence, catching up this week #731
jundot announced in Announcements
If any of you have something critical or blocking, feel free to leave a comment here and I'll prioritize it.
First and foremost: Thank you for opening up this incredible project — you don't have to apologize for your work-life balance choices!
You deserve all the credit, and of course, all the time you need. In this AI era, increased productivity should
alleviate the pressure on the working class and give us more time and focus for our personal lives and families.
Peak memory / TurboQuant: I'm currently working on some performance issues, specifically peak memory consumption
during prefill with TurboQuant enabled. The root cause is upstream in mlx-vlm — prefill_attention() silently falls
back to decompressing the entire KV cache to fp16 after PR #909 dropped ProdCodec. I've filed the finding
upstream at Blaizzy/mlx-vlm#1016 and am working on a fused Metal kernel fix. This could end up being a tradeoff
between memory and prefill speed — I'm trying to make it work without that compromise.
SSD write volume: People have raised concerns about high SSD writes from the caching system. We all love this
feature! However, at the current write rate (17 TB on my fresh MacBook Pro M5 Max in less than a month), I think
at least a toggle should be provided. PR #701 by @RepublicOfKorokke has already addressed this with a
hot-cache-only mode, and I'm working on top of that branch as well. I'd like to explore a better cache strategy in
the future to reduce SSD pressure, but can't promise a timeline on that yet.
Hey everyone, I owe you all an apology. I had personal things to take care of this weekend and couldn't find any time at all to look at issues or PRs. I know a lot of you have been waiting and I'm really sorry about that.
I'll start going through everything on Tuesday night. It might take a few days to get to all of them, but I'll do my best to respond to each one. Thanks for your patience and sorry again for the delay.
Tagging recent PR contributors so you know I haven't forgotten about your work:
@jroth1111 @beantownbytes @kyr0 @thornad @Landon-Molt @applesauce49 @RepublicOfKorokke @jaredlockhart @0xClandestine @jnchaba @Chedrian07 @Bahtya @yizhang