Unload models on idle through environment variable #1389
jaeiclee wants to merge 1 commit into lemonade-sdk:main
Attempted as a response to #1365, but I'm not sure I've done this correctly, given that I have zero experience with C++.
PS. I wanted to implement a per-model setting controlled by the model's recipe file, but that would require user-facing frontend changes, so I've left it out of scope. It would be nice to do that as the next step after this PR.
My attempts to resolve the hanging timer make the "Resource deadlock avoided" error return. I think I'll cook this one up a bit more. Sorry for the trouble.
No worries! This would be a great feature to have. I'm also really excited that you're using Lemonade to code Lemonade!
This is something I worked on last week. I have a branch, but I haven't submitted it as a PR yet. Maybe you'd want to finish my branch? kenvandine@4ce4f60
Implements automatic unloading of models after a configurable idle period via the "LEMONADE_GLOBAL_AUTO_UNLOAD_TIMER" env variable.
- Uses condition-variable-based timers for non-blocking cancellation, preventing deadlocks between timer threads and the model loading mutex.
- Timers are cancelled on all eviction paths (LRU, NPU exclusivity, nuclear option) to prevent orphaned timers.
- Added the new env variable in the documentation.
@kenvandine This method would have saved me a lot of headaches from managing multiple timer threads 🫠 However, I did want to eventually allow us to override the global timeout variable on a per-model basis through the recipe file (and frontend UI by extension) - that's why I had to make this work by juggling multiple timers. I think my latest commit works well enough without crashing and behaves similarly to other unloading methods - I just need to test the NPU exclusivity case, then it should be good to go (at least in my eyes).
Alright. Works flawlessly for the NPU exclusivity case too. I think the implementation itself is as good as it gets... at least to my eyes :) I do wonder how (or whether) we should expose this feature as a CLI parameter:
Given that I started this from #1365, I think we can discuss further there if this gets merged (or is about to).
Disclaimer: The implementation of this feature was done by Qwen 3.5 (running on Lemonade, of course) through Pi Coding Agent.
--
The auto-unload feature automatically unloads idle models after a preset period of inactivity, which can be configured through the "LEMONADE_GLOBAL_AUTO_UNLOAD_TIMER" environment variable.
Each model gets its own timer once it is loaded. The timer resets whenever the model stops being busy (i.e., becomes idle) and should be cleared when the model is unloaded manually.
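As an illustration of how the variable could be read at startup, here is a rough sketch. The thread does not state the unit or how invalid values are handled, so this assumes the value is a whole number of seconds and that an unset, empty, non-numeric, or non-positive value disables auto-unload. The function names `parse_idle_timeout` and `read_auto_unload_timeout` are hypothetical and are not taken from the PR.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <exception>
#include <optional>
#include <string>

// Assumption: the value is a whole number of seconds. Anything that is unset,
// empty, not fully numeric, or <= 0 is treated as "auto-unload disabled".
std::optional<std::chrono::seconds> parse_idle_timeout(const char* raw) {
    if (raw == nullptr || *raw == '\0') return std::nullopt;
    try {
        const std::string text(raw);
        std::size_t consumed = 0;
        const long long value = std::stoll(text, &consumed);
        // Reject trailing junk such as "30s" as well as zero or negative values.
        if (consumed != text.size() || value <= 0) return std::nullopt;
        return std::chrono::seconds(value);
    } catch (const std::exception&) {
        // std::stoll throws on non-numeric or out-of-range input.
        return std::nullopt;
    }
}

// Reads the variable named in this PR; returns std::nullopt when disabled.
std::optional<std::chrono::seconds> read_auto_unload_timeout() {
    return parse_idle_timeout(std::getenv("LEMONADE_GLOBAL_AUTO_UNLOAD_TIMER"));
}
```

Returning `std::optional` keeps the "feature off" case distinct from any real timeout, so the caller can skip creating timers entirely when the variable is not set.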