
Unload models on idle through environment variable #1389

Open
jaeiclee wants to merge 1 commit into lemonade-sdk:main from jaeiclee:auto-unload

@jaeiclee
Contributor

@jaeiclee jaeiclee commented Mar 17, 2026

Disclaimer: The implementation of this feature was done by Qwen 3.5 (running on Lemonade, of course) through Pi Coding Agent.

--

The auto-unload feature automatically unloads idle models after a preset period of inactivity, configured through the "LEMONADE_GLOBAL_AUTO_UNLOAD_TIMER" environment variable.

Each loaded model gets its own timer. The timer starts when the model is loaded, resets whenever the model goes from busy back to idle, and should be cleared when the model is unloaded manually.
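
As a rough illustration of the behavior described above (not the code in this PR), the sketch below parses the timeout from "LEMONADE_GLOBAL_AUTO_UNLOAD_TIMER" and keeps one idle deadline per model. The unit (seconds), the rule that an unset, malformed, or non-positive value disables the feature, the choice to stop the countdown while a model is busy, and the `IdleTracker` class with all of its method names are assumptions made for the example.

```cpp
#include <chrono>
#include <cstdlib>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Assumption: the variable holds a whole number of seconds; an unset,
// malformed, or non-positive value means "auto-unload disabled".
std::optional<std::chrono::seconds> parse_unload_timeout(const char* raw) {
    if (raw == nullptr || *raw == '\0') return std::nullopt;
    char* end = nullptr;
    long value = std::strtol(raw, &end, 10);
    if (*end != '\0' || value <= 0) return std::nullopt;
    return std::chrono::seconds(value);
}

std::optional<std::chrono::seconds> read_unload_timeout() {
    return parse_unload_timeout(std::getenv("LEMONADE_GLOBAL_AUTO_UNLOAD_TIMER"));
}

// Hypothetical bookkeeping: one idle deadline per loaded model.
class IdleTracker {
public:
    explicit IdleTracker(std::chrono::seconds timeout) : timeout_(timeout) {}

    // Call when a model is loaded or finishes a request (busy -> idle).
    void mark_idle(const std::string& model, Clock::time_point now) {
        deadlines_[model] = now + timeout_;
    }

    // Call when a model starts serving a request: no countdown while busy.
    void mark_busy(const std::string& model) { deadlines_.erase(model); }

    // Call on every unload path so no stale deadline is left behind.
    void forget(const std::string& model) { deadlines_.erase(model); }

    // Models whose idle deadline has passed and should be unloaded.
    std::vector<std::string> expired(Clock::time_point now) const {
        std::vector<std::string> out;
        for (const auto& [model, deadline] : deadlines_) {
            if (now >= deadline) out.push_back(model);
        }
        return out;
    }

private:
    std::chrono::seconds timeout_;
    std::unordered_map<std::string, Clock::time_point> deadlines_;
};
```

Passing `now` in explicitly keeps the bookkeeping testable without sleeping; a real server would call `Clock::now()` at each call site.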

@jaeiclee
Contributor Author

jaeiclee commented Mar 17, 2026

Attempted as a response to #1365, but I'm not sure if I've done this correctly given that I have zero experience with C++.

Specifically: the timer isn't explicitly destroyed when the model is unloaded through other means, so afaik unloads triggered by the max-model limit or NPU exclusivity don't cancel it. That shouldn't be a problem, since the timer will die gracefully anyway, but it's not ideal, and it could become one if the timeout is long enough for a lot of dead timers to pile up as models are switched automatically.

-> Should I make sure the timer is cancelled in every case where the model is unloaded, by any means? And did I miss any paths other than evict_all_npu_servers() and evict_server(lru)?

PS. I wanted to implement a per-model setting controlled through the model's recipe file, but that would require user-facing frontend changes, so I've left it out of scope. It would be nice to do that as the next step after this PR.

@jaeiclee
Contributor Author

Attempts to resolve the hanging timer make the "Resource deadlock avoided" error come back. I think I'll cook this one up a bit more. Sorry for the trouble.
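
For context, on Linux "Resource deadlock avoided" is the message for `EDEADLK`, which `std::thread::join` reports when a thread tries to join itself. Whether that is what happens in this PR is an assumption, since the thread doesn't show the failing code. The sketch below reproduces that failure mode and shows one possible guard; `join_unless_self` and `self_join_reports_deadlock` are hypothetical names for the example.

```cpp
#include <atomic>
#include <system_error>
#include <thread>

// Join `t` unless the caller is `t` itself; a thread cannot wait for its own
// exit, so in that case detach and let it finish on its own.
// Returns true only if a real join happened.
bool join_unless_self(std::thread& t) {
    if (!t.joinable()) return false;
    if (t.get_id() == std::this_thread::get_id()) {
        t.detach();
        return false;
    }
    t.join();
    return true;
}

// Reproduces the failure: a thread that joins its own handle.
// Returns true if the join failed with resource_deadlock_would_occur.
bool self_join_reports_deadlock() {
    std::atomic<bool> handle_ready{false};
    std::atomic<bool> finished{false};
    bool saw_deadlock = false;
    std::thread worker;
    worker = std::thread([&] {
        // Wait until the handle has been assigned before touching it.
        while (!handle_ready) std::this_thread::yield();
        try {
            worker.join();  // e.g. an expiry callback tearing down its own timer
        } catch (const std::system_error& e) {
            saw_deadlock = (e.code() == std::errc::resource_deadlock_would_occur);
        }
        finished = true;
    });
    handle_ready = true;
    while (!finished) std::this_thread::yield();
    worker.join();  // the owning thread can still join normally
    return saw_deadlock;
}
```

If this is the cause, it would typically show up when a timer's expiry callback unloads the model and the unload path then tries to join that same timer thread.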

@jaeiclee jaeiclee closed this Mar 17, 2026
@jaeiclee jaeiclee reopened this Mar 17, 2026
@jaeiclee jaeiclee marked this pull request as draft March 17, 2026 07:56
@jeremyfowers
Member

> Attempts to resolve the hanging timer make the "Resource deadlock avoided" error come back. I think I'll cook this one up a bit more. Sorry for the trouble.

no worries! this would be a great feature to have. I'm also really excited that you're using Lemonade to code Lemonade!

@kenvandine
Member

I worked on this last week and have a branch that I haven't submitted as a PR yet. Maybe you'd like to finish my branch? kenvandine@4ce4f60

Implements automatic unloading of models after a configurable idle period via "LEMONADE_GLOBAL_AUTO_UNLOAD_TIMER" env variable.

- Uses condition-variable-based timers for non-blocking cancellation, preventing deadlocks between timer threads and the model loading mutex.
- Timers are cancelled on all eviction paths (LRU, NPU exclusivity, nuclear option) to prevent orphaned timers.
- Added the new env variable in the documentation.
@jaeiclee
Contributor Author

jaeiclee commented Mar 17, 2026

@kenvandine This method would have saved me a lot of headaches from managing multiple timer threads🫠 However, I eventually want to allow overriding the global timeout on a per-model basis through the recipe file (and, by extension, the frontend UI), which is why I had to make this work by juggling multiple timers.

I think my latest commit works well enough without crashing and behaves similarly to other unloading methods - I just need to test the NPU exclusivity case, then it should be good to go (at least in my eyes).

@jaeiclee
Contributor Author

jaeiclee commented Mar 17, 2026

Alright. Works flawlessly for the NPU exclusivity case too. I think the implementation itself is as good as it gets...at least to my eyes :)

I do wonder how (or whether) we should expose this feature as a CLI parameter:

  • Managing this together with "--global-timeout" is probably not a good fit, since this option isn't mandatory the way the other timeouts are.
  • I was a little wary of adding a new CLI parameter because I see the CLI as a UI of sorts; I wanted to get the implementation done before thinking about where else this setting could go.
  • And if we add this as a CLI option, it'd be nice to have the same setting in the GUI too.

Given that I started this from #1365, I think we can discuss further there if this gets merged (or is about to).

@jaeiclee jaeiclee marked this pull request as ready for review March 17, 2026 19:52