Replies: 5 comments
-
Thanks for the detailed writeup. This would add considerable complexity to the codebase: we'd need to handle downloading, verifying, extracting, and managing multiple llama-server versions, plus error handling for network issues, incompatible versions, and partial downloads. That infrastructure would become an ongoing maintenance burden. Before we go down that path, can you help us understand what problem you're trying to solve?
We want to understand whether this solves a critical problem or is more of a nice-to-have. As an early-stage product, we're focused on high-priority work that matters now. If there's a real pain point here, we want to understand it. If it's more about keeping things current for the sake of it, that's a lower priority for us at this stage.
-
Oh, to be clear, this definitely falls under the "nice to have" category; it's more feature request than bug/issue. :-) I only bring it up because I can see you hitting a point where you don't really need to change this Swift app, yet you'll still have to keep releasing new versions from time to time simply to update the bundled llama-server. If the app could self-update the bits it depends on, you wouldn't need to do that as much.
I think of this kind of like how Rancher Desktop (RD) works with Kubernetes (K3s specifically). You download and install RD once, then simply select which version of K3s you want. As K3s versions change, the drop-down list the user sees changes, letting them self-update the underlying bits RD depends on, without ever updating the Rancher Desktop macOS app itself. Similarly, with tools like Ollama or LM Studio (to keep things in the LLM space), you can update models without downloading a new version of either app.
But yes, I fully acknowledge this adds complexity versus what you're doing now. You'd have to track the llama.cpp GitHub releases page, I guess, though as for downloading, it's a single .zip per architecture.
I suspect folks who use llama.cpp directly tend to be more experimental in nature. That is, people playing with offline LLMs who want more control (or concurrency) than they can get from apps like LM Studio and Ollama typically download some version of llama.cpp, decompress it, and simply run it in place, grabbing new versions as features are introduced or bugs are found. I just thought it would be cool to have an app that handled that bit (download/decompress a given version into place) rather than doing it all manually (a rough sketch of the version check is below).
Mind you, this is firmly in the "if you want to add any more features" category. To be clear, I am not experiencing issues with the bundled version currently. But if I did, I'd be back to downloading llama.cpp directly to see whether it's a version issue, etc.
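For illustration only, a minimal sketch of what that startup version check might look like in Swift, assuming the public GitHub REST endpoint `/repos/{owner}/{repo}/releases/latest` against the ggml-org/llama.cpp repo (unauthenticated rate limits should be fine for an occasional check):

```swift
import Foundation

// Decodes just the field we need from the GitHub "latest release" response.
struct LatestRelease: Decodable {
    let tag_name: String   // e.g. "b7134"
}

// Ask GitHub for the newest llama.cpp release tag.
func fetchLatestLlamaCppTag() async throws -> String {
    let url = URL(string: "https://api.github.com/repos/ggml-org/llama.cpp/releases/latest")!
    var request = URLRequest(url: url)
    request.setValue("application/vnd.github+json", forHTTPHeaderField: "Accept")
    let (data, _) = try await URLSession.shared.data(for: request)
    return try JSONDecoder().decode(LatestRelease.self, from: data).tag_name
}

// Hypothetical usage: compare against the bundled tag and surface
// an "update available" indicator when they differ.
// let latest = try await fetchLatestLlamaCppTag()   // "b7134"
// if latest != bundledTag { showUpdateBadge(latest) }
```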
-
BTW, Msty.ai follows the same approach: you can independently update llama.cpp there.
-
Yes, but Msty.ai is closed-source, and it's not clear whether it suffers the same limitation around llama.cpp that tools like LM Studio do; namely, queueing requests sequentially to feed the LLM rather than leveraging the concurrency that llama.cpp's llama-server now offers directly.
My particular use case is offline-only LLM work, currently using VS Code with the Continue.dev extension tied into an LLM like Qwen3-Coder for testing coding work. Until very recently I was using LM Studio for this, as it is a very polished GUI that offers simple download/management of models, tells you useful things like whether a given model will fit within your vRAM, etc. (I am mostly on Macs and otherwise Linux, so having things like MLX support is also nice.) BUT LM Studio also funnels requests sequentially from its OpenAI-compatible server to llama.cpp. That is fine for single-use things like basic chat, but when doing code work you can have multiple requests going to the LLM for autocomplete and the like, and this sequential access can slow things down.
So at this point I am simply downloading the llama.cpp binaries manually, along with a specific LLM model from Hugging Face, and running a basic script to stand up the LLM so that Continue.dev can hook into it (a rough sketch of that step is below). It's fugly, and honestly I'm not sure yet how much more efficient/effective it is concurrency-wise; I still need to benchmark things. But my goal is to see whether doing so provides a more responsive environment.
The ideal solution is a simple GUI that does these bits for me: download the latest llama.cpp binaries along with whatever model(s) I want to use, and run a model via llama-server so that I get the concurrency as well. LlamaBarn comes pretty darn close, and I also much prefer to use and support folks doing open-source work.
Anyway, as I said earlier, this was more a feature request / "nice to have" thing, and I totally get the simplicity angle. I'll keep an eye on this tool over time, should they opt to add more features. (I already added it to my Munki repo, so I'll know when new versions drop.)
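For concreteness, here is roughly what that manual "stand up the LLM" step could look like if an app did it via Swift's Process API instead of a shell script. The binary directory, model path, port, and flag values are all illustrative; check llama-server --help for the authoritative flag list:

```swift
import Foundation

// Launch llama-server with parallel slots so that clients like
// Continue.dev can issue concurrent requests instead of queueing.
func launchLlamaServer(binDir: String, modelPath: String) throws -> Process {
    let server = Process()
    server.executableURL = URL(fileURLWithPath: "\(binDir)/llama-server")
    server.arguments = [
        "-m", modelPath,      // GGUF model downloaded from Hugging Face
        "--port", "8080",     // OpenAI-compatible endpoint on localhost
        "-c", "8192",         // total context size (illustrative)
        "--parallel", "4",    // serve up to 4 requests concurrently
    ]
    try server.run()          // keeps running until terminated by the caller
    return server
}
```

The returned Process handle lets the app shut the server down cleanly when the user switches models or quits.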
-
Thanks for sharing more context! As an early-stage product, we're currently focused on essential features with clear benefits and minimal complexity. Since this feels more like a nice-to-have enhancement, it's not a priority right now. We'll revisit if it becomes more critical down the line. |
-
As I type this, LlamaBarn v0.11.0, released on 13 Nov., comes prepackaged with llama.cpp's llama-server version b6942, which was released back on 4 Nov 2025. The latest version of llama-server as I type this is b7134, released just 10 hours ago on 23 Nov., and there have been nearly 150 releases between the two.
Is there any plan to have LlamaBarn update the version of llama.cpp / llama-server without necessarily having to update the entire macOS application? As written, LlamaBarn will be perennially running an outdated version of llama-server, and updating llama-server will likely require a new build of LlamaBarn.
Considering the speed at which llama.cpp versions come out (again, nearly 150 releases between b6942 and b7134), one of the following two scenarios plays out with the current setup: either you end up cutting new LlamaBarn releases constantly just to bump the bundled llama-server, or users end up running an increasingly stale llama-server.
Instead, much as you are doing with the curated models, would it be possible to have LlamaBarn download/self-update the version of llama.cpp / llama-server? Possibly ship LlamaBarn without a bundled version by default, so the app itself never changes, and have it pull down the latest version of llama.cpp and simply use that.
For example, it could simply have a button allowing the user to update to the latest version, which pulls the latest .zip for the system's architecture (arm64 or x64) and decompresses it to a given directory (e.g., `/opt/llamabarn/`, so that all the files end up in `/opt/llamabarn/build/bin/`). Or it could check the current latest version on GitHub (possibly on LlamaBarn startup) and indicate an update is available when that version differs from the installed one. A rough sketch of that download/extract step follows this paragraph.
Point being, the menu bar app itself likely does not change that often, but llama-server is constantly being updated, much as the models used update more often. (Of course, you could get fancier and add even more, like allowing the user to specify which version of llama-server to install, in case they want to regress to test something.)
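A minimal sketch of that step, assuming the release assets keep their current naming scheme (something like llama-<tag>-bin-macos-<arch>.zip); an app should verify the asset name against the actual list returned by the releases API rather than hard-coding it:

```swift
import Foundation

// Download the llama.cpp release archive for this machine's architecture
// and unpack it into the destination directory (e.g. /opt/llamabarn/).
func installLlamaCpp(tag: String, into destDir: String) async throws {
    #if arch(arm64)
    let arch = "arm64"
    #else
    let arch = "x64"
    #endif
    // Assumed asset naming scheme; confirm against the real release assets.
    let asset = "llama-\(tag)-bin-macos-\(arch).zip"
    let url = URL(string:
        "https://github.com/ggml-org/llama.cpp/releases/download/\(tag)/\(asset)")!

    // Download the archive to a temporary location.
    let (tmpFile, _) = try await URLSession.shared.download(from: url)

    // Unpack with the system unzip shipped on macOS.
    let unzip = Process()
    unzip.executableURL = URL(fileURLWithPath: "/usr/bin/unzip")
    unzip.arguments = ["-o", tmpFile.path, "-d", destDir]
    try unzip.run()
    unzip.waitUntilExit()
}

// Hypothetical usage:
// try await installLlamaCpp(tag: "b7134", into: "/opt/llamabarn")
```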
Anyway, it feels like an opportunity to both keep the LlamaBarn app itself simpler while also providing more functionality.
[NOTE: This wouldn't be so much of an issue if the version of llama.cpp didn't change so often. For example, there is a companion FLOSS macOS menu bar app to Syncthing, the CLI file sync program, called Syncthing for macOS. This app also bundles the version of Syncthing inside itself in a similar manner.
The difference is that they DO update the Syncthing for macOS app whenever the Syncthing tool itself is updated. But thankfully Syncthing does not update nearly as often.
(Currently that project is in a transitional phase as Syncthing came out with v2. But until that point they were pretty lock-step in keeping the underlying version of Syncthing up to date.)
Just figured I would mention this as I tinkered around. Otherwise, this app seems quite nice.]