LlamaBarn is a tiny (12 MB) menu bar app that makes local LLMs as easy as connecting to Wi-Fi.
Install with `brew install --cask llamabarn` or download from Releases ↗
Running local LLMs from the command line can be error-prone and time-consuming. You have to navigate model formats (GGUF, MLX, etc.), quantization, context windows, and model- and device-specific configurations, all while avoiding system freezes.
Other tools offer some automation but often introduce their own issues — bloated interfaces, proprietary abstractions, or cloud dependencies that complicate local-first workflows.
LlamaBarn stands out as a clean, platform-focused solution:
- Native macOS App — Built with Swift for optimal performance and minimal resource usage.
- GUI for llama.cpp — A simple menu bar interface that handles all the technical heavy-lifting of llama.cpp without the terminal hassle.
- Platform, Not a Product — Like Wi-Fi for your Mac, it lets you use local models in any app (chat UIs, editors, scripts) via a standard API — no vendor lock-in.
- GGML-Backed Purity — Built as part of the GGML org alongside llama.cpp, with direct, unbloated integration for optimal performance and reliability.
- Optimized Model Library — Pre-selected GGUF models tailored to your Mac, auto-configured, and freeze-proof.
LlamaBarn runs as a tiny menu bar app on your Mac.
- Install a model from the built-in catalog -- only models that can run on your Mac are shown
- Select an installed model to run it -- configures and starts a server at http://localhost:2276
- Use the running model via the API or web UI -- both at http://localhost:2276
No complex setup — just install, run, and connect.
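Once a model is running, you can confirm the server is reachable by listing the loaded model over the OpenAI-compatible API. A minimal Python sketch (standard library only), assuming LlamaBarn exposes the same /v1/models endpoint as llama-server:

```python
import json
import urllib.request

# Ask the local LlamaBarn server which model is currently loaded
# (assumes the llama-server /v1/models endpoint is available).
with urllib.request.urlopen("http://localhost:2276/v1/models") as resp:
    models = json.load(resp)

for entry in models.get("data", []):
    print(entry.get("id"))
```

If nothing is running, the connection will simply be refused; select a model from the menu bar first.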
Connect to any app that supports custom APIs:
- chat UIs like Chatbox or Open WebUI
- CLI assistants like OpenCode or Codex
- editors like VS Code or Zed
- editor extensions like Cline or Continue
- custom scripts using curl or libs like AI SDK (see the sketch below)
Or use the built-in web UI at http://localhost:2276 to chat with the running model directly.
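For the custom-scripts case, any OpenAI-compatible client can point at the local server. A minimal sketch using the openai Python package (an assumption, not a requirement; the model name is a placeholder, since a single running model typically answers regardless of this value):

```python
from openai import OpenAI

# Point a standard OpenAI-compatible client at the local LlamaBarn server.
client = OpenAI(base_url="http://localhost:2276/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local",  # placeholder; the currently running model is used
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```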
LlamaBarn builds on the llama.cpp server and supports the same API endpoints:
```sh
# say "Hello" to the running model
curl http://localhost:2276/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```

See the complete reference in the llama-server docs ↗
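The same endpoint can also stream tokens as they are generated, assuming streaming is passed through as in llama-server. A sketch using the openai Python client with stream=True (placeholder model name as above):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:2276/v1", api_key="not-needed")

# Request a streamed response and print tokens as they arrive.
stream = client.chat.completions.create(
    model="local",  # placeholder; the currently running model is used
    messages=[{"role": "user", "content": "Write a one-line haiku about barns."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```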
