Support running embedding models alongside generation models #27
jmcallister83 started this conversation in Ideas
Description: Currently, LlamaBarn appears to limit execution to a single model at a time via the llama.cpp API. This creates a bottleneck when using AI coding assistants (like Roo Code) that require two distinct models: one for chat/generation and a separate model for generating embeddings to index the codebase.

Proposed Solution: Allow LlamaBarn to load and serve two models simultaneously, ideally on separate ports or endpoints.

Model A (Generation): handles standard prompts/chat.
Model B (Embeddings): handles vector indexing for RAG workflows.

Use Case: When using Roo Code, the user needs to chat with a robust LLM (e.g., Qwen3 Coder) while the tool simultaneously indexes the workspace with a lightweight embedding model (e.g., Nomic-Embed). Currently, this requires stopping and starting models or running a separate instance, which disrupts the workflow.

Replies: 1 comment

Thanks for the feature request! I understand the need for running multiple models simultaneously, especially for coding assistants. Currently, LlamaBarn only supports one model at a time and doesn't include embedding models. I'd love to add concurrent model support eventually, but I'm hesitant about prioritizing this now. The main concerns are:

This is definitely something we'll consider as LlamaBarn grows. We'll keep an eye on how agents evolve and gather more user feedback.
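For reference, the "running a separate instance" workaround mentioned in the request amounts to launching two llama.cpp servers by hand, outside of LlamaBarn. Below is a minimal sketch of that setup; the model paths, ports, and the exact spelling of the embeddings flag are assumptions about a local llama.cpp build, not anything LlamaBarn exposes today.

```python
"""Sketch: run a generation model and an embedding model side by side by
launching two independent llama-server processes on different ports.
Model paths are placeholders; adjust everything for your local setup."""
import subprocess
import time

import requests  # pip install requests

# Hypothetical local GGUF files -- substitute your own models.
GEN_MODEL = "models/qwen3-coder.gguf"
EMB_MODEL = "models/nomic-embed-text.gguf"


def start_server(model_path: str, port: int, embedding: bool) -> subprocess.Popen:
    """Start one llama-server instance. The embedding flag enables the
    embeddings endpoint (spelled --embedding or --embeddings depending
    on the llama.cpp build)."""
    cmd = ["llama-server", "-m", model_path, "--host", "127.0.0.1", "--port", str(port)]
    if embedding:
        cmd.append("--embedding")
    return subprocess.Popen(cmd)


if __name__ == "__main__":
    gen = start_server(GEN_MODEL, 8080, embedding=False)  # Model A: chat/generation
    emb = start_server(EMB_MODEL, 8081, embedding=True)   # Model B: embeddings for indexing
    time.sleep(10)  # crude wait for both servers to finish loading

    # Chat/generation requests go to the first server's OpenAI-compatible endpoint ...
    chat = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",
        json={"messages": [{"role": "user", "content": "Hello"}]},
    ).json()

    # ... while codebase indexing sends text to the second server's embeddings endpoint.
    vectors = requests.post(
        "http://127.0.0.1:8081/v1/embeddings",
        json={"input": ["def add(a, b): return a + b"]},
    ).json()

    print(chat)
    print(vectors)

    gen.terminate()
    emb.terminate()
```

An OpenAI-compatible client such as Roo Code would then be pointed at port 8080 for chat and port 8081 for workspace indexing, mirroring the Model A / Model B split proposed above.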