## Problem

QMD currently assumes a local `llama.cpp`-style setup for generation, embeddings, and reranking, and that setup is baked in. This prevents any user customization of the llama.cpp layer (for example, using TheTom's turboquant fork) and doesn't integrate well with existing homelab servers, since QMD tries to run everything locally on one machine. That makes it hard to use QMD with a local OpenAI-compatible server setup such as `llama-swap`, which is what I tested this PR with: even when the server already exposes the same models through `/v1/chat/completions`, `/v1/embeddings`, and `/v1/rerank`, QMD couldn't use them. Now it can!

## Solution
This PR adds an OpenAI-compatible backend alongside the existing llama.cpp one, letting QMD talk to a local compatible server instead of requiring direct local model access. You can now run QMD on your laptop while your homelab does the tensor crunching!
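For context, here is a minimal sketch of the kind of requests such a backend issues. This is illustrative TypeScript, not the PR's actual code; the helper names, types, and example values are mine:

```ts
// Illustrative only: the shape of calls an OpenAI-compatible backend makes.
type LlmConfig = { baseUrl: string; apiKey?: string };

async function post<T>(cfg: LlmConfig, path: string, body: unknown): Promise<T> {
  const res = await fetch(`${cfg.baseUrl}${path}`, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      // Local servers often ignore auth, but send it when configured.
      ...(cfg.apiKey ? { authorization: `Bearer ${cfg.apiKey}` } : {}),
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${path} failed: ${res.status}`);
  return res.json() as Promise<T>;
}

async function demo() {
  const cfg: LlmConfig = { baseUrl: "http://homelab:8080/v1", apiKey: "sk-local" };
  // Generation, embedding, and reranking all go to the same server:
  await post(cfg, "/chat/completions", {
    model: "qmd-generate",
    messages: [{ role: "user", content: "Summarize this note." }],
  });
  await post(cfg, "/embeddings", { model: "qmd-embed", input: ["a chunk of text"] });
  await post(cfg, "/rerank", {
    model: "qmd-rerank",
    query: "unpack EMI archives",
    documents: ["doc a", "doc b"],
  });
}

demo().catch(console.error);
```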
It also makes the CLI and store paths respect configured model aliases, so users can route QMD generation, embedding, and reranking through named server-side models (e.g. `qmd-generate`, `qmd-embed`, and `qmd-rerank`), which is how I set up my server to test this PR.

## What's Changed?
- New `llm.provider`, `llm.baseUrl`, and `llm.apiKey` config options for pointing QMD at an OpenAI-compatible server (see the config example below).
- `qmd embed` now uses the configured embedding alias instead of forcing the built-in default model name.

## Testing
### Automated
```
npx vitest run test/llm.test.ts -t "recovers from oversized rerank requests by splitting and truncating"
npx vitest run test/llm.test.ts -t "rerank maps remote indices back to source files"
```

### Manual
You can test this with `llama-swap` or any server that exposes OpenAI-compatible chat, embedding, and rerank endpoints.

### Option A: Use llama-swap
Configure the server with the model aliases `qmd-generate`, `qmd-embed`, and `qmd-rerank`, and make sure it serves:

- `POST /v1/chat/completions`
- `POST /v1/embeddings`
- `POST /v1/rerank`

A sketch of a matching `llama-swap` config follows this list.
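For example, a `llama-swap` config along these lines should expose all three aliases. The model paths and `llama-server` flags here are assumptions for illustration, so adapt them to your models:

```yaml
# config.yaml for llama-swap — illustrative; paths, flags, and models are assumed.
models:
  "qmd-generate":
    cmd: llama-server --port ${PORT} -m /models/generate-model.gguf
  "qmd-embed":
    cmd: llama-server --port ${PORT} -m /models/embed-model.gguf --embedding
  "qmd-rerank":
    cmd: llama-server --port ${PORT} -m /models/rerank-model.gguf --reranking
```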
### Option B: Roll your own compatible server

Any server that implements the same three endpoints and accepts the same model names will work; a toy stub is sketched below.
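As a minimal, hypothetical example (not part of this PR), a Bun script like the following answers all three routes with fixed responses, which is enough to watch what QMD sends:

```ts
// stub-server.ts — toy OpenAI-compatible stub for manual testing (illustrative only).
// Run with: bun run stub-server.ts
Bun.serve({
  port: 8080,
  async fetch(req) {
    const { pathname } = new URL(req.url);
    console.log(req.method, pathname); // watch which routes QMD hits

    if (pathname === "/v1/chat/completions") {
      return Response.json({
        choices: [
          { index: 0, message: { role: "assistant", content: "stub answer" }, finish_reason: "stop" },
        ],
      });
    }
    if (pathname === "/v1/embeddings") {
      const { input } = await req.json();
      const inputs = Array.isArray(input) ? input : [input];
      return Response.json({
        data: inputs.map((_, i) => ({ index: i, embedding: new Array(8).fill(0) })),
      });
    }
    if (pathname === "/v1/rerank") {
      const { documents } = await req.json();
      return Response.json({
        results: documents.map((_, i) => ({ index: i, relevance_score: 0.5 })),
      });
    }
    return new Response("not found", { status: 404 });
  },
});
```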
### QMD config example

Create or update your QMD config:
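The `llm.*` keys below are the ones this PR adds; the file format shown (YAML) and the per-task alias key names are assumptions, so check the shipped schema if they differ:

```yaml
llm:
  provider: openai                  # select the new OpenAI-compatible backend (value assumed)
  baseUrl: http://homelab:8080/v1   # base URL of your server (example value)
  apiKey: sk-local                  # whatever your server expects; often ignored locally
  # Route each task through a named server-side model (key names hypothetical):
  generateModel: qmd-generate
  embedModel: qmd-embed
  rerankModel: qmd-rerank
```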
### Verify the flow
Run a query:

```
qmd --index my-index query "How do I unpack EMI archives?" -n 3 --json
```

Confirm the server receives requests for:

- `POST /v1/chat/completions`
- `POST /v1/embeddings`
- `POST /v1/rerank`
If your reranker has tighter request limits, verify that the query still succeeds and that rerank requests keep flowing after the first oversized request is split.
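The recovery behavior the automated test exercises presumably looks something like this sketch; the function, its signature, and the limit below are illustrative, not QMD's actual code:

```ts
// Hedged sketch: when the server rejects a rerank request as too large, split
// the document batch and retry each half; truncate a single oversized document.
async function rerankWithRecovery(
  rerank: (query: string, docs: string[]) => Promise<number[]>, // returns scores
  query: string,
  docs: string[],
  maxDocChars = 8_000,
): Promise<number[]> {
  try {
    return await rerank(query, docs);
  } catch {
    if (docs.length === 1) {
      // A single document that is still too big gets truncated and retried once.
      return rerank(query, [docs[0].slice(0, maxDocChars)]);
    }
    // Otherwise split the batch and recurse; scores keep their original order,
    // so remote indices still map back to the right source files.
    const mid = Math.ceil(docs.length / 2);
    const [left, right] = await Promise.all([
      rerankWithRecovery(rerank, query, docs.slice(0, mid), maxDocChars),
      rerankWithRecovery(rerank, query, docs.slice(mid), maxDocChars),
    ]);
    return [...left, ...right];
  }
}
```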