
Add remote embedding, reranking, and query expansion support #629

Open

georgelichen wants to merge 5 commits into tobi:main from georgelichen:merge-pr-517-remote-llm

Conversation

@georgelichen

This PR ports the remote embedding / reranking work from PR #517 onto the current upstream main, and includes the follow-up fixes needed to make it usable in the current tree.

What is included

  • add OpenAI-compatible remote embedding support (configuration sketched after this list)
  • add OpenAI-compatible remote reranking support
  • add remote query expansion support via chat completions
  • fix drift against current main (build/model label integration fixes)
  • avoid initializing local node-llama-cpp during qmd embed when remote embedding is configured
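
For context, a rough sketch of how the remote embedding endpoint gets picked up from the environment. QMD_EMBED_API_URL and QMD_EMBED_API_MODEL are the variables named in the first commit below; the settings shape and the QMD_EMBED_API_KEY variable are illustrative assumptions, not the exact code in this branch.

```ts
// Sketch only: resolve remote embedding settings from env vars.
// QMD_EMBED_API_URL / QMD_EMBED_API_MODEL come from this PR's commit messages;
// the field names and QMD_EMBED_API_KEY are assumptions for illustration.
interface RemoteEmbedSettings {
  apiUrl: string;   // OpenAI-compatible base URL (vLLM, Ollama, LM Studio, OpenAI)
  model: string;    // embedding model id served by that endpoint
  apiKey?: string;  // optional bearer token for the Authorization header
}

function remoteEmbedSettingsFromEnv(): RemoteEmbedSettings | undefined {
  const apiUrl = process.env.QMD_EMBED_API_URL;
  const model = process.env.QMD_EMBED_API_MODEL;
  if (!apiUrl || !model) return undefined; // unset → qmd keeps using the local embedding path
  return { apiUrl, model, apiKey: process.env.QMD_EMBED_API_KEY };
}
```

When this resolves to nothing, qmd embed keeps its existing local path; when both variables are set, embeddings go to the remote server and node-llama-cpp is no longer initialized for indexing (see the last commit below).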

Why this PR exists

PR #517 was based on an older branch state. This branch rebases the feature set onto the current upstream main and resolves the integration drift, so the remote LLM path can be reviewed against today's tree.

Verification

  • npx tsc -p tsconfig.build.json
  • npx vitest run --reporter=verbose test/remote-llm.test.ts test/remote-llm-integration.test.ts
  • npx vitest run test/store.test.ts -t "generateEmbeddings" --reporter=verbose
  • npx vitest run test/store.test.ts -t "Token chunking guardrails" --reporter=verbose

Notes

  • Query expansion uses expand_api_* when configured; otherwise a normal qmd query "..." still falls back to local expansion (decision sketched after these notes).
  • Structured queries (intent:/lex:/vec:/hyde:) skip auto expansion entirely.
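
A compact sketch of that gate; the prefixes are from the note above, and the function name is made up for illustration:

```ts
// Sketch of the auto-expansion gate described in the notes above.
// Structured queries (intent:/lex:/vec:/hyde:) skip auto expansion entirely;
// everything else is expanded remotely when expand_api_* is configured,
// otherwise by the local model.
const STRUCTURED_PREFIXES = ["intent:", "lex:", "vec:", "hyde:"] as const;

function shouldAutoExpand(query: string): boolean {
  const q = query.trimStart();
  return !STRUCTURED_PREFIXES.some((prefix) => q.startsWith(prefix));
}
```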

Jim Smith and others added 5 commits April 12, 2026 18:26
Support offloading embedding and reranking to remote OpenAI-compatible
servers (vLLM, Ollama, LM Studio, OpenAI) while preserving local query
expansion and tokenization via a hybrid routing layer.

- RemoteLLM: HTTP client with circuit breaker, dimension validation,
  batch splitting, auth headers, configurable timeouts
- HybridLLM: routes embed/rerank → remote, generate/expand → local
- LLM interface: add embedBatch, embedModelName; generalize singleton
  and session management from LlamaCpp to LLM
- Config: QMD_EMBED_API_URL/MODEL env vars or YAML models section
- Skip nomic/Qwen3 text formatting prefixes for remote models
- 36 unit tests + 30 integration tests against live vLLM

Related: tobi#489, tobi#427, tobi#446, tobi#511

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
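
For reviewers, a trimmed sketch of the routing this commit introduces. The interfaces below are stand-ins rather than the real LLM interface, and the method signatures beyond the names mentioned above (embedBatch, rerank, expandQuery) are assumptions.

```ts
// Trimmed sketch of the hybrid routing described in this commit:
// embed/rerank go to the remote OpenAI-compatible server, generate and query
// expansion stay on the local model. Stand-in interfaces; real signatures differ.
interface RemoteBackendSketch {
  embedBatch(texts: string[]): Promise<number[][]>;
  rerank(query: string, docs: string[]): Promise<number[]>;
}
interface LocalBackendSketch {
  generate(prompt: string): Promise<string>;
  expandQuery(query: string, opts?: { intent?: string }): Promise<string[]>;
}

class HybridLLMSketch {
  constructor(private remote: RemoteBackendSketch, private local: LocalBackendSketch) {}

  embedBatch(texts: string[]) { return this.remote.embedBatch(texts); }             // remote
  rerank(query: string, docs: string[]) { return this.remote.rerank(query, docs); } // remote
  generate(prompt: string) { return this.local.generate(prompt); }                  // local
  expandQuery(query: string, opts?: { intent?: string }) {
    return this.local.expandQuery(query, opts); // local here; a later commit routes this remotely
  }
}
```

Per the commit message, the real RemoteLLM additionally wraps the remote calls in a circuit breaker, validates embedding dimensions, and splits oversized batches.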
- Add intent? to LLM interface and ILLMSession expandQuery signature
  (store.ts passes { intent } but interface didn't declare it — tsc error)
- Derive embed model label from getDefaultLLM().embedModelName after
  getStore() so content_vectors.model reflects the actual LLM in use
  (previously always stored DEFAULT_EMBED_MODEL_URI even with remote)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
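
Roughly, the two fixes look like this (trimmed to the relevant members; the return type of expandQuery and the stub values are assumptions):

```ts
// Trimmed sketch of the interface fix: expandQuery now declares the optional
// intent that store.ts was already passing.
interface ILLMSessionSketch {
  expandQuery(query: string, opts?: { intent?: string }): Promise<string[]>;
}

// Sketch of the model-label fix: store the embed model actually in use instead
// of always writing the default local model URI. Names follow the commit
// message; the stub only illustrates the fallback.
const DEFAULT_EMBED_MODEL_URI = "hf:default-local-embed-model"; // placeholder value
const getDefaultLLM = (): { embedModelName?: string } => ({}); // stand-in for the real singleton
const embedModelLabel = getDefaultLLM().embedModelName ?? DEFAULT_EMBED_MODEL_URI;
```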
- RemoteLLM.expandQuery() calls /chat/completions when expandApiModel is
  configured; throws "expandApiModel not configured" otherwise
- Independent circuit breaker for the expand endpoint
- parseExpandResponse() parses lex/vec/hyde lines, filters terms that
  don't share a word with the original query, falls back gracefully on
  bad model output
- RemoteLLM.supportsExpand getter for routing decisions
- HybridLLM routes expandQuery to remote when remote.supportsExpand,
  otherwise falls back to local LlamaCpp (no interface changes)
- remoteConfigFromEnv() handles QMD_EXPAND_API_URL / QMD_EXPAND_API_MODEL /
  QMD_EXPAND_API_KEY and YAML expand_api_* fields
- Unit tests (mock HTTP server, VCR-style): payload shape, auth header
  fallback, lex/vec/hyde parsing, includeLexical=false filtering,
  fallback on bad output, query-term filtering, circuit breaker,
  HybridLLM routing (remote vs local), config env vars
- Integration tests: live server connectivity, all three types returned,
  includeLexical=false, intent incorporation, HybridLLM routing verified
  via LOCAL_SENTINEL sentinel (new VLLM_EXPAND_URL / VLLM_EXPAND_MODEL
  env vars, skipped when absent)
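
A simplified sketch of the response parsing described above. It shows only the lex/vec/hyde line handling and the share-a-word filter; the real parseExpandResponse has stricter fallback rules, and whether the filter applies to all three term types is simplified here.

```ts
// Simplified sketch of parsing expansion output into lex/vec/hyde terms and
// dropping terms that share no word with the original query. The real
// parseExpandResponse has additional fallback behavior on bad model output.
type ExpansionSketch = { lex: string[]; vec: string[]; hyde: string[] };

function parseExpandResponseSketch(raw: string, query: string): ExpansionSketch {
  const queryWords = new Set(query.toLowerCase().split(/\s+/).filter(Boolean));
  const sharesQueryWord = (term: string) =>
    term.toLowerCase().split(/\s+/).some((w) => queryWords.has(w));

  const out: ExpansionSketch = { lex: [], vec: [], hyde: [] };
  for (const line of raw.split("\n")) {
    const match = line.match(/^(lex|vec|hyde):\s*(.+)$/i);
    if (!match) continue; // ignore malformed lines; caller falls back if nothing parses
    const kind = match[1].toLowerCase() as keyof ExpansionSketch;
    const term = match[2].trim();
    if (sharesQueryWord(term)) out[kind].push(term); // filter applied uniformly here for simplicity
  }
  return out;
}
```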
Merge PR tobi#517 and keep it compatible with the current main branch.

Constraint: Upstream main diverged after PR tobi#517, so a fast-forward merge was not possible
Rejected: Cherry-pick the PR commits directly | would still require the same compatibility fixes and lose merge context
Confidence: medium
Scope-risk: moderate
Directive: Keep RemoteLLM and HybridLLM aligned with the LLM tokenize/detokenize interface and verify Windows CLI wrappers separately from Unix shell scripts
Tested: npx tsc -p tsconfig.build.json; npx vitest run --reporter=verbose test/remote-llm.test.ts test/remote-llm-integration.test.ts
Not-tested: full vitest suite; npm run build wrapper script on Windows; live GitHub Actions
When the active embedding backend is remote, generateEmbeddings now uses
character-space chunking instead of token-based preprocessing. This keeps
qmd embed from initializing node-llama-cpp solely to tokenize input before
calling a remote embedding API.

The change is scoped to indexing. Query-time expansion and reranking keep
their existing routing rules, and a regression test now fails if remote
embedding falls back to local tokenization during indexing.

Constraint: Remote embedding backends do not expose a tokenizer interface in QMD today
Rejected: Change HybridLLM tokenize() globally | would alter query-time behavior and broaden risk unnecessarily
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If remote token-aware chunking is added later, keep qmd embed free of mandatory local llama initialization
Tested: npx tsc -p tsconfig.build.json
Tested: npx vitest run test/store.test.ts -t "generateEmbeddings" --reporter=verbose
Tested: npx vitest run test/store.test.ts -t "Token chunking guardrails" --reporter=verbose
Not-tested: Full end-to-end qmd embed against a live remote embedding service after this code change
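
A minimal sketch of the character-space chunking described above, assuming a fixed window with overlap; the actual chunk size, overlap, and boundary handling in store.ts are not reproduced here.

```ts
// Sketch of character-based chunking used when the embedding backend is remote:
// split on character offsets so no local tokenizer (node-llama-cpp) is required
// during qmd embed. maxChars / overlapChars are placeholder values.
function chunkByCharactersSketch(text: string, maxChars = 2000, overlapChars = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + maxChars, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    start = end - overlapChars; // overlap preserves context across chunk boundaries
  }
  return chunks;
}
```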