Add remote embedding, reranking, and query expansion support #629
Open
georgelichen wants to merge 5 commits into tobi:main from
Conversation
Support offloading embedding and reranking to remote OpenAI-compatible servers (vLLM, Ollama, LM Studio, OpenAI) while preserving local query expansion and tokenization via a hybrid routing layer.

- RemoteLLM: HTTP client with circuit breaker, dimension validation, batch splitting, auth headers, configurable timeouts
- HybridLLM: routes embed/rerank → remote, generate/expand → local (see the routing sketch below)
- LLM interface: add embedBatch, embedModelName; generalize singleton and session management from LlamaCpp to LLM
- Config: QMD_EMBED_API_URL/MODEL env vars or YAML models section
- Skip nomic/Qwen3 text formatting prefixes for remote models
- 36 unit tests + 30 integration tests against live vLLM

Related: tobi#489, tobi#427, tobi#446, tobi#511

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
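To make the routing rule concrete, here is a minimal sketch of the hybrid layer, assuming an LLM interface with the methods named in the commit. The constructor shape and method signatures are illustrative assumptions, not the PR's actual code, and the interface is shown only partially (tokenize/detokenize omitted).

```typescript
// Hypothetical sketch of the hybrid routing idea described above. Only the
// routing rule (embed/rerank -> remote, generate/expand -> local) comes from
// the commit message; the interface shape below is an assumption.
interface LLM {
  embed(text: string): Promise<number[]>;
  embedBatch(texts: string[]): Promise<number[][]>;
  rerank(query: string, docs: string[]): Promise<number[]>;
  generate(prompt: string): Promise<string>;
  expandQuery(query: string, opts?: { intent?: string }): Promise<string[]>;
  readonly embedModelName: string;
}

class HybridLLM implements LLM {
  constructor(private remote: LLM, private local: LLM) {}

  // Embedding and reranking are offloaded to the remote server.
  embed(text: string) { return this.remote.embed(text); }
  embedBatch(texts: string[]) { return this.remote.embedBatch(texts); }
  rerank(query: string, docs: string[]) { return this.remote.rerank(query, docs); }

  // Generation and query expansion stay on the local llama.cpp backend.
  generate(prompt: string) { return this.local.generate(prompt); }
  expandQuery(query: string, opts?: { intent?: string }) {
    return this.local.expandQuery(query, opts);
  }

  get embedModelName() { return this.remote.embedModelName; }
}
```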
- Add intent? to LLM interface and ILLMSession expandQuery signature
(store.ts passes { intent } but interface didn't declare it — tsc error)
- Derive embed model label from getDefaultLLM().embedModelName after
getStore() so content_vectors.model reflects the actual LLM in use
(previously always stored DEFAULT_EMBED_MODEL_URI even with remote);
see the sketch below
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
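A minimal sketch of the model-label fix described in this commit. The identifier names (getStore, getDefaultLLM, DEFAULT_EMBED_MODEL_URI) come from the commit message, but their signatures here are assumptions.

```typescript
// Hypothetical sketch: record the embed model actually in use (local or
// remote) instead of the hard-coded default, so content_vectors.model
// reflects reality. Signatures are assumed, not taken from the PR.
const store = await getStore();
const llm = getDefaultLLM();
const modelLabel = llm.embedModelName ?? DEFAULT_EMBED_MODEL_URI;
```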
- RemoteLLM.expandQuery() calls /chat/completions when expandApiModel is configured; throws "expandApiModel not configured" otherwise
- Independent circuit breaker for the expand endpoint
- parseExpandResponse() parses lex/vec/hyde lines, filters terms that don't share a word with the original query, and falls back gracefully on bad model output (sketched below)
- RemoteLLM.supportsExpand getter for routing decisions
- HybridLLM routes expandQuery to remote when remote.supportsExpand, otherwise falls back to local LlamaCpp (no interface changes)
- remoteConfigFromEnv() handles QMD_EXPAND_API_URL / QMD_EXPAND_API_MODEL / QMD_EXPAND_API_KEY and YAML expand_api_* fields
- Unit tests (mock HTTP server, VCR-style): payload shape, auth header fallback, lex/vec/hyde parsing, includeLexical=false filtering, fallback on bad output, query-term filtering, circuit breaker, HybridLLM routing (remote vs local), config env vars
- Integration tests: live server connectivity, all three expansion types returned, includeLexical=false, intent incorporation, HybridLLM routing verified via a LOCAL_SENTINEL marker (new VLLM_EXPAND_URL / VLLM_EXPAND_MODEL env vars; skipped when absent)
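A minimal sketch of the response parsing described above, assuming the model emits one lex:/vec:/hyde: line per expansion (the prefix format matches the query prefixes mentioned in the PR notes). The return shape and the exact filter scope are assumptions.

```typescript
// Hypothetical sketch of parseExpandResponse(). The commit says it parses
// lex/vec/hyde lines and drops terms sharing no word with the original
// query; the Expansion shape here is an assumption.
interface Expansion {
  lex: string[];
  vec: string[];
  hyde: string[];
}

function parseExpandResponse(raw: string, query: string): Expansion {
  const queryWords = new Set(query.toLowerCase().split(/\s+/));
  const sharesWord = (term: string) =>
    term.toLowerCase().split(/\s+/).some((w) => queryWords.has(w));

  const out: Expansion = { lex: [], vec: [], hyde: [] };
  for (const line of raw.split("\n")) {
    const m = line.match(/^(lex|vec|hyde):\s*(.+)$/);
    if (!m) continue; // ignore malformed lines (graceful fallback)
    const [, kind, term] = m;
    if (!sharesWord(term)) continue; // drop terms unrelated to the query
    out[kind as keyof Expansion].push(term.trim());
  }
  return out;
}
```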
Merge PR tobi#517 and keep it compatible with the current main branch.

Constraint: Upstream main diverged after PR tobi#517, so a fast-forward merge was not possible
Rejected: Cherry-pick the PR commits directly | would still require the same compatibility fixes and lose merge context
Confidence: medium
Scope-risk: moderate
Directive: Keep RemoteLLM and HybridLLM aligned with the LLM tokenize/detokenize interface and verify Windows CLI wrappers separately from Unix shell scripts
Tested: npx tsc -p tsconfig.build.json; npx vitest run --reporter=verbose test/remote-llm.test.ts test/remote-llm-integration.test.ts
Not-tested: full vitest suite; npm run build wrapper script on Windows; live GitHub Actions
When the active embedding backend is remote, generateEmbeddings now uses character-space chunking instead of token-based preprocessing (sketched below). This keeps qmd embed from initializing node-llama-cpp solely to tokenize input before calling a remote embedding API.

The change is scoped to indexing. Query-time expansion and reranking keep their existing routing rules, and a regression test now fails if remote embedding falls back to local tokenization during indexing.

Constraint: Remote embedding backends do not expose a tokenizer interface in QMD today
Rejected: Change HybridLLM tokenize() globally | would alter query-time behavior and broaden risk unnecessarily
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If remote token-aware chunking is added later, keep qmd embed free of mandatory local llama initialization
Tested: npx tsc -p tsconfig.build.json
Tested: npx vitest run test/store.test.ts -t "generateEmbeddings" --reporter=verbose
Tested: npx vitest run test/store.test.ts -t "Token chunking guardrails" --reporter=verbose
Not-tested: Full end-to-end qmd embed against a live remote embedding service after this code change
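A minimal sketch of character-space chunking, assuming a fixed chars-per-token ratio and a small overlap between chunks. The constants and function name are illustrative; the real generateEmbeddings logic is not shown in this excerpt.

```typescript
// Hypothetical sketch of character-space chunking for remote embedding.
// Chunk size, overlap, and the ~4 chars/token ratio are assumptions; the
// commit only states that remote indexing avoids local tokenization.
function chunkByCharacters(
  text: string,
  maxChars = 2000, // roughly 512 tokens at an assumed ~4 chars/token
  overlap = 200,   // assumed overlap to preserve context across chunks
): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += maxChars - overlap) {
    chunks.push(text.slice(start, start + maxChars));
    if (start + maxChars >= text.length) break;
  }
  return chunks;
}
```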
This PR ports the remote embedding / reranking work from PR #517 onto the current upstream main, and includes the follow-up fixes needed to make it usable in the current tree.

What is included

- The PR #517 feature set rebased onto the current upstream main (build/model label integration fixes)
- No node-llama-cpp initialization during qmd embed when remote embedding is configured

Why this PR exists

PR #517 was based on an older branch state. This branch rebases the feature set onto the current upstream main and resolves the integration drift, so the remote LLM path can be reviewed against today's tree.

Verification

- npx tsc -p tsconfig.build.json
- npx vitest run --reporter=verbose test/remote-llm.test.ts test/remote-llm-integration.test.ts
- npx vitest run test/store.test.ts -t "generateEmbeddings" --reporter=verbose
- npx vitest run test/store.test.ts -t "Token chunking guardrails" --reporter=verbose

Notes

- Remote query expansion is used via expand_api_* when configured; otherwise a normal qmd query "..." still falls back to local expansion (a config sketch follows these notes).
- Queries with explicit prefixes (intent:/lex:/vec:/hyde:) skip auto expansion entirely.
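As a rough illustration of the configuration surface described in the commits, here is a minimal sketch of env-driven remote config. The RemoteConfig shape is an assumption, and QMD_EMBED_API_MODEL merely expands the commit's abbreviation "QMD_EMBED_API_URL/MODEL", so it may not match the real variable name; the expand-side variable names are spelled out in the commit messages.

```typescript
// Hypothetical sketch of remoteConfigFromEnv(). The RemoteConfig shape is
// an assumption; QMD_EMBED_API_MODEL is an assumed expansion of the
// commit's "QMD_EMBED_API_URL/MODEL" shorthand.
interface RemoteConfig {
  embedApiUrl?: string;
  embedApiModel?: string;
  expandApiUrl?: string;
  expandApiModel?: string;
  expandApiKey?: string;
}

function remoteConfigFromEnv(env: NodeJS.ProcessEnv = process.env): RemoteConfig {
  return {
    embedApiUrl: env.QMD_EMBED_API_URL,
    embedApiModel: env.QMD_EMBED_API_MODEL,
    expandApiUrl: env.QMD_EXPAND_API_URL,
    expandApiModel: env.QMD_EXPAND_API_MODEL,
    expandApiKey: env.QMD_EXPAND_API_KEY,
  };
}
```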