
Switch MCP HTTP server to stateless mode #624

Open
DavidDwyer87 wants to merge 46 commits into tobi:main from cicadialabs:session-bug

Conversation

@DavidDwyer87

Summary

  • Convert the MCP HTTP server from stateful session management to stateless mode (sessionIdGenerator: undefined)
  • Each POST /mcp request now creates a fresh McpServer + WebStandardStreamableHTTPServerTransport, handles the request, then cleans up
  • GET/DELETE on /mcp now return 405 (not applicable for stateless)
  • Removed sessions Map, createSession() helper, and related session lifecycle code
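The per-request lifecycle above can be sketched as follows. This is a minimal simulation, not the actual qmd code: FakeMcpServer and FakeTransport are stand-ins for the real McpServer and WebStandardStreamableHTTPServerTransport from the MCP SDK, but the shape of handlePost mirrors the PR's flow: create fresh objects, handle the request, clean up, with no session map anywhere.

```typescript
// Stand-in for McpServer: tracks whether cleanup ran.
class FakeMcpServer {
  closed = false;
  async close() { this.closed = true; }
}

// Stand-in for the stateless streamable HTTP transport.
class FakeTransport {
  closed = false;
  async handleRequest(body: unknown): Promise<string> {
    return `handled ${JSON.stringify(body)}`;
  }
  async close() { this.closed = true; }
}

// One fresh server + transport per POST /mcp; cleanup always runs.
async function handlePost(body: unknown): Promise<string> {
  const server = new FakeMcpServer();
  const transport = new FakeTransport();
  try {
    return await transport.handleRequest(body);
  } finally {
    await transport.close();
    await server.close();
  }
}

// GET/DELETE have no meaning without sessions, so they are rejected.
function handleOtherMethod(): { status: number } {
  return { status: 405 };
}
```

Because nothing survives between requests, a server restart is invisible to clients: there is no session ID to invalidate.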

Why

When the MCP HTTP server restarts (crash, deploy, OOM), all in-memory sessions are lost. Clients holding old session IDs receive persistent "Session not found" errors and cannot recover without restarting their own process. This is especially painful for remote MCP setups (e.g., type: "remote" in opencode config) where the client and server are on different machines.

Changes

  • Removed: sessions Map, createSession(), session routing logic, stale session handling
  • Removed: Unused imports (randomUUID, isInitializeRequest)
  • Added: Fresh McpServer + stateless transport per POST request, with cleanup after response
  • Added: 405 response for GET/DELETE on /mcp endpoint
  • Net: -76 lines

Testing

All verified locally with `node dist/cli/qmd.js mcp --http --port 8181 --daemon`:

  • Tool call without prior initialize → works (no session required)
  • Sequential tool calls → each gets fresh transport, all work
  • GET /mcp → 405 Method Not Allowed
  • Initialize request → returns proper capabilities and instructions
  • Lex search query → returns ranked results
  • Collections tool → lists all collections
  • REST endpoints (/health, /query, /search) → unchanged

David Dwyer and others added 30 commits April 13, 2026 05:48
Implements the LLM interface using the Ollama REST API as an alternative
to the default node-llama-cpp local GGUF inference.

New files:
- src/ollama.ts: OllamaLLM class with embed(), generate(), expandQuery(),
  rerank(), modelExists(), dispose() methods using Ollama REST API
- test/ollama.test.ts: 45 unit tests covering all methods, error handling,
  configuration, and getDefaultLLM() routing

Modified files:
- src/llm.ts: Added getDefaultLLM() function that routes to OllamaLLM
  when QMD_LLM_BACKEND=ollama, otherwise falls back to LlamaCpp

Configuration (env vars):
- QMD_LLM_BACKEND=ollama — enable Ollama backend
- QMD_OLLAMA_BASE_URL — server URL (default: http://localhost:11434)
- QMD_OLLAMA_EMBED_MODEL — embedding model (default: nomic-embed-text)
- QMD_OLLAMA_GENERATE_MODEL — generation model (default: qwen3:1.7b)
- QMD_OLLAMA_RERANK_MODEL — reranking model (default: qwen3:0.6b)

Reranking uses chat-based relevance scoring since Ollama has no native
rerank API. The model outputs relevance scores which are parsed and
normalized to [0, 1].
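The score parsing step can be sketched like this. The helper name and the 0–10 rating scale are illustrative assumptions, not the actual qmd implementation; the commit only states that model-emitted scores are parsed and normalized to [0, 1].

```typescript
// Extract a relevance score from a chat model's free-form reply and
// normalize it into [0, 1], assuming the prompt asked for a 0-10 rating.
function parseRelevanceScore(modelOutput: string): number {
  // Pull the first number out of the reply, e.g. "Score: 7/10" -> 7.
  const match = modelOutput.match(/-?\d+(\.\d+)?/);
  if (!match) return 0; // unparseable reply counts as irrelevant
  const raw = parseFloat(match[0]);
  // Clamp outliers so downstream ranking always sees [0, 1].
  return Math.min(1, Math.max(0, raw / 10));
}
```

Clamping matters because small models occasionally rate outside the requested scale or echo unrelated numbers.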

Also verified MCP server works via both stdio and HTTP transports with
all 4 tools (query, get, multi_get, status) accessible.

Adds 9 new tools to the QMD MCP server for full programmatic control:

Collection Management:
- collections: list all collections with stats
- add_collection: add a new collection (name, path, pattern, ignore)
- remove_collection: remove a collection by name
- rename_collection: rename a collection

Context Management:
- contexts: list all contexts + global context
- add_context: add context to a collection path
- remove_context: remove context from a collection path

Indexing:
- update_index: re-index collections from filesystem
- embed: generate vector embeddings for documents

Previously the MCP server only exposed 4 read-only tools (query, get,
multi_get, status). Now agents can manage collections, set context, and
trigger indexing operations remotely via MCP.

Test coverage: 20 unit tests covering all new tools.

Combined with the existing 4 tools, QMD now exposes 13 MCP tools total.
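The resulting tool surface can be sketched with a minimal registry. The register() helper is a stand-in for however qmd actually wires handlers into its McpServer; only the tool names come from the commit message above.

```typescript
type ToolHandler = (args: Record<string, unknown>) => unknown;

const tools = new Map<string, ToolHandler>();
function register(name: string, handler: ToolHandler) {
  tools.set(name, handler);
}

// Existing read-only tools.
for (const name of ["query", "get", "multi_get", "status"]) {
  register(name, () => ({ tool: name }));
}

// New management tools added by this commit.
for (const name of [
  "collections", "add_collection", "remove_collection", "rename_collection",
  "contexts", "add_context", "remove_context",
  "update_index", "embed",
]) {
  register(name, () => ({ tool: name }));
}

// 4 existing + 9 new = 13 tools total.
```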
feat: add Ollama LLM provider for remote model inference
feat: add MCP management tools for collections, contexts, and indexing
…llama

When QMD_LLM_BACKEND=ollama is set, QMD no longer downloads or loads
local GGUF models. The store layer now uses the generic LLM interface
throughout, routing to OllamaLLM when configured.

Changes to src/store.ts:
- getLlm() returns LLM interface instead of LlamaCpp concrete type
- Store.llm field typed as LLM instead of LlamaCpp
- All llmOverride params typed as LLM instead of LlamaCpp
- All getDefaultLlamaCpp() calls replaced with getDefaultLLM()
- chunkDocumentByTokens() uses instanceof guard for backend-specific logic

Changes to src/llm.ts:
- Added intent to ExpandQueryOptions in LLM interface
- Added embedBatch() and embedModelName to LLM interface
- Added SimpleLLMSession for non-LlamaCpp backends
- Updated withLLMSessionForLlm() to accept LLM interface

Changes to src/ollama.ts:
- Added embedModelName getter and embedBatch() method
- Renamed private fields to avoid interface naming conflicts

Changes to test/ollama.test.ts:
- Fixed field name references after private field rename
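The backend routing described above can be sketched as a simple env-var switch. The LLM interface here is trimmed to one method for illustration, and the env parameter is a testing convenience (the real getDefaultLLM() presumably reads process.env directly); only the QMD_LLM_BACKEND=ollama condition comes from the commit.

```typescript
// Trimmed stand-in for the shared LLM interface.
interface LLM {
  backendName(): string;
}

class OllamaLLM implements LLM {
  backendName() { return "ollama"; }
}

class LlamaCppLLM implements LLM {
  backendName() { return "llama-cpp"; }
}

// Route to Ollama when QMD_LLM_BACKEND=ollama, else fall back to
// local llama.cpp inference (and its GGUF downloads).
function getDefaultLLM(env: Record<string, string | undefined>): LLM {
  return env.QMD_LLM_BACKEND === "ollama"
    ? new OllamaLLM()
    : new LlamaCppLLM();
}
```

Typing the store layer against the interface rather than the concrete LlamaCpp class is what lets this switch happen in one place instead of at every call site.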
…tion

refactor: use LLM interface in store.ts to skip GGUF downloads with Ollama
- /health now returns total indexed documents and docs needing embedding
- Update test to verify new fields
- Add Jenkinsfile for CI/CD pipeline