feat(http): REST search endpoints + --warm flag #469
Open
jmilinovich wants to merge 1 commit into tobi:main from
Conversation
The MCP HTTP server already has undocumented `POST /query` and `/search` endpoints but lacks direct access to individual search backends. This adds `POST /search/bm25`, `/search/vector`, and `/search/hybrid` for callers who want fast, targeted searches without MCP protocol overhead.

Also wires the `rerank` parameter through the existing `POST /query` endpoint so callers can pass `"rerank": false` to skip LLM reranking.

The `--warm` flag pre-loads the embedding model on server startup so the first vector search is ~24ms instead of ~700ms cold. It is opt-in to avoid surprising memory usage (~300MB for the embedding model).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
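The idea behind `--warm` is the standard lazy-init-plus-warm-up pattern: the first embedding call pays a one-time model-load cost, and warming at startup moves that cost out of the first real request. A minimal sketch of the pattern — all names here (`Embedder`, `warm`, the toy embedding) are illustrative, not QMD's actual API:

```typescript
// Sketch of the pattern behind --warm: embed() lazily loads the model on
// first use; warm() forces that load at server startup so the first search
// doesn't pay the ~700ms cold-start cost described above.
class Embedder {
  private model: ((text: string) => number[]) | null = null;
  loads = 0; // counts expensive model loads, for illustration

  private load(): (text: string) => number[] {
    if (!this.model) {
      this.loads++; // stands in for the slow, memory-hungry model load
      this.model = (text) => [text.length]; // toy "embedding"
    }
    return this.model;
  }

  embed(text: string): number[] {
    return this.load()(text);
  }

  warm(): void {
    this.load(); // called at startup when --warm is passed
  }
}

const e = new Embedder();
e.warm();         // startup: model load paid here...
e.embed("query"); // ...so the first search triggers no load
```

With `warm()` called at startup, `loads` stays at 1 no matter how many searches follow; without it, the load lands inside the first request's latency.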
Summary
The MCP HTTP server (`qmd mcp --http`) already has `POST /query` and `/search` endpoints, but they require the full structured search format and always use the hybrid pipeline. This PR adds:

- `POST /search/bm25` — direct BM25 keyword search, no LLM needed (~3ms)
- `POST /search/vector` — direct vector similarity search (~24ms warm)
- `POST /search/hybrid` — BM25 + vector RRF fusion without LLM reranking (~26ms)
- `rerank` passthrough on existing `POST /query` — callers can now send `"rerank": false`
- `--warm` flag — `qmd mcp --http --warm` pre-loads the embedding model on startup. Opt-in to avoid ~300MB memory cost on constrained machines.

Motivation
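The hybrid endpoint's description names reciprocal rank fusion (RRF) for combining the BM25 and vector result lists. A minimal sketch of standard RRF — the constant `k = 60` and the function shape are the textbook formulation, not lifted from QMD's source:

```typescript
// Reciprocal rank fusion: each ranked list contributes 1 / (k + rank) to a
// document's fused score, so documents ranked highly in either the BM25 or
// the vector list rise to the top of the combined result.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      // i is zero-based, so rank = i + 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// "a" is ranked first by both lists, so it tops the fused ranking; "b"
// appears in both lists and beats "d", which appears in only one.
const fused = rrfFuse([["a", "b", "c"], ["a", "d", "b"]]);
```

Because RRF only looks at ranks, not raw scores, it needs no score normalization between the keyword and vector backends, which is presumably why it fits this kind of hybrid endpoint.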
I built a custom HTTP wrapper around QMD's SDK for my personal vault search and found that keeping the embedding model warm and having direct access to individual backends (without MCP protocol overhead) made a massive difference for agent integrations. Tobi suggested I contribute this upstream.
The individual endpoints are useful for agent integrations and scripts that need fast, targeted searches without MCP protocol overhead.
API
All new endpoints accept JSON POST:
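A plausible request body for the new endpoints — note the `query` and `limit` field names are assumptions for illustration; this PR's text does not spell out the request schema:

```json
{ "query": "embedding model warm start", "limit": 10 }
```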
Response format:
```
{ results: [...], mode: "bm25" | "vector" | "hybrid", latency_ms: number }
```

Test plan

- `npx vitest run test/mcp.test.ts` — new tests for all REST endpoints (bm25, vector, hybrid, error cases, rerank passthrough)
- `qmd mcp --http --warm` starts the server with the embedding model pre-loaded
- `qmd mcp --http --daemon --warm` passes `--warm` through to the daemon child process
- `POST /query` and `POST /mcp` still work unchanged

🤖 Generated with Claude Code
npx vitest run test/mcp.test.ts— new tests for all REST endpoints (bm25, vector, hybrid, error cases, rerank passthrough)qmd mcp --http --warmstarts server with embedding model pre-loadedqmd mcp --http --daemon --warmpasses --warm through to daemon child processPOST /queryandPOST /mcpstill work unchanged🤖 Generated with Claude Code