feat(http): REST search endpoints + --warm flag#469

Open
jmilinovich wants to merge 1 commit into tobi:main from jmilinovich:feat/rest-search-api

Conversation

@jmilinovich

Summary

The MCP HTTP server (qmd mcp --http) already has POST /query and /search endpoints, but they require the full structured search format and always use the hybrid pipeline. This PR adds:

  • POST /search/bm25 — direct BM25 keyword search, no LLM needed (~3ms)
  • POST /search/vector — direct vector similarity search (~24ms warm)
  • POST /search/hybrid — BM25 + vector RRF fusion without LLM reranking (~26ms)
  • rerank passthrough on existing POST /query — callers can now send "rerank": false
  • --warm flag — qmd mcp --http --warm pre-loads the embedding model on startup. Opt-in, to avoid the ~300MB memory cost on constrained machines.

Motivation

I built a custom HTTP wrapper around QMD's SDK for my personal vault search and found that keeping the embedding model warm and having direct access to individual backends (without MCP protocol overhead) made a massive difference for agent integrations. Tobi suggested I contribute this upstream.

The individual endpoints are useful for:

  • Debugging which signal (BM25 vs vector) is helping for a given query
  • Agent tools that need fast, targeted search without the full hybrid pipeline
  • Latency-sensitive integrations where reranking isn't needed
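For the first use case, a minimal sketch of diffing the two signals once you have the top-k hits from POST /search/bm25 and POST /search/vector. `signalDiff` is a hypothetical helper, not part of this PR, and it assumes the caller has already extracted a stable document id from each result:

```typescript
// Hypothetical debugging helper: given the ordered ids returned by the
// BM25 and vector endpoints for the same query, report which hits each
// signal contributes uniquely and which both agree on.
function signalDiff(bm25Ids: string[], vectorIds: string[]) {
  const b = new Set(bm25Ids);
  const v = new Set(vectorIds);
  return {
    both: bm25Ids.filter((id) => v.has(id)),       // agreed-on hits
    bm25Only: bm25Ids.filter((id) => !v.has(id)),  // keyword-only hits
    vectorOnly: vectorIds.filter((id) => !b.has(id)), // semantic-only hits
  };
}
```

A large `vectorOnly` set for a keyword-ish query (or vice versa) is a quick signal that one backend is doing most of the work for that query.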

API

All new endpoints accept JSON POST:

# BM25 keyword search
curl -X POST localhost:8181/search/bm25 \
  -d '{"query": "authentication", "limit": 10}'

# Vector similarity search
curl -X POST localhost:8181/search/vector \
  -d '{"query": "how does auth work?", "limit": 10}'

# Hybrid (BM25 + vector, no reranking)
curl -X POST localhost:8181/search/hybrid \
  -d '{"query": "auth flow", "limit": 10, "intent": "user login"}'

# Existing endpoint now supports rerank: false
curl -X POST localhost:8181/query \
  -d '{"searches": [{"type": "lex", "query": "auth"}], "rerank": false}'

Response format: { results: [...], mode: "bm25"|"vector"|"hybrid", latency_ms: number }
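Read literally, that envelope can be typed and guarded like this — a sketch, not code from the PR; `SearchResponse` and `isSearchResponse` are names introduced here, and the shape of individual results is left as `unknown[]` since the PR doesn't specify it:

```typescript
// Hypothetical TypeScript shape for the documented response envelope.
type SearchMode = "bm25" | "vector" | "hybrid";

interface SearchResponse {
  results: unknown[];
  mode: SearchMode;
  latency_ms: number;
}

// Narrowing guard, useful when parsing fetch() bodies in an agent tool.
function isSearchResponse(x: unknown): x is SearchResponse {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return (
    Array.isArray(r.results) &&
    (r.mode === "bm25" || r.mode === "vector" || r.mode === "hybrid") &&
    typeof r.latency_ms === "number"
  );
}
```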

Test plan

  • npx vitest run test/mcp.test.ts — new tests for all REST endpoints (bm25, vector, hybrid, error cases, rerank passthrough)
  • Manual: qmd mcp --http --warm starts server with embedding model pre-loaded
  • Manual: qmd mcp --http --daemon --warm passes --warm through to daemon child process
  • Verify existing POST /query and POST /mcp still work unchanged

🤖 Generated with Claude Code

The MCP HTTP server already has undocumented POST /query and /search
endpoints but lacks direct access to individual search backends. This
adds POST /search/bm25, /search/vector, and /search/hybrid for callers
who want fast, targeted searches without MCP protocol overhead.

Also wires the `rerank` parameter through the existing POST /query
endpoint so callers can pass `"rerank": false` to skip LLM reranking.

The --warm flag pre-loads the embedding model on server startup so the
first vector search is ~24ms instead of ~700ms cold. Opt-in to avoid
surprising memory usage (~300MB for the embedding model).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
