feat(http): REST search endpoints + --warm flag#469

Open
jmilinovich wants to merge 1 commit into tobi:main from jmilinovich:feat/rest-search-api

Conversation

@jmilinovich

Summary

The MCP HTTP server (qmd mcp --http) already has POST /query and /search endpoints, but they require the full structured search format and always use the hybrid pipeline. This PR adds:

  • POST /search/bm25 — direct BM25 keyword search, no LLM needed (~3ms)
  • POST /search/vector — direct vector similarity search (~24ms warm)
  • POST /search/hybrid — BM25 + vector RRF fusion without LLM reranking (~26ms)
  • rerank passthrough on existing POST /query — callers can now send "rerank": false
  • --warm flag — qmd mcp --http --warm pre-loads the embedding model on startup. Opt-in, to avoid the ~300MB memory cost on constrained machines.

Motivation

I built a custom HTTP wrapper around QMD's SDK for my personal vault search and found that keeping the embedding model warm and having direct access to individual backends (without MCP protocol overhead) made a massive difference for agent integrations. Tobi suggested I contribute this upstream.

The individual endpoints are useful for:

  • Debugging which signal (BM25 vs vector) is helping for a given query
  • Agent tools that need fast, targeted search without the full hybrid pipeline
  • Latency-sensitive integrations where reranking isn't needed
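For the first use case, a minimal sketch of diffing the two signals once you have the top-k hits from POST /search/bm25 and POST /search/vector. `signalDiff` is a hypothetical helper, not part of this PR, and it assumes the caller has already extracted a stable document id from each result:

```typescript
// Hypothetical debugging helper: given the ordered ids returned by the
// BM25 and vector endpoints for the same query, report which hits each
// signal contributes uniquely and which both agree on.
function signalDiff(bm25Ids: string[], vectorIds: string[]) {
  const b = new Set(bm25Ids);
  const v = new Set(vectorIds);
  return {
    both: bm25Ids.filter((id) => v.has(id)),       // agreed-on hits
    bm25Only: bm25Ids.filter((id) => !v.has(id)),  // keyword-only hits
    vectorOnly: vectorIds.filter((id) => !b.has(id)), // semantic-only hits
  };
}
```

A large `vectorOnly` set for a keyword-ish query (or vice versa) is a quick signal that one backend is doing most of the work for that query.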

API

All new endpoints accept JSON POST:

# BM25 keyword search
curl -X POST localhost:8181/search/bm25 \
  -d '{"query": "authentication", "limit": 10}'

# Vector similarity search
curl -X POST localhost:8181/search/vector \
  -d '{"query": "how does auth work?", "limit": 10}'

# Hybrid (BM25 + vector, no reranking)
curl -X POST localhost:8181/search/hybrid \
  -d '{"query": "auth flow", "limit": 10, "intent": "user login"}'

# Existing endpoint now supports rerank: false
curl -X POST localhost:8181/query \
  -d '{"searches": [{"type": "lex", "query": "auth"}], "rerank": false}'

Response format: { results: [...], mode: "bm25"|"vector"|"hybrid", latency_ms: number }
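Read literally, that envelope can be typed and guarded like this — a sketch, not code from the PR; `SearchResponse` and `isSearchResponse` are names introduced here, and the shape of individual results is left as `unknown[]` since the PR doesn't specify it:

```typescript
// Hypothetical TypeScript shape for the documented response envelope.
type SearchMode = "bm25" | "vector" | "hybrid";

interface SearchResponse {
  results: unknown[];
  mode: SearchMode;
  latency_ms: number;
}

// Narrowing guard, useful when parsing fetch() bodies in an agent tool.
function isSearchResponse(x: unknown): x is SearchResponse {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return (
    Array.isArray(r.results) &&
    (r.mode === "bm25" || r.mode === "vector" || r.mode === "hybrid") &&
    typeof r.latency_ms === "number"
  );
}
```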

Test plan

  • npx vitest run test/mcp.test.ts — new tests for all REST endpoints (bm25, vector, hybrid, error cases, rerank passthrough)
  • Manual: qmd mcp --http --warm starts server with embedding model pre-loaded
  • Manual: qmd mcp --http --daemon --warm passes --warm through to daemon child process
  • Verify existing POST /query and POST /mcp still work unchanged

🤖 Generated with Claude Code

The MCP HTTP server already has undocumented POST /query and /search
endpoints but lacks direct access to individual search backends. This
adds POST /search/bm25, /search/vector, and /search/hybrid for callers
who want fast, targeted searches without MCP protocol overhead.

Also wires the `rerank` parameter through the existing POST /query
endpoint so callers can pass `"rerank": false` to skip LLM reranking.

The --warm flag pre-loads the embedding model on server startup so the
first vector search is ~24ms instead of ~700ms cold. Opt-in to avoid
surprising memory usage (~300MB for the embedding model).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
