Multi-provider search, AI answers, and universal URL fetching — aggregated, ranked, cached at the edge, and fully traced to R2.
Omnisearch MCP is a production-grade Model Context Protocol server running on Cloudflare Workers + Durable Objects. It fans out queries to many search engines and AI answer providers in parallel, fetches content from any URL via a deep multi-provider waterfall, returns unified results with global KV caching, and writes every request/response to R2 for incident debugging.
| Surface | Single Provider | Omnisearch |
|---|---|---|
| Resilience | Single point of failure | Many providers with automatic failover |
| Search | One engine's blind spots | Multiple engines in parallel, RRF-ranked, cross-deduplicated |
| Snippets | Engine-supplied (often boilerplate / SEO) | Top-N regenerated from actual page content via Groq, query-framed |
| AI Answers | One model's perspective | Consensus across multiple AI providers with citations |
| URL Fetching | Blocked by paywalls, CAPTCHAs | Multi-provider waterfall with social media extraction |
| Performance | Cold on every call | Global KV cache (36h TTL) — cache hits return in ~80ms |
| Connectivity | Timeout on long operations | SSE keepalive with event-boundary buffering, whitespace-heartbeat tolerant |
| Observability | Opaque | R2-backed request/response tracing, hive-partitioned per tool |
omnisearch/
├── README.md # this file
├── AGENTS.md # repo navigation hub for AI agents
├── LICENSE
├── package.json # @modelcontextprotocol/sdk ^1.29, agents, p-retry, zod, wrangler, typescript ^6.0.3
├── wrangler.toml # CF Worker config — DO + KV + R2 bindings; nodejs_compat flag
├── tsconfig.json
├── config.yaml # documentation copy of the fetch waterfall — runtime mirror is in fetch_orchestrator.ts
├── .env.example # documented env vars (secrets live in Doppler / wrangler secrets)
├── docs/ # postmortems, ROI analyses, multi-reviewer synthesis docs
│ ├── kimi-search-roi-analysis.md
│ ├── mcp-empty-payload-anomaly.md
│ ├── mcp-empty-payload-anomaly-v02.md
│ └── skip_providers_review_synthesis.md
└── src/
├── worker.ts # CF Worker fetch entry; OmnisearchMCP DO export; SSE keepalive injection
├── common/
│ ├── html.ts # extract_html_title / extract_markdown_title
│ ├── http.ts # http_json / http_text wrappers (5MB size guard, redaction, R2 trace hook)
│ ├── logger.ts # JSON logger with AsyncLocalStorage<request_id>
│ ├── r2_trace.ts # TraceContext + AsyncLocalStorage<WaitUntilCapable> + R2 hive-partitioned writer
│ ├── rrf_ranking.ts # Reciprocal Rank Fusion + URL dedup + tail rescue + quality filter
│ ├── search_operators.ts # parses site:/filetype:/intitle:/etc. into structured params
│ ├── snippet_selector.ts # bigram Jaccard + greedy sentence merge (collapse_snippets)
│ ├── types.ts # SearchResult, FetchResult, SearchProvider, FetchProvider, ProviderError
│ └── utils.ts # AbortSignal.any polyfill, hash_key (SHA-256), retry_with_backoff, REST auth
├── config/
│ └── env.ts # config object + initialize_config(env) + validate_config()
├── providers/
│ ├── index.ts # initialize_providers() — atomic-swap registries
│ ├── ai_response/ # AI answer providers (5 named + 4 LLM bridge sub-providers + 1 special gemini-grounded)
│ ├── fetch/ # 28 URL fetch providers — markdown/text/structured extraction
│ ├── search/ # 11 web-search providers — query → ranked SearchResult[]
│ └── unified/ # provider-abstraction dispatchers — auto-built from registrations
├── server/
│ ├── answer_orchestrator.ts # parallel AI fanout + 295s deadline + AbortController + KV cache + gemini-grounded inline
│ ├── fetch_orchestrator.ts # waterfall + domain breakers + parallel multi-winner racing + KV cache + skip_providers
│ ├── grounded_prompts.ts # snippet-writer system prompt + junk + sentinel detectors
│ ├── grounded_snippets.ts # post-RRF Groq grounding stage — bounded worker pool + per-URL deadline + retry path
│ ├── handlers.ts # MCP resource handlers (provider-status / provider-info)
│ ├── rest_fetch.ts # POST /fetch
│ ├── rest_researcher.ts # GET|POST /researcher (GPT-Researcher compat)
│ ├── rest_search.ts # POST /search (accepts grounded_snippets opt-out)
│ ├── tools.ts # MCP tool registration; per-DO get_ctx capture (R4F01)
│ └── web_search_fanout.ts # parallel search fanout + RRF + dedup + tail rescue + grounded-snippets stage + KV cache
└── types/
├── env.ts # CF env binding types (KV, R2, DO, secret env vars)
└── node-async-hooks.d.ts # minimal type stub for AsyncLocalStorage in workerd
A tools/ directory at the repo root holds offline harnesses (grounding_smoke.py, grounding_compare.py, grounding_lib.py) for before/after evaluation of the grounded-snippets feature against fixed query corpora. Not deployed.
Each subfolder has its own AGENTS.md with a file-by-file breakdown — start at AGENTS.md (root) for the navigation hub.
Fans out the same query to all configured search engines simultaneously. Deduplicates by URL (lowercase host + strip fragment + strip trailing slash), ranks with Reciprocal Rank Fusion (score = sum 1/(60 + rank)), collapses multi-provider snippets via bigram Jaccard plus greedy sentence-level set cover, and rescues high-quality results from underrepresented domains in the tail.
After ranking — if GROQ_API_KEY is set — a grounded-snippet stage kicks in: the top-20 URLs are fetched in parallel through the same waterfall the fetch tool uses, and Groq (openai/gpt-oss-120b) writes a query-framed snippet describing what each page actually says. The grounded snippet replaces the engine-supplied one. Each result reports snippet_source ∈ { aggregated, grounded, fallback } so callers can see which path produced the snippet. Default ON; set grounded_snippets:false per call (or omit GROQ_API_KEY) to return raw aggregated provider snippets at minimum latency.
Tool input: query, optional timeout_ms (omitted = wait for all providers, full dedup), optional include_snippets (default true), optional grounded_snippets (default true when GROQ_API_KEY is configured).
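For illustration, a minimal TypeScript sketch of the dedup-then-RRF step, with assumed helper names (the production logic lives in `src/common/rrf_ranking.ts` and additionally applies the snippet collapse and tail rescue described later):

```ts
// Sketch only: assumed names; the real implementation is src/common/rrf_ranking.ts.
type RankedResult = { url: string; title: string; snippet: string };

// Dedup key as described above: lowercase host, strip fragment, strip trailing slash.
function dedup_key(raw: string): string {
  const u = new URL(raw);
  const path =
    u.pathname !== "/" && u.pathname.endsWith("/") ? u.pathname.slice(0, -1) : u.pathname;
  return `${u.host.toLowerCase()}${path}${u.search}`;
}

// Reciprocal Rank Fusion: score(url) = sum over providers of 1 / (60 + rank).
function rrf_rank(per_provider: RankedResult[][]): RankedResult[] {
  const fused = new Map<string, { result: RankedResult; score: number }>();
  for (const results of per_provider) {
    results.forEach((result, i) => {
      const rank = i + 1; // 1-based rank within this provider's list
      const key = dedup_key(result.url);
      const entry = fused.get(key) ?? { result, score: 0 };
      entry.score += 1 / (60 + rank);
      fused.set(key, entry);
    });
  }
  return [...fused.values()].sort((a, b) => b.score - a.score).map((e) => e.result);
}
```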
Queries every configured AI provider in parallel — each independently searches the web and synthesizes its own answer with citations. When a Gemini-grounded key is configured AND a search provider exists, an extra gemini-grounded task feeds web_search_fanout results into Gemini's url_context tool. Returns all answers in one response so callers can compare consensus vs. divergence.
Hard global deadline 295 seconds (4 minutes 55 seconds); pending providers are aborted via AbortController and reported as Timed out (global deadline). Only complete fanouts (zero failed/timed-out providers) are cached — partial results would otherwise lock a one-provider-short answer in for 36 hours.
Tool input: query only.
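The fanout-plus-deadline shape is roughly the following sketch; `run_provider` and the result maps are illustrative stand-ins, not the real `answer_orchestrator.ts` API:

```ts
// Sketch only: illustrates the 295s global deadline + AbortController pattern.
async function answer_fanout(
  query: string,
  providers: string[],
  run_provider: (name: string, query: string, signal: AbortSignal) => Promise<string>,
) {
  const controller = new AbortController();
  const answers: Record<string, string> = {};
  const failed: Record<string, string> = {};
  let is_done = false; // late arrivals must not mutate results after the deadline

  const deadline = setTimeout(() => {
    is_done = true;
    controller.abort(); // cancels in-flight provider HTTP requests
  }, 295_000);

  await Promise.allSettled(
    providers.map(async (name) => {
      try {
        const answer = await run_provider(name, query, controller.signal);
        if (!is_done) answers[name] = answer;
      } catch (err) {
        failed[name] = is_done ? "Timed out (global deadline)" : String(err);
      }
    }),
  );

  clearTimeout(deadline);
  // Only a complete fanout (failed is empty) would be written to the KV cache.
  return { answers, failed };
}
```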
Deep waterfall that resolves clean content from any public URL:
- Domain breakers (in order): GitHub URLs to the `github` provider, YouTube URLs to `supadata` (transcripts), social-media URLs to `sociavault` (Reddit, Twitter/X, LinkedIn, TikTok, Instagram, Facebook, Threads, Pinterest, etc.).
- Waterfall (top-to-bottom, configured in `fetch_orchestrator.ts` `CONFIG.waterfall` — `config.yaml` is documentation only; see the sketch after the tool inputs below):
  - solo: tavily, then firecrawl, then kimi
  - parallel: linkup + cloudflare_browser
  - parallel: diffbot + olostep
  - parallel: scrapfly + scrapedo + decodo
  - solo: zyte, then brightdata
  - sequential fallback: jina, spider, you, scrapeless, scrapingbee, scrapegraphai, scrappey, scrapingant, oxylabs, scraperapi, leadmagic, opengraph
- Failure detection: rejects any result with content under 200 chars (except API-native providers `github` and `supadata`) or matching one of the challenge patterns (`captcha`, `just a moment`, `cf-browser-verification`, `access denied`, etc.).
Tool inputs:
- `url` (required)
- `skip_providers` — comma-separated string OR JSON-encoded array OR native string-array. Triggers a 2-provider fanout (returning the second under `alternative_results` for cross-provider validation), bypasses the cache, and roughly doubles cost+latency. Validated against the active provider set; typos are rejected at the MCP and REST layers, with a defense-in-depth pass at orchestrator entry.
skip_providers exists ONLY on the fetch tool. The answer and web_search tools do not accept it.
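The waterfall boils down to a small declarative structure. The sketch below mirrors its documented shape; the authoritative copy is `CONFIG.waterfall` in `src/server/fetch_orchestrator.ts`, and the field names here are assumptions:

```ts
// Sketch only: mirrors the documented waterfall order, not the real type names.
type WaterfallStep =
  | { mode: "solo"; providers: string[] }      // try in order, first acceptable result wins
  | { mode: "parallel"; providers: string[] }; // race; with skip_providers, collect up to 2 winners

const WATERFALL: WaterfallStep[] = [
  { mode: "solo", providers: ["tavily", "firecrawl", "kimi"] },
  { mode: "parallel", providers: ["linkup", "cloudflare_browser"] },
  { mode: "parallel", providers: ["diffbot", "olostep"] },
  { mode: "parallel", providers: ["scrapfly", "scrapedo", "decodo"] },
  { mode: "solo", providers: ["zyte", "brightdata"] },
  {
    mode: "solo", // sequential fallback tier
    providers: [
      "jina", "spider", "you", "scrapeless", "scrapingbee", "scrapegraphai",
      "scrappey", "scrapingant", "oxylabs", "scraperapi", "leadmagic", "opengraph",
    ],
  },
];
```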
# 1. Install
git clone https://github.com/cjangrist/omnisearch.git
cd omnisearch
npm ci
# 2. Set Cloudflare secrets — at minimum one search/answer/fetch key each
npx wrangler secret put TAVILY_API_KEY
npx wrangler secret put FIRECRAWL_API_KEY
# ... see "Configuration" below for the full list
# 3. Optional: protect REST endpoints behind a Bearer token
npx wrangler secret put OMNISEARCH_API_KEY
# 4. Deploy
npm run deploy

Configure your MCP client:
{
"mcpServers": {
"omnisearch": {
"url": "https://your-worker.workers.dev/mcp"
}
}
}

The public deployment at https://omnisearch-mcp.cjangrist.workers.dev/mcp is unauthenticated for MCP — call it directly without an API key.
| Path | Method | Purpose | Auth |
|---|---|---|---|
| `/health`, `/` | GET | Liveness + active-provider count | none |
| `/search` | POST | Search fanout, returns `[{ link, title, snippet }]`. Body accepts `grounded_snippets:false` to skip the Groq grounding stage. | Bearer (if configured) |
| `/fetch` | POST | URL fetch, returns full extraction with `alternative_results` when `skip_providers` set | Bearer (if configured) |
| `/researcher` | GET or POST | GPT-Researcher compatible — returns `[{ href, body }]` (search snippets, no full fetch) | Bearer or `?api_key=` query param |
| `/mcp` | GET, POST, DELETE | MCP Streamable HTTP transport (delegates to Durable Object) | none (CORS wildcard) |
REST endpoints (/search, /fetch, /researcher) are gated by a Bearer token if either OMNISEARCH_API_KEY or OPENWEBUI_API_KEY is set (the legacy alias is honored — OPENWEBUI_API_KEY || OMNISEARCH_API_KEY). If neither is set, REST endpoints are open. Comparison uses crypto.subtle.timingSafeEqual to prevent timing attacks. The /researcher endpoint also accepts ?api_key=... as a query parameter for GPT-Researcher compatibility.
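A minimal sketch of that gate, assuming a helper name like `is_authorized`; `crypto.subtle.timingSafeEqual` is a Cloudflare Workers extension that requires equal-length inputs, so lengths are compared first:

```ts
// Sketch only: the helper name and structure are assumptions, not the repo's code.
function is_authorized(request: Request, configured_key: string | undefined): boolean {
  if (!configured_key) return true; // no key configured: REST endpoints are open

  const header = request.headers.get("Authorization") ?? "";
  const presented = header.startsWith("Bearer ") ? header.slice(7) : "";

  const enc = new TextEncoder();
  const a = enc.encode(presented);
  const b = enc.encode(configured_key);
  if (a.byteLength !== b.byteLength) return false;

  // Constant-time comparison prevents timing side channels on the token value.
  return crypto.subtle.timingSafeEqual(a, b);
}
```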
# Search
curl -X POST https://your-worker.workers.dev/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OMNISEARCH_API_KEY" \
-d '{"query": "vector database comparison", "count": 10}'
# Fetch (with cross-provider validation — bypasses cache, returns 2 results)
curl -X POST https://your-worker.workers.dev/fetch \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OMNISEARCH_API_KEY" \
-d '{"url": "https://www.linkedin.com/posts/...", "skip_providers": "tavily"}'
# Researcher (compatible with RETRIEVER=custom in GPT-Researcher)
curl "https://your-worker.workers.dev/researcher?query=rust+memory+safety&api_key=$OMNISEARCH_API_KEY"
# Health — actual provider count varies by which keys are configured
curl https://your-worker.workers.dev/health
# {"status":"ok","name":"omnisearch-mcp","version":"1.0.0","providers":47}Secrets are managed via wrangler secret put (or the Doppler-based workflow described in .env.example). wrangler.toml only contains bindings — no secret values.
Set at least one. Several keys are shared with the corresponding answer / fetch provider.
| Variable | Provider | Notes |
|---|---|---|
| `TAVILY_API_KEY` | Tavily | Shared with tavily_answer + tavily fetch |
| `BRAVE_API_KEY` | Brave | Brave Web Search (separate key from `BRAVE_ANSWER_API_KEY`) |
| `KAGI_API_KEY` | Kagi | Shared with kagi_fastgpt |
| `EXA_API_KEY` | Exa | Neural search; shared with exa_answer |
| `SERPAPI_API_KEY` | SerpAPI | google_light engine; key reused for fetch (YouTube transcripts) |
| `LINKUP_API_KEY` | Linkup | Shared with linkup fetch |
| `FIRECRAWL_API_KEY` | Firecrawl | Shared with firecrawl fetch |
| `YOU_API_KEY` | You.com | Shared with you fetch |
| `PERPLEXITY_API_KEY` | Perplexity | Sonar; shared with perplexity answer |
| `KIMI_API_KEY` | Kimi | Currently disabled by default per ROI analysis — see docs/kimi-search-roi-analysis.md. Requires `SCRAPFLY_API_KEY` too. |
| Variable | Provider |
|---|---|
| `PERPLEXITY_API_KEY` | Perplexity Sonar Pro (1024 max_tokens) |
| `KAGI_API_KEY` | Kagi FastGPT (~900ms typical) |
| `EXA_API_KEY` | Exa Answer (livecrawl=fallback) |
| `BRAVE_ANSWER_API_KEY` | Brave Answer (SSE streaming with inline citation tags) |
| `TAVILY_API_KEY` | Tavily Answer (include_answer=advanced) |
| `LLM_SEARCH_BASE_URL` + `LLM_SEARCH_API_KEY` | OpenAI-compatible bridge — registers 4 separate provider entries (chatgpt, claude, gemini, kimi) sharing the same endpoint. Optional model overrides: `LLM_SEARCH_CHATGPT_MODEL`, `LLM_SEARCH_CLAUDE_MODEL`, `LLM_SEARCH_GEMINI_MODEL`, `LLM_SEARCH_KIMI_MODEL`. |
| `GEMINI_GROUNDED_API_KEY` | Native Gemini API with url_context tool — invoked specially: receives web_search_fanout results as grounding sources. Optional `GEMINI_GROUNDED_MODEL`. |
Several keys are shared with search:
| Variable | Provider | Notes |
|---|---|---|
| `TAVILY_API_KEY` | Tavily Extract | shared |
| `FIRECRAWL_API_KEY` | Firecrawl `/v2/scrape` `onlyMainContent` | shared |
| `JINA_API_KEY` | Jina Reader | token-efficient |
| `YOU_API_KEY` | You.com Contents | shared |
| `BRIGHT_DATA_API_KEY` (+ optional `BRIGHT_DATA_ZONE`, default unblocker) | Bright Data Web Unlocker | |
| `LINKUP_API_KEY` | Linkup `/v1/fetch` | shared |
| `DIFFBOT_TOKEN` | Diffbot Article API | structured extraction |
| `SOCIAVAULT_API_KEY` | SociaVault | social-media routing |
| `SPIDER_CLOUD_API_TOKEN` | Spider.cloud `/scrape` | |
| `SCRAPFLY_API_KEY` | Scrapfly `/scrape` | also used as residential proxy for Kimi |
| `SCRAPEGRAPHAI_API_KEY` | ScrapeGraphAI `/v1/markdownify` | |
| `SCRAPE_DO_API_TOKEN` | Scrape.do | |
| `SCRAPELESS_API_KEY` | Scrapeless Web Unlocker | JS render |
| `OPENGRAPH_IO_API_KEY` | OpenGraph.io Extract | structured tag extraction |
| `SCRAPINGBEE_API_KEY` | ScrapingBee | |
| `SCRAPERAPI_API_KEY` | ScraperAPI | |
| `ZYTE_API_KEY` | Zyte automatic extraction | |
| `SCRAPINGANT_API_KEY` | ScrapingAnt `/v2/markdown` LLM-ready | |
| `OXYLABS_WEB_SCRAPER_USERNAME` + `OXYLABS_WEB_SCRAPER_PASSWORD` | Oxylabs Realtime | |
| `OLOSTEP_API_KEY` | Olostep | JS rendering + residential by default |
| `DECODO_WEB_SCRAPING_API_KEY` | Decodo (Smartproxy) | base64-encoded user:pass |
| `SCRAPPEY_API_KEY` | Scrappey headless browser | returns innerText |
| `LEADMAGIC_API_KEY` | LeadMagic Web2MD | |
| `CLOUDFLARE_ACCOUNT_ID` + `CLOUDFLARE_EMAIL` + `CLOUDFLARE_API_KEY` | Cloudflare Browser Rendering | all three required to register |
| `SERPAPI_API_KEY` | SerpAPI YouTube transcript engine | YouTube-only fetch; shared with search |
| `SUPADATA_API_KEY` | Supadata YouTube transcripts | sync/async (HTTP 202 + polling) dual-mode |
| `GITHUB_API_KEY` | GitHub fetcher | LLM-optimized REST + GraphQL multi-resource (issues, PRs, files, gists, releases, commits, repo overview) |
| `KIMI_API_KEY` | Kimi (Moonshot AI) coding-agent fetch | requires `SCRAPFLY_API_KEY` — routed via Scrapfly residential proxy because api.kimi.com WAF blocks Cloudflare-Workers ASN |
Optional. When set, the top-20 RRF-ranked URLs are fetched and re-summarized in parallel.
| Variable | Purpose |
|---|---|
| `GROQ_API_KEY` | Groq API key. Defaults: model openai/gpt-oss-120b, base https://api.groq.com/openai/v1, concurrency 3, per-URL deadline 15 s, page-body truncation 24 000 chars. Defaults are baked into `src/config/env.ts`'s `config.snippet_grounding.groq` block — no other tunable env vars. Default ON when key is present; per-call opt-out via `grounded_snippets:false`. |
OMNISEARCH_API_KEY=your-secret-key-here
# legacy alias (still honored if set; takes precedence when both are set):
# OPENWEBUI_API_KEY=your-secret-key-here

| Platform | API endpoint | Returns |
|---|---|---|
| Reddit | `/v1/scrape/reddit/post/comments` | post + all comments |
| Twitter/X | `/v1/scrape/twitter/tweet` | tweet content + metadata |
| YouTube | `/v1/scrape/youtube/video` | video info (transcripts come from Supadata's earlier breaker) |
| Facebook | `/v1/scrape/facebook/post` | post content |
| Instagram | `/v1/scrape/instagram/post-info` | post info + media URLs |
| TikTok | `/v1/scrape/tiktok/video-info` | video info |
| LinkedIn | `/v1/scrape/linkedin/post` | post text + author + engagement |
| Threads | `/v1/scrape/threads/post` | post content |
| Pinterest | `/v1/scrape/pinterest/pin` | pin content |
All three tools cache through Cloudflare KV with a 36-hour TTL (129,600 seconds) and SHA-256-hashed keys.
- Key shape: `<prefix>:<sha256(value)>` where prefix is `search:`, `answer:`, or `fetch:`. SHA-256 keeps every key under KV's 512-byte limit regardless of input length (a sketch of the key construction follows this list).
- Search cache binds query: `make_cache_key(query, options)` adds a `\0sqf=true` suffix when `skip_quality_filter` is set, plus a `\0gnd=true` suffix when grounded snippets are active for the call, so raw / filtered / grounded / non-grounded fanouts don't collide. Defense-in-depth: cached entries also store `query` and are rejected if `cached.query !== query`.
- Answer cache binds query: same query-echo defense.
- Fetch cache binds requested URL: cached entries store `requested_url`; the read path requires `cached.requested_url === url`. The MCP tool's `skip_providers` path bypasses the cache entirely and skips the cache write — preserving the multi-provider compare semantic (returning two different providers under `alternative_results` would otherwise pollute the singular cache).
- Validators (`is_valid_cached_*` in each orchestrator): full structural validation on every read. A legacy or corrupted entry is silently treated as a miss so downstream code can never crash on undefined fields.
- Partial fanouts are NOT cached. If any provider failed or timed out, search and answer skip the write — locking in a one-provider-short result for 36h would prevent recovery once the upstream comes back.
- Cache hits return in ~80ms regardless of edge location.
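A sketch of the key construction described above, with assumed parameter names (the real `make_cache_key` lives in the orchestrators):

```ts
// Sketch only: illustrates prefix + SHA-256 hashing + option suffixes.
async function make_cache_key(
  prefix: "search" | "answer" | "fetch",
  value: string,
  opts?: { skip_quality_filter?: boolean; grounded?: boolean },
): Promise<string> {
  let material = value;
  if (opts?.skip_quality_filter) material += "\0sqf=true";
  if (opts?.grounded) material += "\0gnd=true";

  // SHA-256 keeps the key under KV's 512-byte limit regardless of input length.
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(material));
  const hex = [...new Uint8Array(digest)].map((b) => b.toString(16).padStart(2, "0")).join("");
  return `${prefix}:${hex}`;
}
```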
Cloudflare Workers is a single isolate handling many concurrent requests. Several invariants flow from that:
- Per-request request_id: `run_with_request_id(uuid, fn)` from `common/logger.ts` scopes the ID through `AsyncLocalStorage`. All `loggers.*` instances pull from the store, so log lines never cross-contaminate between concurrent requests (a sketch follows this list).
- Per-request execution context: `r2_trace.ts` exposes `run_with_execution_context(ctx, fn)` backed by `AsyncLocalStorage<WaitUntilCapable>`. `flush_background()` reads the active ctx and calls `ctx.waitUntil(write_promise)`, so R2 trace writes always attach to the originating request — not to whatever request happens to be active at flush time.
- Per-trace context: `run_with_trace(trace, fn)` similarly scopes a `TraceContext` so providers can call `get_active_trace()?.record_*` without threading the context through every function.
- Durable Object (`OmnisearchMCP`) ctx scoping (R4F01): each MCP tool callback closure captures its own `get_ctx` getter — `register_tools(server, () => this.ctx)` — instead of pulling from a shared field on the registry. Reason: `registry` is module-scoped and shared across DO instances in the same isolate, so a per-DO ctx field on the registry would be overwritten by the most recent registration. Capturing `get_ctx` per-closure keeps every `McpServer`'s handlers bound to the DO that registered them. Without this, MCP-invoked tool R2 traces become detached fire-and-forget promises that the runtime can cancel when the DO finishes its current invocation.
- Atomic provider registry swap: `initialize_providers()` builds new Sets locally, then assigns them to `active_providers` in one statement. Concurrent reads never see an empty state.
- Init memoization: both the Worker fetch path (`ensure_rest_initialized`) and the DO (`init`) memoize a successful initialization promise, but reset it to `undefined` on rejection so a transient secret-load failure doesn't permanently brick the isolate.
- Cancellation: search providers, AI providers, and the fanouts all accept `AbortSignal` and combine the external signal with deadline signals via `AbortSignal.any` (with a polyfill in `utils.ts` / `make_signal`). When the answer fanout's 295s deadline fires, the orchestrator calls `abort_controller.abort()` to cancel in-flight provider HTTP requests; late-arriving promises are guarded with an `is_done` flag so they can't mutate result arrays after the deadline.
- `nodejs_compat` flag: required in `wrangler.toml` so `node:async_hooks` AsyncLocalStorage is provided by workerd. Without it, request-id and execution-context scoping silently no-op, leading to log cross-contamination and orphaned R2 trace writes.
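A stripped-down sketch of the request-id scoping pattern (assumed file layout; the real code is `common/logger.ts` and `common/r2_trace.ts`):

```ts
// Sketch only: requires the nodejs_compat flag so node:async_hooks resolves in workerd.
import { AsyncLocalStorage } from "node:async_hooks";

const request_id_store = new AsyncLocalStorage<string>();

export function run_with_request_id<T>(id: string, fn: () => T): T {
  return request_id_store.run(id, fn);
}

export function current_request_id(): string | undefined {
  return request_id_store.getStore();
}

// Worker fetch entry: every concurrent request gets its own scoped id, so log
// lines emitted deep inside providers can tag themselves without parameter
// threading and without cross-contamination between requests.
export default {
  async fetch(request: Request): Promise<Response> {
    return run_with_request_id(crypto.randomUUID(), async () => {
      console.log(JSON.stringify({ request_id: current_request_id(), msg: "start" }));
      return new Response("ok");
    });
  },
};
```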
`/mcp` POST responses are streamed back as SSE. The agents package's DO transport doesn't emit keepalives. The Worker injects `event: ping\ndata: keepalive\n\n` every 5s only between complete events — the buffer is checked via `buffer_is_only_whitespace()` and pings interleave when the buffer is empty OR contains only SSE whitespace bytes (space / tab / LF / CR), so upstream / proxy whitespace heartbeats can't suppress every keepalive ping forever. (Pre-2026-05-04 the gate was a bare `total_len === 0` check; a single bare `\n` byte from any intermediate layer accumulated forever, suppressed pings, and let Cloudflare's edge tear the connection down client-side.) When the buffer is whitespace-only it's flushed first — so heartbeats are forwarded to the client AND our pings keep firing on schedule. Partial events containing any non-whitespace byte still gate the ping (no mid-event corruption). Boundary detection is WHATWG-compliant (`\n\n`, `\r\n\r\n`, `\r\r`); chunks are kept as a list and only flattened when scanning, avoiding O(n^2) `Uint8Array` copies. A write lock (`safe_write`) serializes pump writes against interval pings.
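A sketch of the whitespace-only gate, with assumed names (the real check lives in `src/worker.ts`):

```ts
// Sketch only: SSE "whitespace" here is space, tab, LF, CR — the bytes proxies
// tend to use as heartbeats.
const SSE_WHITESPACE = new Set([0x20, 0x09, 0x0a, 0x0d]);

function buffer_is_only_whitespace(chunks: Uint8Array[]): boolean {
  for (const chunk of chunks) {
    for (const byte of chunk) {
      if (!SSE_WHITESPACE.has(byte)) return false;
    }
  }
  return true;
}

// A keepalive may be injected when the pending buffer is empty or whitespace-only
// (the whitespace is flushed first); any non-whitespace byte means a partial event
// is in flight, so the ping waits for the next event boundary.
function may_inject_ping(chunks: Uint8Array[]): boolean {
  return chunks.length === 0 || buffer_is_only_whitespace(chunks);
}
```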
Every request to the three tools (and every nested HTTP call inside them) is captured to TRACE_BUCKET (R2) as a single pretty-formatted JSON document. Path layout:
request_traces/tool={search|answer|fetch|web_search}/date=YYYY-MM-DD/hour=HH/trace_id=<uuid>.json
Each document includes:
- `trace_id`, `parent_trace_id` (e.g. answer fanout's gemini-grounded child links to the inner web_search trace)
- `tool`, `started_at`, `completed_at`, `total_duration_ms`, `cache_hit`
- `request_environment` (query/url/options)
- `orchestrator.{strategy, active_providers, decisions[]}` — full decision log (`waterfall_step`, `breaker_match`, `cache_hit`, `fanout_complete`, etc.)
- `providers_hit`, `providers_succeeded`, `providers_failed`
- `providers.<name>.{started_at, duration_ms, success, input, output, error, http_calls[]}` — full HTTP round-trip incl. request headers, request body, response headers, response body, `response_size_bytes`
- `final_result` — what was returned to the client
Sensitive query params (api_key, key, token, app_id, x-api-key, apikey) are redacted in log output (http.ts/sanitize_url) but NOT redacted in the R2 trace payload — the bucket is private and used for incident debugging.
flush_background() is fire-and-forget; attached to ctx.waitUntil when an active execution context is available, dropped silently otherwise.
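For incident work, traces can be pulled straight off the hive-partitioned prefix. A hypothetical helper using the R2 binding's `list()` API:

```ts
// Sketch only: TRACE_BUCKET matches the binding name in wrangler.toml; the
// helper itself is illustrative, not part of the repo.
interface Env { TRACE_BUCKET: R2Bucket }

async function list_traces(env: Env, tool: string, date: string, hour: string) {
  const prefix = `request_traces/tool=${tool}/date=${date}/hour=${hour}/`;
  const listing = await env.TRACE_BUCKET.list({ prefix });
  return listing.objects.map((o) => o.key); // e.g. .../trace_id=<uuid>.json
}

// Usage: const keys = await list_traces(env, "fetch", "2026-01-15", "13");
```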
+---------------------------------------------+
| Cloudflare Worker |
| |
| /health /search /fetch /researcher /mcp|
| | | | | | |
| | +----+--+---------+ | |
| | | | |
| | REST handlers Durable Object
| | (Bearer auth, OmnisearchMCP
| | CORS, body (per-session)
| | size guards) | |
| +-------------+------------------+ |
| | |
| +--------+--------+ |
| | KV Cache (36h) | |
| +--------+--------+ |
| | miss |
| +-------------+-------------+ |
| v v v |
| Web Search AI Answer Fetch Race |
| Fanout Fanout Waterfall |
| | | | |
| v v v |
| RRF Rank Deadline + Domain |
| + Snippet AbortCtrl Breakers + |
| Collapse + Gemini Multi-Winner |
| | Grounded Parallel Race |
| v | | |
| Groq Ground | | |
| (top-N URLs) | | |
| | | | |
| +-------------+-------------+ |
| v |
| R2 Trace Bucket |
| (hive-partitioned per tool) |
+---------------------------------------------+
- Reciprocal Rank Fusion — `score = sum 1/(60 + rank)` per provider that returned the URL. No score calibration needed across providers with incompatible ranking schemes. Pages found by multiple engines float to the top.
- Snippet collapse — when 2+ providers return the same URL, snippets are scored on bigram density times query-term relevance times log-length. If the runner-up is meaningfully different (Jaccard < 0.3), greedy sentence-level set cover within a 500-char budget produces a merged snippet. Otherwise the best single snippet is kept (a sketch of the bigram Jaccard test follows this list).
- Tail rescue — after taking top-N, results from underrepresented domains in the tail are rescued if their per-provider intra-rank is < 2. Prevents SEO-dominated top results from drowning out niche-domain authoritative sources.
- Grounded snippets — engine-supplied snippets are notoriously SEO-padded, off-topic, or boilerplate. After RRF picks the top-20, each URL is fetched through the same waterfall the `fetch` tool uses (concurrency-capped to 3, per-URL hard deadline 15 s) and Groq (openai/gpt-oss-120b) writes a query-framed snippet from the actual page body. Pre-Groq pattern detection (paywall / login-wall / cookie-wall / JS-shell / bot-challenge) and post-Groq sentinel detection (`[no usable content]`, `[navigation only]`, `[page not found]`, `[search results page]`, `[login required]`) trigger a single retry through the waterfall with `skip_providers={attempt-1 winner}` — the search engines have already vouched for relevance, so a second fetcher is more useful than re-asking the model. Failures fall back transparently to the original aggregated snippet (`snippet_source: 'fallback'`) so a single bad URL never breaks the result set. Failure outcomes are classified into 8 buckets and reported via a single `grounding_aggregate` log line per call (grounded ratio, p50/p95/max latency, provider wins, retry count, timeout count). The model choice (120B over 20B) is deliberate — the 20B emitted degenerate-sampling output under detailed-prompt + 6k-token-context load.
- Multi-provider fanout IS the redundancy strategy — `answer_orchestrator` deliberately doesn't retry individual providers (`retry_with_backoff` is NOT used here). Retrying doubles worst-case latency for rare gains; the fanout already has redundancy.
- Multi-winner parallel race in fetch — when `skip_providers` triggers `target_count = 2`, parallel steps collect successes up to the target rather than `Promise.any`-ing the first. Once settled, late-arriving rejections are dropped from both `ctx.failed` and the trace — the public response and the trace tell the same story.
- Empty-active-set guard — if every active fetch provider was filtered out by `skip_providers`, the orchestrator throws `INVALID_INPUT` (REST → 400) instead of running the waterfall to exhaustion and emitting a misleading 502 with `"Tried: <empty>"`.
- Failure isolation — every provider runs in its own promise. One provider's exception never crashes the others. Partial results are always returned with `providers_failed` metadata.
- All-failed → 502, not 200: REST `/search` and `/researcher` return 502 with `{ error, failed_providers }` when every provider failed.
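The snippet-collapse similarity test referenced above is a plain bigram Jaccard; a sketch under assumed names (the real scoring in `common/snippet_selector.ts` also weighs query-term relevance and length):

```ts
// Sketch only: bigram Jaccard similarity between two candidate snippets.
function bigrams(text: string): Set<string> {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const grams = new Set<string>();
  for (let i = 0; i < words.length - 1; i++) grams.add(`${words[i]} ${words[i + 1]}`);
  return grams;
}

function bigram_jaccard(a: string, b: string): number {
  const ga = bigrams(a);
  const gb = bigrams(b);
  if (ga.size === 0 && gb.size === 0) return 1;
  let intersection = 0;
  for (const g of ga) if (gb.has(g)) intersection++;
  return intersection / (ga.size + gb.size - intersection);
}

// A runner-up with Jaccard < 0.3 is "meaningfully different", so the greedy
// sentence-level merge runs; otherwise the best single snippet is kept.
const should_merge = (best: string, runner_up: string) => bigram_jaccard(best, runner_up) < 0.3;
```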
The codebase uses a registry pattern. Adding a search provider:
- Add env binding in `src/types/env.ts`
- Add a config entry in `src/config/env.ts` (`config.search.<name>`) and wire it in `initialize_config(env)`
- Implement adapter in `src/providers/search/<name>/index.ts` exporting:
  - `class <Name>SearchProvider implements SearchProvider`
  - `export const registration = { key: () => config.search.<name>.api_key }`
- Register — one line in `src/providers/unified/web_search.ts`'s `PROVIDERS` array
- Add to `.env.example` for documentation
The registry auto-detects availability from environment variables; only providers with non-empty trimmed keys are instantiated. Same flow for ai_response/ and fetch/. For fetch, also consider whether the new provider needs a slot in fetch_orchestrator.ts CONFIG.waterfall (and a parallel update to config.yaml if the YAML is meant to stay in sync as documentation).
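A hypothetical adapter skeleton following those steps; the interface shapes, the `example` provider, and the config access are placeholders, so check `src/common/types.ts` and an existing provider under `src/providers/search/` for the real signatures:

```ts
// Sketch only: local stand-in types; the real SearchProvider / SearchResult
// interfaces live in src/common/types.ts, and config in src/config/env.ts.
declare const config: { search: Record<string, { api_key?: string }> };

interface SketchSearchResult { url: string; title: string; snippet: string }
interface SketchSearchProvider {
  name: string;
  search(query: string, signal?: AbortSignal): Promise<SketchSearchResult[]>;
}

export class ExampleSearchProvider implements SketchSearchProvider {
  name = "example";

  async search(query: string, signal?: AbortSignal): Promise<SketchSearchResult[]> {
    const res = await fetch("https://api.example.com/search", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${config.search["example"]?.api_key ?? ""}`,
      },
      body: JSON.stringify({ q: query }),
      signal,
    });
    if (!res.ok) throw new Error(`example search failed: ${res.status}`);
    const data = (await res.json()) as { results: SketchSearchResult[] };
    return data.results;
  }
}

// Registration entry: the registry only instantiates the provider when key()
// returns a non-empty trimmed string.
export const registration = { key: () => config.search["example"]?.api_key };
```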
npm run typecheck # tsc --noEmit
npm run dev # wrangler dev (port 8787)
npm run deploy       # wrangler deploy

There is no in-repo test framework — verification is done via the live REST endpoints, R2 trace inspection, and the multi-reviewer hydra-heads workflow (see docs/skip_providers_review_synthesis.md for an example).
# Health
curl http://localhost:8787/health
# REST search (no auth required when OMNISEARCH_API_KEY is unset)
curl -X POST http://localhost:8787/search \
-H "Content-Type: application/json" \
-d '{"query": "rust memory safety", "count": 5}'
# REST fetch
curl -X POST http://localhost:8787/fetch \
-H "Content-Type: application/json" \
-d '{"url": "https://www.reddit.com/r/LocalLLaMA/..."}'
# REST fetch with cross-provider validation (returns alternative_results)
curl -X POST http://localhost:8787/fetch \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/article", "skip_providers": "tavily,firecrawl"}'
# /researcher with query-param auth
curl "http://localhost:8787/researcher?query=test&api_key=$OMNISEARCH_API_KEY"R2 traces are queryable via the partition path — list by tool, date, hour for any incident.
https://omnisearch-mcp.cjangrist.workers.dev/mcp is open and unauthenticated; call directly without an API key.
- Empty MCP envelope under high concurrency — see `docs/mcp-empty-payload-anomaly.md`. Under 3+ concurrent long-running answer calls, ~20% return empty JSON-RPC envelopes. Workaround: serial calls. Not fixed at time of writing.
This project is inspired by and based on the work of Scott Spence and the original mcp-omnisearch.
MIT — see LICENSE.