
Add LM Studio provider (native /api/v1) #2

Open
jasonwarta wants to merge 3 commits into Pickle-Pixel:main from jasonwarta:feat/lmstudio-provider

Conversation

@jasonwarta

Summary

  • New LMStudioProvider targeting LM Studio's native REST API (/api/v1/models, /api/v1/chat) rather than the OpenAI-compat surface — so we get back tokens/sec, time-to-first-token, and model load time for diagnosing local-inference perf. Logged at debug level; doesn't leak LM Studio specifics into the provider-agnostic QueryResponse.
  • Local providers (Ollama + LM Studio) now register unconditionally instead of being gated on a one-shot boot-time health check. A local server that was down at startup used to stay dropped for the entire MCP process lifetime — now list_models/ask_model reach out live on each call (backed by the existing 30s cache), so a LAN/local server coming online mid-session just works.
  • 3s AbortController timeout on LM Studio healthCheck/listModels so a dead server can't stall list_models calls (see the sketch after this list).
  • Defaults to http://localhost:1234, override with LMSTUDIO_URL for LAN/remote instances.
  • Filters non-chat models (e.g. embeddings) via the type field from /api/v1/models.
  • Route explicitly with lmstudio/<model-key>; JIT-loading is delegated to LM Studio so not-loaded models load on first query.
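
Roughly the shape of the timeout and model filtering described above, as a sketch under assumptions: the helper names and the exact /api/v1/models response shape are illustrative, not the PR's actual code.

```typescript
// Sketch only: helper names and the /api/v1/models response shape are assumed.
const BASE_URL = process.env.LMSTUDIO_URL ?? "http://localhost:1234";

// 3s AbortController timeout so a dead server can't stall list_models.
async function fetchWithTimeout(path: string, timeoutMs = 3000): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(`${BASE_URL}${path}`, { signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}

// Keep chat-capable models and drop embeddings via the `type` field that
// /api/v1/models reports (field names and values here are illustrative).
interface LMStudioModel {
  key: string;
  type: string; // e.g. "llm" | "embeddings"
}

async function listChatModels(): Promise<LMStudioModel[]> {
  const res = await fetchWithTimeout("/api/v1/models");
  if (!res.ok) return [];
  const body = (await res.json()) as { data?: LMStudioModel[] };
  return (body.data ?? []).filter((m) => m.type !== "embeddings");
}
```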

Why the local-provider fix is in this PR

Hit the startup-only registration bug while testing (LAN LM Studio box was asleep at first boot; `/mcp` reconnect didn't respawn the process, had to `claude mcp remove && add`). Ollama has the identical pattern. Fixing only one would leave two different registration behaviors for conceptually identical providers, so both are fixed together. Cloud providers already register based on env-var presence alone, so this brings local in line with cloud.

Known limitations

  • `/api/v1/chat`'s message-array input shape wasn't reverse-engineerable from error messages on the probed instance (string `input` works; the array form expects a content-part discriminator the schema doesn't advertise). `system_prompt` is prepended to the single-string input as a framed prefix (sketched after this list) — works correctly for single-turn prompts, which is all HydraMCP tools currently do.
  • Context size is discovered from `loaded_instances[0].config.context_length` / `max_context_length`; we don't force a specific context length on JIT load. Bump it in the LM Studio UI if you need a larger context.
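
A minimal sketch of the single-string workaround from the first limitation above; the framing markers and function name are assumed, not the PR's literal implementation.

```typescript
// Sketch only: the framing markers and function name are illustrative.
function buildInput(prompt: string, systemPrompt?: string): string {
  // /api/v1/chat accepts a plain string `input`; the message-array form
  // expects a content-part discriminator the probed instance didn't
  // advertise, so the system prompt is prepended as a framed prefix.
  if (!systemPrompt) return prompt;
  return `[system]\n${systemPrompt}\n[/system]\n\n${prompt}`;
}
```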

Test plan

  • `healthCheck()` against live LM Studio on the LAN returns true
  • `listModels()` returns 2 LLMs, filters out 1 embedding model via `type` field
  • `query()` against `mistralai/codestral-22b-v0.1` — PONG response, usage/latency reported
  • `query()` against `meta-llama-3.1-8b-instruct` — JIT-loads and responds, `model_load_time_seconds` captured
  • `system_prompt` + `temperature` + `max_tokens` all honored by the native endpoint
  • `npx tsc --noEmit` clean, `npm run build` clean
  • End-to-end through MCP client: `list_models` shows lmstudio section with both models
  • End-to-end through MCP client: `ask_model lmstudio/meta-llama-3.1-8b-instruct` → JIT-loads and responds in ~3.9s
  • End-to-end through MCP client: `ask_model lmstudio/mistralai/codestral-22b-v0.1` → evicts llama, loads codestral, responds in ~7.5s (one-model-at-a-time LM Studio setup)

🤖 Generated with Claude Code

jasonwarta and others added 2 commits April 13, 2026 16:02
New provider talks to LM Studio's native REST endpoints
(/api/v1/models and /api/v1/chat) rather than the OpenAI-compat
surface, so we surface local-inference detail (tokens/sec, ttft,
model load time) at debug level for diagnosing perf on your own
hardware.

Auto-registers when LM Studio is reachable. Defaults to
http://localhost:1234, override with LMSTUDIO_URL for LAN use.
Filters embedding models via the `type` field from /api/v1/models.
JIT-loading is delegated to LM Studio — not-loaded models load on
first query. Route explicitly with the `lmstudio/` prefix.

Known limitation: /api/v1/chat's message-array input shape isn't
documented on the probed instance, so system_prompt is prepended
to the single-string input form. Single-turn prompts (all HydraMCP
tools today) work correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously Ollama and LM Studio were only registered if a one-shot
boot-time health check succeeded. If the local server happened to be
down or unreachable at startup, the provider was silently dropped for
the entire lifetime of the MCP process — `/mcp` reconnect would not
fix it; only a full Claude Code restart or `claude mcp remove && add`
would. This surprised at least one user who had to remove+re-add to
recover after waking a LAN LM Studio machine.

Local servers restart independently of the MCP process, so gating
registration on a boot-time check is the wrong shape. This aligns them
with cloud providers, which register based on env-var presence alone.
Now listModels and query reach out live on each tool call (backed by
the existing 30s model-list cache) and Promise.allSettled in
MultiProvider means unreachable providers just contribute no models.
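
A sketch of that aggregation pattern; the type and method names are illustrative, not HydraMCP's actual identifiers.

```typescript
// Sketch only: Provider/ModelInfo shapes are assumed for illustration.
interface ModelInfo { provider: string; key: string }
interface Provider { listModels(): Promise<ModelInfo[]> }

async function listAllModels(providers: Provider[]): Promise<ModelInfo[]> {
  // Each provider does its own live check (and 3s timeout) inside listModels.
  const results = await Promise.allSettled(providers.map((p) => p.listModels()));
  // An unreachable provider rejects and simply contributes no models;
  // everything that resolved is flattened into one list.
  return results.flatMap((r) => (r.status === "fulfilled" ? r.value : []));
}
```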

To keep `list_models` snappy when a provider is down, add a 3s
AbortController timeout to LM Studio's healthCheck and listModels.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The existing snippet only showed the bare shell env var, but the
canonical way to pass config to a Claude Code MCP server is via
`claude mcp add -e` (matches the Quick Start example at the top of
the README). Document both so readers see the integration form first
and the raw-env form as an alternative.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>