Add LM Studio provider (native /api/v1)#2
Open
jasonwarta wants to merge 3 commits into Pickle-Pixel:main
Conversation
New provider talks to LM Studio's native REST endpoints (`/api/v1/models` and `/api/v1/chat`) rather than the OpenAI-compat surface, so we surface local-inference detail (tokens/sec, time-to-first-token, model load time) at debug level for diagnosing perf on your own hardware.

Auto-registers when LM Studio is reachable. Defaults to `http://localhost:1234`; override with `LMSTUDIO_URL` for LAN use. Filters embedding models via the `type` field from `/api/v1/models`. JIT-loading is delegated to LM Studio: not-loaded models load on first query. Route explicitly with the `lmstudio/` prefix.

Known limitation: `/api/v1/chat`'s message-array input shape isn't documented on the probed instance, so `system_prompt` is prepended to the single-string input form. Single-turn prompts (all HydraMCP tools today) work correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
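The `type`-based filtering described above could be sketched as follows. This is a hedged illustration: the field names (`key`, `type`) and the `"embedding"` value are assumptions inferred from this commit message, not confirmed LM Studio response shapes.

```typescript
// Hypothetical shape of one entry from LM Studio's /api/v1/models;
// field names are assumptions, not a documented schema.
interface LMStudioModel {
  key: string;
  type: string; // e.g. "llm" or "embedding"
}

// Keep only chat-capable models; embedding models are dropped so
// they never show up in list_models or get routed a chat query.
function filterChatModels(models: LMStudioModel[]): LMStudioModel[] {
  return models.filter((m) => m.type !== "embedding");
}
```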
Previously Ollama and LM Studio were only registered if a one-shot boot-time health check succeeded. If the local server happened to be down or unreachable at startup, the provider was silently dropped for the entire lifetime of the MCP process: `/mcp` reconnect would not fix it; only a full Claude Code restart or `claude mcp remove && add` would. This surprised at least one user, who had to remove and re-add to recover after waking a LAN LM Studio machine.

Local servers restart independently of the MCP process, so gating registration on a boot-time check is the wrong shape. This aligns them with cloud providers, which register based on env-var presence alone. Now `listModels` and `query` reach out live on each tool call (backed by the existing 30s model-list cache), and `Promise.allSettled` in `MultiProvider` means unreachable providers just contribute no models.

To keep `list_models` snappy when a provider is down, add a 3s `AbortController` timeout to LM Studio's `healthCheck` and `listModels`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
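A minimal sketch of the aggregation shape this commit describes: per-provider timeouts (3s by default, matching the commit) plus `Promise.allSettled`, so an unreachable or hung provider simply contributes no models instead of failing or stalling the whole call. The `Provider` interface and function names here are illustrative, not the repo's actual types.

```typescript
interface Provider {
  name: string;
  listModels(): Promise<string[]>;
}

// Bound a single provider call; a dead server can't stall list_models.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise((resolve, reject) => {
    const t = setTimeout(() => reject(new Error("timeout")), ms);
    p.then(
      (v) => { clearTimeout(t); resolve(v); },
      (e) => { clearTimeout(t); reject(e); }
    );
  });
}

// Fulfilled providers contribute prefix-qualified model ids
// (e.g. "lmstudio/<model-key>"); rejected or timed-out ones are skipped.
async function listAllModels(
  providers: Provider[],
  timeoutMs = 3000
): Promise<string[]> {
  const results = await Promise.allSettled(
    providers.map((p) => withTimeout(p.listModels(), timeoutMs))
  );
  return results.flatMap((r, i) =>
    r.status === "fulfilled"
      ? r.value.map((m) => `${providers[i].name}/${m}`)
      : []
  );
}
```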
The existing snippet only showed the bare shell env var, but the canonical way to pass config to a Claude Code MCP server is via `claude mcp add -e` (matching the Quick Start example at the top of the README). Document both so readers see the integration form first and the raw-env form as an alternative.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
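The two forms the commit describes might look like this. Hedged sketch: the server name `hydra`, the example LAN address, and the launch command after `--` are illustrative placeholders, not taken from the repo.

```shell
# Integration form: Claude Code stores and injects the env var.
# "hydra" and the command after "--" are placeholders.
claude mcp add hydra -e LMSTUDIO_URL=http://192.168.1.50:1234 -- node dist/index.js

# Raw-env alternative: export in the shell that launches the server.
export LMSTUDIO_URL=http://192.168.1.50:1234
```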
Summary
- `LMStudioProvider` targeting LM Studio's native REST API (`/api/v1/models`, `/api/v1/chat`) rather than the OpenAI-compat surface — so we get back tokens/sec, time-to-first-token, and model load time for diagnosing local-inference perf. Logged at debug level; doesn't leak LM Studio specifics into the provider-agnostic `QueryResponse`.
- `list_models`/`ask_model` reach out live on each call (backed by the existing 30s cache), so a LAN/local server coming online mid-session just works.
- `AbortController` timeout on LM Studio `healthCheck`/`listModels` so a dead server can't stall `list_models` calls.
- Defaults to `http://localhost:1234`; override with `LMSTUDIO_URL` for LAN/remote instances.
- Filters embedding models via the `type` field from `/api/v1/models`.
- Models are addressed as `lmstudio/<model-key>`; JIT-loading is delegated to LM Studio, so not-loaded models load on first query.

Why the local-provider fix is in this PR
Hit the startup-only registration bug while testing (LAN LM Studio box was asleep at first boot; `/mcp` reconnect didn't respawn the process, had to `claude mcp remove && add`). Ollama has the identical pattern. Fixing only one would leave two different registration behaviors for conceptually identical providers, so both are fixed together. Cloud providers already register based on env-var presence alone, so this brings local in line with cloud.

Known limitations
Test plan
🤖 Generated with Claude Code