
Add CLI local model support #228

Merged
lewtun merged 7 commits into main from codex/cli-local-models
May 6, 2026

Conversation

@lewtun
Member

@lewtun lewtun commented May 5, 2026

Summary

Adds CLI-only support for local OpenAI-compatible model servers through provider-specific model IDs:

  • ollama/<model>
  • vllm/<model>
  • lm_studio/<model>
  • llamacpp/<model>

This keeps the web/backend model catalog unchanged and intentionally does not add the generic openai-compat/ escape hatch yet.

Background / lineage

This PR takes the narrow CLI routing path from the local-model discussion and builds on ideas from prior PRs.

It addresses the CLI slice of #94 without adding frontend/API local model selection.

Usage examples

Start the local inference server first, then select it with a provider-specific prefix:

ml-intern --model ollama/llama3.1:8b "summarize this repo"
ml-intern --model vllm/meta-llama/Llama-3.1-8B-Instruct "inspect the training script"

In an interactive session:

/model ollama/llama3.1:8b
/model lm_studio/google/gemma-3-4b
/model llamacpp/llama-3.1-8b-instruct

Use one shared local endpoint for any local prefix:

LOCAL_LLM_BASE_URL=http://localhost:8000
LOCAL_LLM_API_KEY=optional-local-key
ml-intern --model vllm/custom-model "inspect the training script"

Or override a specific provider when running multiple local servers:

OLLAMA_BASE_URL=http://localhost:11434
VLLM_BASE_URL=http://localhost:8000
LMSTUDIO_BASE_URL=http://127.0.0.1:1234
LLAMACPP_BASE_URL=http://localhost:8080

Base URLs may include or omit /v1. Local API keys are optional; when unset, local backends get a non-empty placeholder key by default.

Local endpoint precedence:

base URL:  <PROVIDER>_BASE_URL -> LOCAL_LLM_BASE_URL -> provider default
api key:   <PROVIDER>_API_KEY  -> LOCAL_LLM_API_KEY  -> placeholder
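
A rough sketch of this resolution (hypothetical helper; the default ports are taken from the override examples above, and the trailing-/v1 handling is an assumption, not the actual agent/core/llm_params.py code):

import os

def resolve_local_endpoint(provider: str) -> tuple[str, str]:
    """Resolve the base URL and API key for a local provider, e.g. 'ollama'."""
    defaults = {
        "ollama": "http://localhost:11434",
        "vllm": "http://localhost:8000",
        "lm_studio": "http://127.0.0.1:1234",
        "llamacpp": "http://localhost:8080",
    }
    prefix = provider.upper().replace("_", "")  # "lm_studio" -> "LMSTUDIO"
    # Provider-specific env -> shared LOCAL_LLM_* env -> built-in default / placeholder.
    base_url = (
        os.environ.get(f"{prefix}_BASE_URL")
        or os.environ.get("LOCAL_LLM_BASE_URL")
        or defaults[provider]
    )
    api_key = (
        os.environ.get(f"{prefix}_API_KEY")
        or os.environ.get("LOCAL_LLM_API_KEY")
        or "local-placeholder-key"  # local backends only need a non-empty key
    )
    # Accept base URLs with or without a trailing /v1 (normalization assumed here).
    base_url = base_url.rstrip("/")
    if not base_url.endswith("/v1"):
        base_url += "/v1"
    return base_url, api_key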

OPENAI_BASE_URL is intentionally left for openai/... models only. For example:

OPENAI_BASE_URL=http://localhost:8000/v1 ml-intern --model openai/my-local-model

Implementation notes

  • Local model IDs route through LiteLLM as openai/<model> with the configured provider base URL (see the sketch after this list).
  • Local providers do not receive reasoning_effort; strict mode raises UnsupportedEffortError, while normal runtime drops it.
  • /model skips HF Router catalog lookup for local IDs and performs a cheap no-effort probe before switching.
  • CLI startup no longer requires an HF token when the selected model is local. HF-backed tools still need an HF token to act on the Hub.
  • Provider-specific local env vars take precedence over shared LOCAL_LLM_* vars, and OPENAI_BASE_URL is not reused for local prefixes.
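
Sketched concretely, the first note corresponds to something like the following (litellm.acompletion is real and accepts model, messages, api_base, and api_key; the wrapper and the resolve_local_endpoint helper from the usage section above are illustrative, not the repo's actual code):

from litellm import acompletion

async def call_local_model(model_id: str, messages: list[dict]):
    # "ollama/llama3.1:8b" -> provider "ollama", local name "llama3.1:8b"
    provider, local_name = model_id.split("/", 1)
    base_url, api_key = resolve_local_endpoint(provider)
    # Route through LiteLLM's generic OpenAI-compatible path; reasoning_effort is
    # intentionally omitted for local providers.
    return await acompletion(
        model=f"openai/{local_name}",
        messages=messages,
        api_base=base_url,
        api_key=api_key,
    )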

Validation

uv run pytest tests/unit/test_llm_params.py tests/unit/test_cli_local_models.py
uv run pytest tests/unit
uv run ruff check .
uv run ruff format --check .

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions

github-actions Bot commented May 5, 2026

Claude finished @lewtun's task in 4m 6s


PR Review: Add CLI local model support

  • Read CLAUDE.md and REVIEW.md for repo guidelines
  • Read changed files and trace routing logic
  • Analyze test coverage
  • Post review findings

0 P0, 2 P1
Verdict: ready to merge

What I checked

  • Full routing chain in _resolve_llm_params (precedence: anthropic → bedrock → openai → reserved → local → HF router)
  • is_valid_model_id edge cases: empty prefix, whitespace in model name, reserved openai-compat/ path
  • _probe_local_model exception handling and timeout behavior
  • _commit_switch call site for local models vs. non-local paths
  • HF token gating logic in both main() and headless_main() (ordering fix)
  • URL normalization in _local_api_base
  • Test coverage for success/failure paths

P1 — Docstring mismatch on transient-error behavior

model_switcher.py:205–214:

Transient errors (5xx, timeout) complete the switch with a yellow
warning; the next real call re-surfaces the error if it's persistent.

This is true for the non-local path (ProbeInconclusive → yellow warning). But for local models (lines 218–228), all exceptions — including asyncio.TimeoutError and transient 5xx — reject the switch and print [bold red]Switch failed[/bold red]. The two paths have opposite semantics on transient errors, and the docstring only documents one of them.



P1 — No test for probe failure path

test_cli_local_models.py tests only the success path (fake_acompletion returns object()). There is no test verifying that:

  • a probe exception (e.g., ConnectionRefusedError, asyncio.TimeoutError) leaves session.model_id and config.model_name unchanged
  • session.model_effective_effort is not updated on failure

"Keeps current model on hard error" is an explicitly advertised behavior in the PR description and is worth a second async test. Example sketch:

# Imports assumed from the repo's test setup; exact module paths may differ.
import pytest
from rich.console import Console

from agent.core import model_switcher
from agent.core.session import Session


@pytest.mark.asyncio
async def test_probe_and_switch_local_model_rejects_on_connection_error(monkeypatch):
    async def failing_acompletion(**kwargs):
        raise ConnectionRefusedError("no server")

    monkeypatch.setattr(model_switcher, "acompletion", failing_acompletion)

    class Config:
        model_name = "openai/gpt-5.5"
        reasoning_effort = None

    session = Session()
    await model_switcher.probe_and_switch_model(
        "ollama/llama3.1:8b", Config(), session, Console(), hf_token=None
    )

    assert session.model_id is None  # no switch committed
    assert "ollama/llama3.1:8b" not in session.model_effective_effort  # cache untouched



P2 — Redundant triple validation in _resolve_local_model_params

llm_params.py:125–128:

provider = local_model_provider(model_name)
local_name = local_model_name(model_name)
if provider is None or local_name is None or not is_local_model_id(model_name):
    raise ValueError(...)

By the time this function is called (from _resolve_llm_params:239 — after an explicit local_model_provider(model_name) is not None guard), provider can never be None here. is_local_model_id calls local_model_name internally, making local_name is None and not is_local_model_id(...) redundant with each other. The function is safe, just confusing for future readers. A single if local_name is None guard (covering the empty-prefix case like "ollama/") is sufficient.
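
The suggested simplification, sketched with the same names (this matches the shape of the fix that later landed in 925c29e):

provider = local_model_provider(model_name)
local_name = local_model_name(model_name)
if local_name is None:  # only fails on the empty-suffix case, e.g. "ollama/"
    raise ValueError(...)
assert provider is not None  # guaranteed by the dispatch guard in _resolve_llm_params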


No blocking issues. The routing logic is correct, the HF token gating order fix in both main() and headless_main() is clean, and the _DIRECT_PREFIXES expansion properly keeps local model IDs out of the HF router catalog lookup.

Resolved README usage docs conflict by keeping Claude 4.7 guidance from main and local model examples from this branch.

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions

github-actions Bot commented May 6, 2026

Claude encountered an error


I'll analyze this and get back to you.

@github-actions

github-actions Bot commented May 6, 2026

Claude finished @lewtun's task in 2m 11s


PR Re-Review: Add CLI local model support

0 P0
Verdict: ready to merge

This is a re-trigger after merges from main into the PR branch. The PR's own files (agent/core/llm_params.py, agent/core/local_models.py, agent/core/model_switcher.py, agent/main.py, the new tests) were not modified since the prior Claude review on bb56de6 — the diff between then and HEAD is only changes pulled in from main (backend/dependencies.py, backend/routes/auth.py, etc., from PRs #229 and #227 which are already merged on main).

What I checked

  • Confirmed the PR-local files are byte-identical to the prior-reviewed commit (git diff bb56de6..HEAD covers only merged-from-main paths).
  • Re-traced the full routing chain: _resolve_llm_params precedence (anthropic → bedrock → openai → reserved → local → HF router), _probe_local_model exception handling, _commit_switch for local vs. non-local, HF token gating in main() and headless_main().
  • Verified the heal-effort path (agent_loop._heal_effort_and_rebuild_params → effort_probe.probe_effort) is safe for local models under cascade exhaustion (every level raises UnsupportedEffortError synchronously; exhaustion returns effective_effort=None cleanly at effort_probe.py:279).
  • Spot-checked test coverage in test_cli_local_models.py and test_llm_params.py.

No new P0 issues. Per REVIEW.md re-review convergence rules, P1s previously flagged (transient-error docstring mismatch, missing probe-failure test) are not re-flagged here — those are deferrable at the author's discretion.

lewtun and others added 2 commits May 6, 2026 11:28
Support LOCAL_LLM_BASE_URL and LOCAL_LLM_API_KEY as shared fallbacks while preserving provider-specific local overrides.

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions

github-actions Bot commented May 6, 2026

Claude finished @lewtun's task in 1m 22s


PR Re-Review: Add CLI local model support

0 P0
Verdict: ready to merge

This re-review covers commit 8d21bf1 ("Add shared local model endpoint fallback"), which is the only PR-local change since the prior review on bb56de6. The other commits in the range are merges from main.

What I checked

  • Precedence chain in _resolve_local_model_params (agent/core/llm_params.py:132-141): provider-specific env → LOCAL_LLM_BASE_URL/LOCAL_LLM_API_KEY → built-in default. Order matches the README claim "Provider-specific variables take precedence over the shared local variables."
  • Empty-string handling: os.environ.get(...) or ... falls through on empty strings, so an unset or empty LOCAL_LLM_BASE_URL lands on the provider default — safe (illustrated after this list).
  • New constants LOCAL_MODEL_BASE_URL_ENV / LOCAL_MODEL_API_KEY_ENV in agent/core/local_models.py:28-29 are imported and used only in llm_params.py; no stale grep hits.
  • Test coverage:
    • test_resolve_lm_studio_params_uses_api_key_override (tests/unit/test_llm_params.py:59-69) now sets both LMSTUDIO_* and LOCAL_LLM_* and asserts the provider-specific values win for both api_base and api_key — exercises the precedence direction.
    • test_resolve_local_params_uses_shared_fallback_env (tests/unit/test_llm_params.py:72-82) deletes the provider-specific envs, sets only the shared fallback, and asserts both api_base and api_key come from LOCAL_LLM_* — exercises the fallback direction.
  • README diff (README.md:28-32, 86-92) matches behavior: documents shared fallback as the recommended single-endpoint setup, calls out provider override precedence, retains the *_BASE_URL/*_API_KEY form for per-provider overrides.
  • Re-traced the full chain that consumes these params (_resolve_llm_params → litellm.acompletion via the agent loop and _probe_local_model) — no caller relies on the prior "provider env or default only" semantics, so the added fallback is a purely additive change.
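
A standalone illustration of the empty-string point from the second bullet (plain Python, not repo code):

import os

os.environ["LOCAL_LLM_BASE_URL"] = ""  # set but empty
base = os.environ.get("LOCAL_LLM_BASE_URL") or "http://localhost:8000"
print(base)  # "http://localhost:8000" -- the empty string is falsy, so the default wins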

No new P0 issues. Per REVIEW.md re-review convergence rules, the prior P1s (transient-error docstring mismatch in model_switcher.py, missing probe-failure test) are not re-flagged here — those remain at the author's discretion.

Clarify local probe failure behavior, add regression coverage for rejected local switches, and simplify local model validation.

Co-authored-by: OpenAI Codex <codex@openai.com>
@lewtun
Member Author

lewtun commented May 6, 2026

Addressed the review feedback from #228 (comment) in 925c29e:

  • Clarified probe_and_switch_model() docs so transient yellow-warning behavior is documented as non-local only; local probe failures reject the switch.
  • Added async coverage for local probe failure preserving the current model and leaving model_effective_effort untouched.
  • Simplified _resolve_local_model_params() validation to remove the redundant triple check.

Validation:

uv run pytest tests/unit/test_llm_params.py tests/unit/test_cli_local_models.py
uv run pytest tests/unit
uv run ruff check .
uv run ruff format --check .

@github-actions

github-actions Bot commented May 6, 2026

Claude finished @lewtun's task in 1m 37s


PR Re-Review: Add CLI local model support

0 P0
Verdict: ready to merge

This re-review covers commit 925c29e ("Address local model review feedback"), which is the only PR-local change since the prior review on 8d21bf1. All three prior items are addressed cleanly.

What I checked

  • Docstring fix (agent/core/model_switcher.py:213-216): The transient-error paragraph now explicitly says non-local-only and adds the "Local models reject every probe error, including timeouts, and keep the current model" clarification. Matches the actual local branch at model_switcher.py:218-225 (any exception → red Switch failed → return without _commit_switch).
  • New probe-failure test (tests/unit/test_cli_local_models.py:86-121): Monkeypatches acompletion to raise ConnectionRefusedError, then asserts config.model_name == "openai/gpt-5.5" (unchanged), session.model_id is None (no commit), and "ollama/llama3.1:8b" not in session.model_effective_effort (cache untouched). Exactly the coverage I asked for in the prior P1.
  • Validation simplification (agent/core/llm_params.py:126-131): Reduced to a single if local_name is None guard plus assert provider is not None. Sound — the call site at llm_params.py:249 only dispatches into _resolve_local_model_params when local_model_provider(model_name) is not None, so passing the local_name is None check (which only fails on the empty-suffix case like "ollama/") implies a prefix matched, hence provider cannot be None. The removed is_local_model_id arm was indeed redundant for this code path.
  • Re-traced the local routing chain end-to-end: _resolve_llm_params:246-250 (reserved-prefix reject → local dispatch) → _resolve_local_model_params:116-146 (env precedence: provider-specific → LOCAL_LLM_* → default) → _local_api_base URL normalization → litellm.acompletion. No regression introduced by the simplification.
  • HF-token gating at agent/main.py:977 and agent/main.py:1209: both still gate on is_local_model_id(config.model_name) for the startup-token requirement; that path is untouched by this commit.

No new P0 issues. Per REVIEW.md re-review convergence rules, the prior P1s have been resolved in this commit; nothing else to flag.

@github-actions

github-actions Bot commented May 6, 2026

Claude finished @lewtun's task in 33s


PR Re-Review: Add CLI local model support

0 P0
Verdict: ready to merge

This re-review covers the merge commit e125b14, which brings main into the PR branch. The diff between 925c29e (last reviewed) and HEAD touches only files merged in from main (.github/workflows/claude-review.yml, .gitignore, scripts/prioritize_backlog.py, tests/unit/test_prioritize_backlog.py from PR #222) — all already merged on main and out of scope for this PR.

What I checked

  • git diff 925c29e..HEAD -- agent/core/llm_params.py agent/core/local_models.py agent/core/model_switcher.py agent/main.py tests/unit/test_cli_local_models.py tests/unit/test_llm_params.py README.md is empty: the PR-local files are byte-identical to the prior-reviewed commit.
  • Confirmed via git log --oneline 925c29e..HEAD that the only commits in the range are e125b14 (merge) and 6aebbdf (the merged PR #222, "Add backlog prioritization report tooling"), neither of which alters the local-model routing surface.

No new findings. Per REVIEW.md re-review convergence rules, the prior P1s were resolved in 925c29e and are not re-flagged here.

@lewtun lewtun merged commit 4668dbd into main May 6, 2026
3 checks passed
@lewtun lewtun deleted the codex/cli-local-models branch May 6, 2026 10:27