# pi-ollama

Native Ollama provider extension for the pi coding agent.
Talks directly to Ollama's `/api/chat` endpoint, bypassing the OpenAI-compat shim at `/v1/chat/completions` that silently drops `tool_calls` from streamed responses.
I was simply trying to use the pi agent locally with Ollama when this can of worms opened up. It seemed like a good learning opportunity and a great way to start getting more involved in the community. I'm new to contributing to open source and to build-in-public etiquette, so feedback is genuinely welcome. I hope you find this useful!
## Why this exists

Pi ships with an openai-completions adapter that routes Ollama traffic through Ollama's OpenAI-compat shim. The shim has a known streaming bug: `tool_calls` are dropped from the streamed deltas. Without those tool calls, pi's agent loop stalls on the first tool use — the model produces a tool call, the wire eats it, pi never sees it.
Ollama's native `/api/chat` endpoint doesn't have this problem. This extension talks to `/api/chat` directly — routing around the shim, not patching it — so tool calls survive streaming and the agent loop completes through tool use, multi-turn workflows, and reasoning-heavy prompts.
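To make that concrete, here is a minimal sketch of a native `/api/chat` streaming call with one tool attached. This is illustrative code, not the extension's implementation; the model name and the `get_weather` tool are made-up examples, and it assumes a local Ollama on the default port:

```ts
// Minimal demo of native /api/chat streaming (Node 18+, built-in fetch).
// Unlike the /v1 compat shim, tool calls arrive intact in message.tool_calls.
const res = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  body: JSON.stringify({
    model: "qwen2.5-coder:7b", // example; any pulled tool-capable model
    stream: true,
    messages: [{ role: "user", content: "What's the weather in Paris?" }],
    tools: [{
      type: "function",
      function: {
        name: "get_weather", // made-up example tool
        description: "Get current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    }],
  }),
});

// The body is NDJSON: one JSON chunk per line.
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buf = "";
for (;;) {
  const { value, done } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  const lines = buf.split("\n");
  buf = lines.pop() ?? ""; // keep any partial trailing line
  for (const line of lines) {
    if (!line.trim()) continue;
    const chunk = JSON.parse(line);
    if (chunk.message?.tool_calls) console.log("tool call:", chunk.message.tool_calls);
    if (chunk.done) console.log("done");
  }
}
```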
Other Ollama extensions for pi exist (linked in Related projects below) and they're solid for chat-style use. The architectural difference here is the API path: through the shim vs. around it. If you're using local Ollama specifically for the agentic tool-call workflows pi is designed around, that distinction is the whole point of this extension.
## Install

```bash
pi install npm:pi-ollama
```

Or for local development:
```bash
git clone https://github.com/CaptCanadaMan/pi-ollama
cd pi-ollama
npm install
pi install /absolute/path/to/pi-ollama
```

Requires Ollama running locally (default `http://localhost:11434`) and at least one tool-capable model pulled.
## Uninstall

```bash
pi uninstall npm:pi-ollama
```

This removes the on-disk package and the entry from `~/.pi/agent/settings.json`. Pi won't auto-restore it on the next launch.
The bare form `pi uninstall pi-ollama` doesn't work — pi parses bare names as relative local paths rather than npm packages, so the `npm:` prefix is required for any npm-installed extension.
If you've already manually deleted the package directory (find it with `npm root -g`), pi will silently reinstall it on the next launch because `npm:pi-ollama` is still in `~/.pi/agent/settings.json`. Run the uninstall command above to clear the settings entry — the disk side is already clean.
Optional cleanup of the model discovery cache:
```bash
rm -f ~/.pi/agent/cache/pi-ollama-models.json
```

## Quick start

After installation, launch pi and run:
```
/ollama-status
```
You should see something like:
```
Ollama base URL: http://localhost:11434
✓ Ollama reachable — 3 model(s) registered
  qwen2.5-coder:7b   ctx:131,072  [tools]
  gemma4:26b         ctx:262,144  [tools, vision, reasoning]
  llama3.1:8b        ctx:131,072  [tools]
```
Switch to one of the discovered models and use pi normally — tool calls work end-to-end.
## Commands

| Command | Description |
|---|---|
| `/ollama-status` | Show the Ollama base URL, registered models with capability flags, and currently loaded models. |
| `/ollama-refresh` | Re-discover models from `/api/tags` + `/api/show` and re-register the provider. Useful after `ollama pull <model>`. |
| `/ollama-info <model-id>` | Dump the full `/api/show` response for a model: capabilities, context length, parameters, etc. |
## Environment variables

| Variable | Default | Purpose |
|---|---|---|
| `OLLAMA_HOST` | `localhost:11434` | Ollama server host[:port]. May include or omit the protocol. |
| `OLLAMA_NATIVE_DEBUG` | unset | Set to `1` to enable per-chunk debug logging. Writes to a file (see below), not stderr, since stderr writes corrupt pi's TUI rendering. |
| `OLLAMA_NATIVE_DEBUG_LOG` | `~/.pi/agent/cache/pi-ollama-debug.log` | Override the default debug log path. |
| `OLLAMA_NATIVE_DUMP_DIR` | unset | If set, writes paired `req-*.json` / `res-*.ndjson` files per request: exact replay artifacts for diagnostics. |
| `OLLAMA_NATIVE_GHOST_RETRIES` | `2` | Max retries when Ollama returns ghost-token responses (see Reliability below). |
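Because `OLLAMA_HOST` may arrive with or without a protocol, it has to be normalized into a full base URL before use. The helper below is a hypothetical sketch of that normalization, not the extension's actual code:

```ts
// Hypothetical helper: turn OLLAMA_HOST into a usable base URL.
// Accepts "localhost:11434", "http://localhost:11434", "gpu-box", etc.
function resolveBaseUrl(raw = process.env.OLLAMA_HOST ?? "localhost:11434"): string {
  // Prepend a protocol if the value doesn't carry one.
  const withProto = /^https?:\/\//.test(raw) ? raw : `http://${raw}`;
  const url = new URL(withProto);
  if (!url.port) url.port = "11434"; // assume Ollama's default port when omitted
  return url.origin;
}

console.log(resolveBaseUrl());                       // http://localhost:11434
console.log(resolveBaseUrl("https://gpu-box:8080")); // https://gpu-box:8080
```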
Live-tail the debug log from another terminal:
```bash
tail -f ~/.pi/agent/cache/pi-ollama-debug.log
```

## Model discovery

On extension load, the provider (a code sketch follows this list):
- Reads cached models from `~/.pi/agent/cache/pi-ollama-models.json` (instant startup, no network).
- Calls `GET /api/tags` to list pulled models.
- For each model, calls `POST /api/show` to extract:
  - Context window from `model_info.*.context_length`.
  - Tool support from the `capabilities` array, falling back to family-name heuristics for older Ollama versions.
  - Vision support from `capabilities` or `details.families` containing `clip`.
  - Reasoning/thinking support from `capabilities` or model-name patterns (`r1`, `deepseek`, `gemma4`, etc.).
- Caches the result for next startup.
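Here is the promised sketch of that discovery flow. The `/api/tags` and `/api/show` shapes follow Ollama's documented API (`capabilities`, `model_info`, `details.families`); the simplified capability logic, the fallback context length, and the function name are mine, not the extension's:

```ts
// Simplified sketch of model discovery (no caching or error handling).
interface DiscoveredModel {
  id: string;
  contextLength: number;
  tools: boolean;
  vision: boolean;
}

async function discoverModels(base = "http://localhost:11434"): Promise<DiscoveredModel[]> {
  // List pulled models.
  const tags = await (await fetch(`${base}/api/tags`)).json();
  const out: DiscoveredModel[] = [];

  for (const m of tags.models ?? []) {
    // Fetch capabilities and architecture metadata per model.
    const show = await (await fetch(`${base}/api/show`, {
      method: "POST",
      body: JSON.stringify({ model: m.name }),
    })).json();

    const caps: string[] = show.capabilities ?? [];
    // Context length lives under an architecture-prefixed key,
    // e.g. "llama.context_length".
    const info = show.model_info ?? {};
    const ctxKey = Object.keys(info).find((k) => k.endsWith(".context_length"));

    out.push({
      id: m.name,
      contextLength: ctxKey ? Number(info[ctxKey]) : 4096, // fallback guess
      tools: caps.includes("tools"),
      vision: caps.includes("vision") ||
        (show.details?.families ?? []).includes("clip"),
    });
  }
  return out;
}
```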
If Ollama is unreachable at startup, the cached list is used as a fallback. Run `/ollama-refresh` once it's available to re-discover.
## Reliability

Ollama's streaming has a few known edge cases. The provider handles them explicitly rather than letting them surface as silent stalls:
**Ghost-token retry.** Ollama occasionally generates output tokens but streams nothing visible (`done:true`, `eval_count > 0`, empty message). The provider reads the first NDJSON line of each attempt, detects this pattern, cancels the connection, and retries, up to `OLLAMA_NATIVE_GHOST_RETRIES` times (default 2, ≈99% success at typical failure rates).
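A sketch of what that first-line check can look like. The chunk fields (`done`, `eval_count`, `message`) are Ollama's NDJSON fields; the detector function itself is illustrative, not the extension's code:

```ts
// Illustrative detector for the ghost-token pattern: the stream's first
// (and only) chunk claims tokens were generated but carries no output.
interface OllamaChunk {
  done: boolean;
  eval_count?: number;
  message?: { content?: string; thinking?: string; tool_calls?: unknown[] };
}

function isGhostResponse(first: OllamaChunk): boolean {
  return (
    first.done === true &&
    (first.eval_count ?? 0) > 0 &&
    !first.message?.content &&
    !first.message?.thinking &&
    !first.message?.tool_calls?.length
  );
}

// On a hit, the provider cancels the connection and reissues the request,
// up to OLLAMA_NATIVE_GHOST_RETRIES additional attempts.
```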
**Truncation detection.** If the connection closes before any chunk with `done:true` arrives, the provider surfaces a clear error rather than silently treating the partial response as complete. The error explains that this is an Ollama-side reliability issue and prompts a retry.
**Empty-response detection.** If the connection closes without sending any chunks at all, the provider raises a distinct error pointing at the most likely causes (model failed to load, Ollama crashed, network issue).
**Post-stream ghost check.** Belt and suspenders: if `eval_count > 0` but no content, thinking, or tool calls landed in the parsed stream, the provider raises an error rather than reporting a successful empty turn.
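Taken together, the end-of-stream checks amount to something like the following validation over state accumulated during parsing. The names and error wording here are illustrative, not the extension's exact messages:

```ts
// Illustrative end-of-stream validation; the flags are accumulated while
// parsing NDJSON chunks.
interface StreamStats {
  sawAnyChunk: boolean;   // at least one chunk was parsed
  sawDone: boolean;       // a chunk with done:true arrived
  evalCount: number;      // eval_count reported by the final chunk, if any
  emittedOutput: boolean; // content, thinking, or tool calls were seen
}

function validateStream(s: StreamStats): void {
  if (!s.sawAnyChunk) {
    // Empty-response detection: connection closed with no data at all.
    throw new Error(
      "Ollama sent no response chunks (model failed to load, Ollama crashed, or network issue)."
    );
  }
  if (!s.sawDone) {
    // Truncation detection: stream ended before done:true.
    throw new Error(
      "Ollama closed the stream before completion; this is an Ollama-side reliability issue, please retry."
    );
  }
  if (s.evalCount > 0 && !s.emittedOutput) {
    // Post-stream ghost check: tokens were counted but nothing landed.
    throw new Error("Ollama reported generated tokens but streamed no visible output.");
  }
}
```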
## Compatibility

- **pi:** Tested against `@mariozechner/pi-coding-agent` v0.71.x. Should work with any version exposing the standard `ExtensionAPI` (`registerProvider` with `streamSimple`, `registerCommand` with `ctx.ui.notify`).
- **Ollama:** Requires Ollama with `/api/chat` support (most versions). `/api/ps` is used opportunistically and tolerates older versions that don't expose it.
- **Node:** Requires Node 18+ for built-in `fetch` and Web Streams.
## How it works

The extension registers an `ollama` provider with a custom `streamSimple` handler. Pi calls `streamSimple(model, context, options)` for every turn; the handler converts pi's internal message format to Ollama's `/api/chat` wire format, opens an NDJSON stream, parses chunks into pi's `AssistantMessageEventStream` events (text deltas, thinking deltas, tool-call bursts, done), and surfaces errors with explanatory messages. No core pi changes are required — `streamSimple` fully replaces the built-in handler for the registered API string.
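In outline, the handler looks something like the sketch below. The pi-facing event names and types are paraphrased from this README's description rather than copied from pi's type definitions, so treat them as assumptions:

```ts
// Paraphrased outline of the stream handler. Event shapes are assumptions
// based on the description above, not pi's real types.
type ProviderEvent =
  | { type: "text_delta"; delta: string }
  | { type: "thinking_delta"; delta: string }
  | { type: "tool_call"; call: unknown }
  | { type: "done" };

// Split a byte stream into parsed NDJSON objects.
async function* ndjson(body: ReadableStream<Uint8Array>) {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buf = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buf += decoder.decode(value, { stream: true });
    const lines = buf.split("\n");
    buf = lines.pop() ?? "";
    for (const line of lines) if (line.trim()) yield JSON.parse(line);
  }
}

// Core of the handler: one /api/chat request in, pi-style events out.
async function* streamChat(baseUrl: string, payload: object): AsyncGenerator<ProviderEvent> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    body: JSON.stringify({ ...payload, stream: true }),
  });
  for await (const chunk of ndjson(res.body!)) {
    if (chunk.message?.content) yield { type: "text_delta", delta: chunk.message.content };
    if (chunk.message?.thinking) yield { type: "thinking_delta", delta: chunk.message.thinking };
    for (const call of chunk.message?.tool_calls ?? []) yield { type: "tool_call", call };
    if (chunk.done) yield { type: "done" };
  }
}
```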
See `src/` for the implementation. Each file has a header comment explaining its role.
## What this extension doesn't do

- **Ollama Cloud** (https://ollama.com). This extension targets local Ollama. Cloud requires different auth (`OLLAMA_API_KEY`) and a different base URL — see `fgrehm/pi-ollama-cloud` if you want cloud-only.
- **Per-model `temperature`/`top_p` defaults.** Sampling parameters are passed through from pi's options when set, but there's no extension-level config for default values per model. Open an issue if you need this.
- **Auto-pull.** If you select a model that isn't pulled, you'll get an error from Ollama. The extension doesn't offer to `ollama pull` it for you.
## Related projects

- pi-mono — the pi coding agent itself
- ollama#12557 — the upstream tool-calling streaming bug this extension routes around
- pi-mono#3357 — the open issue requesting an official local-LLM extension
- `@0xkobold/pi-ollama` — alternative extension covering local + cloud via the OpenAI-compat shim
- `fgrehm/pi-ollama-cloud` — cloud-only Ollama extension
## License

MIT © 2026 CaptCanadaMan