56 changes: 56 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,39 @@ All notable changes to this project are documented here.

### Added

**Multi-provider LLM support via req_llm**

- `RLM.LLM.ReqLLM` — new default LLM backend that delegates to `req_llm` v1.6,
supporting Anthropic, OpenAI, Ollama (local models), Google Gemini, Groq, and any
other provider that `req_llm` supports. Model specs use the `"provider:model-name"`
convention (e.g., `"anthropic:claude-sonnet-4-6"`, `"ollama:qwen3.5:35b"`). Bare
model names without a provider prefix are treated as Anthropic for backward compatibility.
- `RLM.LLM.Anthropic` — the previous hand-rolled Anthropic Messages API client,
preserved as a fallback for users who need direct Anthropic-specific control.
Select via `llm_module: RLM.LLM.Anthropic`.
- `RLM.LLM` refactored to a pure behaviour module + shared utilities
(`extract_structured/1`, `response_schema/0`); no longer contains an implementation.
- `models` config field — `%{atom() => String.t()}` map of symbolic keys to
model specs. Default: `%{large: "claude-sonnet-4-6", small: "claude-haiku-4-5"}`.
Bare names are auto-prefixed with `"anthropic:"` by `ReqLLM`. Pass custom maps
for Ollama/OpenAI: `models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}`
- `RLM.Config.resolve_model/2` — looks up a model key in the `models` map
- `RLM.Config.context_window_for/2` — resolves context window size for a model key
(legacy fields for `:large`/`:small`, default 128k for custom keys)
- `model_key` option on Workers — replaces inline `config.model_large`/`config.model_small`
lookups with named model map resolution

### Changed

- Default `llm_module` changed from `RLM.LLM` (which was the implementation) to
`RLM.LLM.ReqLLM` (the new multi-provider adapter)
- API key resolution now checks `ANTHROPIC_API_KEY` first, falls back to `CLAUDE_API_KEY`
- `RLM.Worker` uses `model_key` (`:large`, `:small`, or custom atom) to resolve model
specs via `Config.resolve_model/2` instead of reading `config.model_large`/`model_small`
- `RLM.run/3`, `RLM.run_async/3`, `RLM.start_session/1`, `RLM.Replay.replay/2` pass
`model_key:` instead of `model:` in worker opts
- `req_llm` (`~> 1.6`) added as a dependency

**Deterministic replay**

- `RLM.Replay` — replay orchestrator that re-executes a previously recorded run using
@@ -33,9 +66,32 @@ All notable changes to this project are documented here.
then falls back to a live LLM module when the tape is exhausted
- `:fallback` option on `RLM.replay/2` — `:error` (default) or `:live` to switch
to live LLM calls when the tape runs out (e.g., because a patch caused extra iterations)
- `examples/local_models.exs` — new example demonstrating Ollama/local model usage
with no API key required. Registered as `mix rlm.examples local_models`
- `test/rlm/config_test.exs` — 16 new unit tests for `Config.load/1`,
`Config.resolve_model/2`, and `Config.context_window_for/2`
- 17 tests covering recording, tape construction, replay LLM, replay orchestration,
patching, fallback behavior, and the public API

### Fixed

- `RLM.LLM.ReqLLM.encode_object/1` now returns an explicit error instead of silently
falling back to an empty string when the LLM response contains no usable content
- `RLM.LLM.ReqLLM.extract_usage/1` logs a warning when token usage extraction fails
(all fields nil despite non-empty response), preventing silent zero-cost reporting
- `RLM.Replay.Tape.get_events/1` now catches `:noproc` exits specifically and logs
a warning for unexpected exit reasons, instead of broadly swallowing all exits
- `RLM.Replay.FallbackLLM` now logs when switching from tape replay to live LLM calls
- `RLM.Config.context_window_for/2` now logs a warning when using the 128k default
for custom model keys, making it easier to diagnose compaction behavior
- `RLM.Replay` moduledoc corrected: fallback default is `RLM.LLM.ReqLLM` (not `RLM.LLM`)
- `RLM.Worker` moduledoc updated to be provider-agnostic (no longer references "Claude's
output_config" specifically)
- `CLAUDE.md` — removed stale `cost_per_1k_*` config fields; fixed `models` default to
match actual bare-name defaults; updated env var references to `ANTHROPIC_API_KEY`
- All examples updated from `CLAUDE_API_KEY` to `ANTHROPIC_API_KEY`; smoke test checks
both env vars

**Distributed Erlang node support**

- `RLM.Node` — lightweight wrapper for OTP distribution with three public functions:
139 changes: 116 additions & 23 deletions CLAUDE.md
@@ -13,7 +13,10 @@ rlm/
│ │ ├── run.ex # Per-run coordinator GenServer
│ │ ├── worker.ex # RLM GenServer (iterate loop + keep_alive)
│ │ ├── eval.ex # Sandboxed Code.eval_string
│ │ ├── llm.ex # Anthropic Messages API client
│ │ ├── llm.ex # LLM behaviour + shared utilities
│ │ ├── llm/
│ │ │ ├── req_llm.ex # Multi-provider backend via req_llm (default)
│ │ │ └── anthropic.ex # Direct Anthropic API client (legacy fallback)
│ │ ├── helpers.ex # chunks/2, grep/2, preview/2, list_bindings/0
│ │ ├── sandbox.ex # Eval sandbox: helpers + LLM calls + tool wrappers
│ │ ├── prompt.ex # System prompt + message formatting
@@ -83,10 +86,10 @@ mix test
# Run tests with trace output
mix test --trace

# Run live API tests (requires CLAUDE_API_KEY env var)
# Run live API tests (requires ANTHROPIC_API_KEY or CLAUDE_API_KEY env var)
mix test --include live_api

# Live smoke test (requires CLAUDE_API_KEY env var)
# Live smoke test (requires ANTHROPIC_API_KEY or CLAUDE_API_KEY env var)
mix rlm.smoke

# Interactive shell
@@ -162,15 +165,27 @@ retrieve the execution trace via `RLM.EventLog`. On failure it returns `{:error,
A `Process.monitor` on the Worker ensures crashes surface as errors rather than hangs.

### LLM Client
Uses the Anthropic Messages API (not OpenAI format). System messages are
extracted and sent as the top-level `system` field. Requires `CLAUDE_API_KEY` env var.
The default backend is `RLM.LLM.ReqLLM`, which delegates to the `req_llm` package
and supports any provider: Anthropic, OpenAI, Ollama (local), Gemini, Groq, etc.
Model specs use the `"provider:model-name"` convention (e.g., `"anthropic:claude-sonnet-4-6"`,
`"ollama:qwen3.5:35b"`). Bare names without a prefix are treated as Anthropic for
backward compatibility. Requires `ANTHROPIC_API_KEY` (or `CLAUDE_API_KEY` as fallback).

LLM responses use structured output (`output_config` with `json_schema`) to constrain
responses to `{"reasoning": "...", "code": "..."}` JSON objects. This eliminates regex-based
code extraction and provides clean separation of reasoning from executable code. Feedback
messages after eval are also structured JSON.
The legacy hand-rolled Anthropic client is preserved as `RLM.LLM.Anthropic` and can
be selected via `llm_module: RLM.LLM.Anthropic`.

Default models:
LLM responses use structured output (JSON schema) to constrain responses to
`{"reasoning": "...", "code": "..."}` objects. Feedback messages after eval are also
structured JSON.

The `models` config field maps symbolic keys to model specs:

```elixir
RLM.run(context, query,
models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"})
```

Default models (bare names; `ReqLLM` auto-prefixes with `"anthropic:"`):
- Large: `claude-sonnet-4-6`
- Small: `claude-haiku-4-5`

@@ -186,7 +201,9 @@ Default models:
| `RLM.Worker` | GenServer per execution node; iterate loop + keep_alive mode; delegates spawning to Run |
| `RLM.Eval` | Sandboxed `Code.eval_string` with async IO capture + cwd injection |
| `RLM.Sandbox` | Functions injected into eval'd code (helpers + LLM calls + tool wrappers) |
| `RLM.LLM` | Anthropic Messages API client with structured output (`extract_structured/1`) |
| `RLM.LLM` | LLM behaviour + shared utilities (`extract_structured/1`, `response_schema/0`) |
| `RLM.LLM.ReqLLM` | Multi-provider LLM backend via `req_llm` (default) |
| `RLM.LLM.Anthropic` | Direct Anthropic Messages API client (legacy fallback) |
| `RLM.Prompt` | System prompt loading + structured JSON feedback message formatting |
| `RLM.Helpers` | `chunks/2`, `grep/2`, `preview/2`, `list_bindings/0` |
| `RLM.Truncate` | Head+tail string truncation for stdout overflow |
@@ -246,9 +263,10 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`.
| Field | Default | Notes |
|---|---|---|
| `api_base_url` | `"https://api.anthropic.com"` | Anthropic API base URL |
| `api_key` | `CLAUDE_API_KEY` env var | API key for LLM requests |
| `model_large` | `claude-sonnet-4-6` | Used for parent workers |
| `model_small` | `claude-haiku-4-5` | Used for subcalls |
| `api_key` | `ANTHROPIC_API_KEY` env var | API key for LLM requests (falls back to `CLAUDE_API_KEY`) |
| `models` | `%{large: "claude-sonnet-4-6", small: "claude-haiku-4-5"}` | Named model map; keys are atoms, values are model specs. Bare names are auto-prefixed with `"anthropic:"` by `ReqLLM` |
| `model_large` | `claude-sonnet-4-6` | Legacy; used to build default `models` map |
| `model_small` | `claude-haiku-4-5` | Legacy; used to build default `models` map |
| `max_iterations` | `25` | Per-worker LLM turn limit |
| `max_depth` | `5` | Recursive subcall depth limit |
| `max_concurrent_subcalls` | `10` | Parallel subcall limit per worker |
@@ -259,23 +277,19 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`.
| `eval_timeout` | `300_000` | ms per eval (5 min) |
| `llm_timeout` | `120_000` | ms per LLM request (2 min) |
| `subcall_timeout` | `600_000` | ms per subcall (10 min) |
| `cost_per_1k_prompt_tokens_large` | `0.003` | Cost tracking for large model input |
| `cost_per_1k_prompt_tokens_small` | `0.0008` | Cost tracking for small model input |
| `cost_per_1k_completion_tokens_large` | `0.015` | Cost tracking for large model output |
| `cost_per_1k_completion_tokens_small` | `0.004` | Cost tracking for small model output |
| `enable_otel` | `false` | Enable OpenTelemetry integration |
| `enable_event_log` | `true` | Enable per-run EventLog trace agents |
| `event_log_capture_full_stdout` | `false` | Store full stdout in traces (vs truncated) |
| `enable_replay_recording` | `false` | Record full LLM responses for deterministic replay |
| `llm_module` | `RLM.LLM` | Swappable for `RLM.Test.MockLLM` |
| `llm_module` | `RLM.LLM.ReqLLM` | Default LLM backend; swap to `RLM.LLM.Anthropic` or `RLM.Test.MockLLM` |

## Testing Conventions

- Tests use `RLM.Test.MockLLM` (global ETS-based response queue) for deterministic testing
- Worker/keep_alive tests run `async: false` since MockLLM uses global state
- Tool tests and sandbox tests can run `async: true` (no global state)
- Live API tests tagged with `@moduletag :live_api` and excluded by default
- `mix test --include live_api` requires `CLAUDE_API_KEY` env var
- `mix test --include live_api` requires `ANTHROPIC_API_KEY` (or `CLAUDE_API_KEY`) env var
- Test support files in `test/support/`
- Tool tests use a per-test temp directory (created in `setup`, cleaned in `on_exit`)
- Worker concurrency/depth tests use `RLM.Test.Helpers.start_test_run/1` to create a Run, then spawn Workers via `RLM.Run.start_worker/2`
@@ -286,7 +300,7 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`.
- Workers use `restart: :temporary` — they terminate normally after completion
- The `llm_module` config field enables dependency injection for testing
- Bash tool uses `Task.async` + `Task.yield/2` (not `System.cmd` — it has no `:timeout` option)
- `.env` file with `CLAUDE_API_KEY` should exist at project root but must not be committed
- `.env` file with `ANTHROPIC_API_KEY` (or `CLAUDE_API_KEY`) should exist at project root but must not be committed
- `RLM.run/3` monitors the Worker with `Process.monitor` so crashes return `{:error, reason}`
rather than hanging indefinitely
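
For example, a caller can rely on that monitor to pattern-match both outcomes (sketch; bindings are illustrative):

```elixir
case RLM.run(context, query, max_iterations: 10) do
  {:ok, result, run_id} ->
    # Success: the answer plus the run id for EventLog trace lookup
    {result, run_id}

  {:error, reason} ->
    # Worker crash or failure, surfaced via the Process.monitor
    {:failed, reason}
end
```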

Expand All @@ -309,20 +323,99 @@ The dashboard is a Phoenix 1.8 LiveView application. Key conventions:

## Orientation for Coding Agents

### Getting Started

When starting a task, read these files in order:

1. **`CLAUDE.md`** (this file) — architecture, invariants, module map
2. **`config/config.exs`** — runtime defaults
3. The specific module(s) relevant to your task (see Module Map above)
4. The corresponding test file to understand expected behaviour

Key invariants **never to break**:
### Key Invariants (Never Break These)

- Raw input data must not enter any LLM context window (use `preview/2` or metadata only)
- Workers are `:temporary` — do not change their restart strategy
- The async-eval pattern in `RLM.Worker` is intentional; do not make eval synchronous
- All session tests must use `async: false` (MockLLM is global ETS state)
- Run → Worker communication is always `send/2`, never `GenServer.call` (deadlock prevention)

### Key Contracts & Interfaces

**LLM Behaviour** (`RLM.LLM`):
```elixir
@callback chat(messages :: [map()], model :: String.t(), config :: RLM.Config.t(), opts :: keyword()) ::
{:ok, json_string :: String.t(), usage :: usage()} | {:error, String.t()}
```
All LLM modules (`ReqLLM`, `Anthropic`, `MockLLM`, `Replay.LLM`, `Replay.FallbackLLM`) implement
this same callback. The `json_string` return is always a JSON-encoded string, never a parsed map.

**Usage type**: `%{prompt_tokens: integer | nil, completion_tokens: integer | nil, total_tokens: integer | nil, cache_creation_input_tokens: integer | nil, cache_read_input_tokens: integer | nil}`

**Model resolution**: Use `RLM.Config.resolve_model(config, :large | :small | atom())` → `{:ok, "provider:model-name"}` or `{:error, reason}`. In Worker, use `resolve_model!/2` (raises on unknown keys).
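
A sketch of the resolution flow (values are illustrative, and `Config.load/1` is assumed to accept keyword overrides as in the config tests):

```elixir
# Illustrative only: model specs and keys here are examples.
config = RLM.Config.load(models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"})

{:ok, spec} = RLM.Config.resolve_model(config, :large)
# spec is the model spec for :large, e.g. "ollama:qwen3.5:35b"

# Unknown keys return an error tuple; the Worker's resolve_model!/2 raises instead.
{:error, _reason} = RLM.Config.resolve_model(config, :no_such_key)
```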

**Tool Behaviour** (`RLM.Tool`):
```elixir
@callback name() :: String.t()
@callback description() :: String.t()
@callback execute(map()) :: {:ok, String.t()} | {:error, String.t()}
```

### Dependency Injection Pattern

The `llm_module` config field is the primary injection point:
- **Production**: `RLM.LLM.ReqLLM` (default) — multi-provider via `req_llm`
- **Testing**: `RLM.Test.MockLLM` — ETS-based response queue, set in `config/test.exs`
- **Legacy**: `RLM.LLM.Anthropic` — direct Anthropic HTTP client
- **Replay**: `RLM.Replay.LLM` / `RLM.Replay.FallbackLLM` — tape-based, set by `RLM.Replay`

When adding a new LLM feature, implement it in the behaviour callback — the Worker
calls `config.llm_module.chat(...)` and is provider-agnostic.

### Testing Patterns

**MockLLM usage** — queue expected responses before running Workers:
```elixir
RLM.Test.MockLLM.enqueue(%{
"reasoning" => "I'll count the lines",
"code" => ~s(final_answer = 4)
})
```
MockLLM is global ETS state. Tests using it must be `async: false`.

**Creating a test Run** — use `RLM.Test.Helpers.start_test_run/1`:
```elixir
{run_pid, run_id} = RLM.Test.Helpers.start_test_run(config)
{:ok, worker_pid, span_id} = RLM.Run.start_worker(run_pid, worker_opts)
```

**Tool tests** — use per-test temp dirs (created in `setup`, cleaned in `on_exit`);
these can run `async: true` since tools have no global state.

### Common Modification Patterns

**Adding a new config field:**
1. Add to `defstruct` in `config.ex`
2. Add to `load/1` with `get(overrides, :key, default)`
3. Add row to CLAUDE.md Config Fields table
4. Add to CHANGELOG.md
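
A sketch of steps 1 and 2 (`:my_field` and its default are hypothetical):

```elixir
# lib/rlm/config.ex (:my_field is a hypothetical example field)
defstruct [
  # ...existing fields...
  :my_field
]

def load(overrides) do
  %__MODULE__{
    # ...existing fields...
    my_field: get(overrides, :my_field, 5_000)
  }
end
```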

**Adding a new tool:**
1. Create `lib/rlm/tools/my_tool.ex` implementing `RLM.Tool`
2. Add to `RLM.ToolRegistry.all/0`
3. Add wrapper function to `RLM.Sandbox`
4. Add to system prompt in `priv/system_prompt.md`
5. Add row to CLAUDE.md Module Map (Filesystem Tools section)
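
Steps 1 and 2 might look like this (the tool itself is hypothetical; callback shapes follow the `RLM.Tool` behaviour above):

```elixir
# lib/rlm/tools/my_tool.ex (hypothetical example tool)
defmodule RLM.Tools.MyTool do
  @behaviour RLM.Tool

  @impl true
  def name, do: "my_tool"

  @impl true
  def description, do: "One-line description shown to the LLM in the system prompt."

  @impl true
  def execute(%{"path" => path}), do: {:ok, "processed #{path}"}
  def execute(_args), do: {:error, "missing required \"path\" argument"}
end
```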

**Adding a new LLM behaviour implementation:**
1. Create module with `@behaviour RLM.LLM`
2. Implement `chat/4` returning `{:ok, json_string, usage}` or `{:error, string}`
3. Users select it via `llm_module:` config override
4. Add row to CLAUDE.md Module Map
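
A minimal sketch of such a module (the canned response is illustrative; `Jason` is assumed available, as is standard in Phoenix apps):

```elixir
defmodule RLM.LLM.Canned do
  @behaviour RLM.LLM

  @impl true
  def chat(_messages, _model, _config, _opts) do
    # Return a JSON-encoded string, never a parsed map (see the behaviour contract).
    json = Jason.encode!(%{"reasoning" => "stub", "code" => "final_answer = :ok"})

    usage = %{
      prompt_tokens: nil,
      completion_tokens: nil,
      total_tokens: nil,
      cache_creation_input_tokens: nil,
      cache_read_input_tokens: nil
    }

    {:ok, json, usage}
  end
end
```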

### Before Committing

Before committing, always run:
Always run:
```bash
mix compile --warnings-as-errors
mix test
```
19 changes: 13 additions & 6 deletions README.md
@@ -9,8 +9,9 @@ I wanted to take further, and the design philosophy behind
[pi](https://github.com/badlogic/pi-mono/) — a coding agent that keeps things simple and
transparent. This is very much a learning project, but it works and it's been fun to build.

A single Phoenix application: an AI execution engine where Claude writes Elixir code that
A single Phoenix application: an AI execution engine where LLMs write Elixir code that
runs in a persistent REPL, with recursive sub-agent spawning and built-in filesystem tools.
Supports multiple LLM providers via `req_llm`: Anthropic, OpenAI, Ollama (local), Gemini, and more.

**One engine, two modes:**
1. **One-shot** — `RLM.run/3` processes data and returns a result
@@ -81,7 +82,7 @@ Three invariants the engine enforces:
Requires Elixir ≥ 1.19 / OTP 27 and an [Anthropic API key](https://console.anthropic.com/).

```bash
export CLAUDE_API_KEY=sk-ant-...
export ANTHROPIC_API_KEY=sk-ant-... # or CLAUDE_API_KEY as fallback
mix deps.get && mix compile
mix test # excludes live API tests
iex -S mix # interactive shell
```
@@ -136,12 +137,18 @@ watch(session) # attach a live telemetry stream
### Configuration overrides

```elixir
# Use custom Anthropic models
{:ok, result, run_id} = RLM.run(context, query,
max_iterations: 10,
max_depth: 3,
model_large: "claude-opus-4-6",
models: %{large: "anthropic:claude-opus-4-6", small: "anthropic:claude-haiku-4-5"},
eval_timeout: 60_000
)

# Use local Ollama models (no API key needed)
{:ok, result, run_id} = RLM.run(context, query,
models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}
)
```

### Deterministic replay
@@ -356,9 +363,9 @@ RLM_COOKIE=secret # shared secret for node authentication

RLM executes LLM-generated Elixir code via `Code.eval_string` with full access to the
host filesystem, network, and shell. **Do not expose RLM to untrusted users or untrusted
LLM providers.** It is designed for local development, trusted API backends (Anthropic),
and controlled environments. There is no sandboxing beyond process-level isolation and
configurable timeouts.
LLM providers.** It is designed for local development, trusted API backends (Anthropic,
OpenAI, local Ollama), and controlled environments. There is no sandboxing beyond
process-level isolation and configurable timeouts.

---

9 changes: 6 additions & 3 deletions config/runtime.exs
@@ -35,12 +35,15 @@ if config_env() == :prod do
You can generate one by calling: mix phx.gen.secret
"""

# CLAUDE_API_KEY is required in prod for LLM calls.
System.get_env("CLAUDE_API_KEY") ||
# An API key is required in prod for LLM calls.
# ANTHROPIC_API_KEY is preferred; CLAUDE_API_KEY is accepted as a fallback.
unless System.get_env("ANTHROPIC_API_KEY") || System.get_env("CLAUDE_API_KEY") do
raise """
environment variable CLAUDE_API_KEY is missing.
environment variable ANTHROPIC_API_KEY is missing.
Set it to your Anthropic API key to enable LLM functionality.
(CLAUDE_API_KEY is also accepted as a fallback.)
"""
end

host = System.get_env("PHX_HOST") || "example.com"

2 changes: 1 addition & 1 deletion examples/code_review.exs
@@ -11,7 +11,7 @@
# - Filesystem tool usage visible in code blocks
#
# Usage:
# export CLAUDE_API_KEY=sk-ant-...
# export ANTHROPIC_API_KEY=sk-ant-...
# mix run examples/code_review.exs
#
# Or via the Mix task: