From 23dd0a12844c456f76be6273691049ea4896015d Mon Sep 17 00:00:00 2001 From: errantsky Date: Thu, 5 Mar 2026 21:44:56 -0800 Subject: [PATCH 1/3] feat: add multi-provider LLM support via req_llm MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace the monolithic RLM.LLM with a behaviour + two implementations: - RLM.LLM.ReqLLM (new default) — delegates to req_llm v1.6, supports Anthropic, OpenAI, Ollama, Gemini, Groq via "provider:model" specs - RLM.LLM.Anthropic — preserved legacy hand-rolled Anthropic client Add a named model map (config.models) with Config.resolve_model/2 and Config.context_window_for/2. Workers use model_key atoms (:large, :small, or custom) instead of inline config.model_large/model_small lookups. API key resolution now checks ANTHROPIC_API_KEY first, falls back to CLAUDE_API_KEY. All 162 tests pass unchanged (MockLLM unaffected). Co-Authored-By: Claude Opus 4.6 --- CHANGELOG.md | 33 ++++++++++ CLAUDE.md | 46 +++++++++---- README.md | 19 ++++-- config/runtime.exs | 9 ++- lib/rlm.ex | 10 +-- lib/rlm/config.ex | 98 ++++++++++++++++++++++++---- lib/rlm/iex.ex | 2 +- lib/rlm/llm.ex | 136 +++++++++------------------------------ lib/rlm/llm/anthropic.ex | 112 ++++++++++++++++++++++++++++++++ lib/rlm/llm/req_llm.ex | 130 +++++++++++++++++++++++++++++++++++++ lib/rlm/replay.ex | 2 +- lib/rlm/worker.ex | 30 +++++---- mix.exs | 3 +- mix.lock | 15 +++++ 14 files changed, 480 insertions(+), 165 deletions(-) create mode 100644 lib/rlm/llm/anthropic.ex create mode 100644 lib/rlm/llm/req_llm.ex diff --git a/CHANGELOG.md b/CHANGELOG.md index e09713c..f3d9af5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,39 @@ All notable changes to this project are documented here. 
### Added +**Multi-provider LLM support via req_llm** + +- `RLM.LLM.ReqLLM` — new default LLM backend that delegates to `req_llm` v1.6, + supporting Anthropic, OpenAI, Ollama (local models), Google Gemini, Groq, and any + other provider that `req_llm` supports. Model specs use the `"provider:model-name"` + convention (e.g., `"anthropic:claude-sonnet-4-6"`, `"ollama:qwen3.5:35b"`). Bare + model names without a provider prefix are treated as Anthropic for backward compat. +- `RLM.LLM.Anthropic` — the previous hand-rolled Anthropic Messages API client, + preserved as a fallback for users who need direct Anthropic-specific control. + Select via `llm_module: RLM.LLM.Anthropic`. +- `RLM.LLM` refactored to a pure behaviour module + shared utilities + (`extract_structured/1`, `response_schema/0`); no longer contains an implementation. +- `models` config field — `%{atom() => String.t()}` map of symbolic keys to + provider-prefixed model specs. Default: `%{large: "anthropic:claude-sonnet-4-6", + small: "anthropic:claude-haiku-4-5"}`. 
Pass custom maps for Ollama/OpenAI: + `models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}` +- `RLM.Config.resolve_model/2` — looks up a model key in the `models` map +- `RLM.Config.context_window_for/2` — resolves context window size for a model key + (legacy fields for `:large`/`:small`, default 128k for custom keys) +- `model_key` option on Workers — replaces inline `config.model_large`/`config.model_small` + lookups with named model map resolution + +### Changed + +- Default `llm_module` changed from `RLM.LLM` (which was the implementation) to + `RLM.LLM.ReqLLM` (the new multi-provider adapter) +- API key resolution now checks `ANTHROPIC_API_KEY` first, falls back to `CLAUDE_API_KEY` +- `RLM.Worker` uses `model_key` (`:large`, `:small`, or custom atom) to resolve model + specs via `Config.resolve_model/2` instead of reading `config.model_large`/`model_small` +- `RLM.run/3`, `RLM.run_async/3`, `RLM.start_session/1`, `RLM.Replay.replay/2` pass + `model_key:` instead of `model:` in worker opts +- `req_llm` (`~> 1.6`) added as a dependency + **Deterministic replay** - `RLM.Replay` — replay orchestrator that re-executes a previously recorded run using diff --git a/CLAUDE.md b/CLAUDE.md index 467c03d..dba0805 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -13,7 +13,10 @@ rlm/ │ │ ├── run.ex # Per-run coordinator GenServer │ │ ├── worker.ex # RLM GenServer (iterate loop + keep_alive) │ │ ├── eval.ex # Sandboxed Code.eval_string -│ │ ├── llm.ex # Anthropic Messages API client +│ │ ├── llm.ex # LLM behaviour + shared utilities +│ │ ├── llm/ +│ │ │ ├── req_llm.ex # Multi-provider backend via req_llm (default) +│ │ │ └── anthropic.ex # Direct Anthropic API client (legacy fallback) │ │ ├── helpers.ex # chunks/2, grep/2, preview/2, list_bindings/0 │ │ ├── sandbox.ex # Eval sandbox: helpers + LLM calls + tool wrappers │ │ ├── prompt.ex # System prompt + message formatting @@ -162,17 +165,29 @@ retrieve the execution trace via `RLM.EventLog`. 
On failure it returns `{:error, A `Process.monitor` on the Worker ensures crashes surface as errors rather than hangs. ### LLM Client -Uses the Anthropic Messages API (not OpenAI format). System messages are -extracted and sent as the top-level `system` field. Requires `CLAUDE_API_KEY` env var. +The default backend is `RLM.LLM.ReqLLM`, which delegates to the `req_llm` package +and supports any provider: Anthropic, OpenAI, Ollama (local), Gemini, Groq, etc. +Model specs use the `"provider:model-name"` convention (e.g., `"anthropic:claude-sonnet-4-6"`, +`"ollama:qwen3.5:35b"`). Bare names without a prefix are treated as Anthropic for +backward compatibility. Requires `ANTHROPIC_API_KEY` (or `CLAUDE_API_KEY` as fallback). -LLM responses use structured output (`output_config` with `json_schema`) to constrain -responses to `{"reasoning": "...", "code": "..."}` JSON objects. This eliminates regex-based -code extraction and provides clean separation of reasoning from executable code. Feedback -messages after eval are also structured JSON. +The legacy hand-rolled Anthropic client is preserved as `RLM.LLM.Anthropic` and can +be selected via `llm_module: RLM.LLM.Anthropic`. + +LLM responses use structured output (JSON schema) to constrain responses to +`{"reasoning": "...", "code": "..."}` objects. Feedback messages after eval are also +structured JSON. 
+ +The `models` config field maps symbolic keys to provider-prefixed specs: + +```elixir +RLM.run(context, query, + models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}) +``` Default models: -- Large: `claude-sonnet-4-6` -- Small: `claude-haiku-4-5` +- Large: `anthropic:claude-sonnet-4-6` +- Small: `anthropic:claude-haiku-4-5` ## Module Map @@ -186,7 +201,9 @@ Default models: | `RLM.Worker` | GenServer per execution node; iterate loop + keep_alive mode; delegates spawning to Run | | `RLM.Eval` | Sandboxed `Code.eval_string` with async IO capture + cwd injection | | `RLM.Sandbox` | Functions injected into eval'd code (helpers + LLM calls + tool wrappers) | -| `RLM.LLM` | Anthropic Messages API client with structured output (`extract_structured/1`) | +| `RLM.LLM` | LLM behaviour + shared utilities (`extract_structured/1`, `response_schema/0`) | +| `RLM.LLM.ReqLLM` | Multi-provider LLM backend via `req_llm` (default) | +| `RLM.LLM.Anthropic` | Direct Anthropic Messages API client (legacy fallback) | | `RLM.Prompt` | System prompt loading + structured JSON feedback message formatting | | `RLM.Helpers` | `chunks/2`, `grep/2`, `preview/2`, `list_bindings/0` | | `RLM.Truncate` | Head+tail string truncation for stdout overflow | @@ -246,9 +263,10 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`. 
| Field | Default | Notes | |---|---|---| | `api_base_url` | `"https://api.anthropic.com"` | Anthropic API base URL | -| `api_key` | `CLAUDE_API_KEY` env var | API key for LLM requests | -| `model_large` | `claude-sonnet-4-6` | Used for parent workers | -| `model_small` | `claude-haiku-4-5` | Used for subcalls | +| `api_key` | `ANTHROPIC_API_KEY` env var | API key for LLM requests (falls back to `CLAUDE_API_KEY`) | +| `models` | `%{large: "anthropic:claude-sonnet-4-6", small: "anthropic:claude-haiku-4-5"}` | Named model map; keys are atoms, values are `"provider:model"` specs | +| `model_large` | `claude-sonnet-4-6` | Legacy; used to build default `models` map | +| `model_small` | `claude-haiku-4-5` | Legacy; used to build default `models` map | | `max_iterations` | `25` | Per-worker LLM turn limit | | `max_depth` | `5` | Recursive subcall depth limit | | `max_concurrent_subcalls` | `10` | Parallel subcall limit per worker | @@ -267,7 +285,7 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`. | `enable_event_log` | `true` | Enable per-run EventLog trace agents | | `event_log_capture_full_stdout` | `false` | Store full stdout in traces (vs truncated) | | `enable_replay_recording` | `false` | Record full LLM responses for deterministic replay | -| `llm_module` | `RLM.LLM` | Swappable for `RLM.Test.MockLLM` | +| `llm_module` | `RLM.LLM.ReqLLM` | Default LLM backend; swap to `RLM.LLM.Anthropic` or `RLM.Test.MockLLM` | ## Testing Conventions diff --git a/README.md b/README.md index 8a98374..cc3e274 100644 --- a/README.md +++ b/README.md @@ -9,8 +9,9 @@ I wanted to take further, and the design philosophy behind [pi](https://github.com/badlogic/pi-mono/) — a coding agent that keeps things simple and transparent. This is very much a learning project, but it works and it's been fun to build. 
-A single Phoenix application: an AI execution engine where Claude writes Elixir code that +A single Phoenix application: an AI execution engine where LLMs write Elixir code that runs in a persistent REPL, with recursive sub-agent spawning and built-in filesystem tools. +Supports multiple LLM providers via `req_llm`: Anthropic, OpenAI, Ollama (local), Gemini, and more. **One engine, two modes:** 1. **One-shot** — `RLM.run/3` processes data and returns a result @@ -81,7 +82,7 @@ Three invariants the engine enforces: Requires Elixir ≥ 1.19 / OTP 27 and an [Anthropic API key](https://console.anthropic.com/). ```bash -export CLAUDE_API_KEY=sk-ant-... +export ANTHROPIC_API_KEY=sk-ant-... # or CLAUDE_API_KEY as fallback mix deps.get && mix compile mix test # excludes live API tests iex -S mix # interactive shell @@ -136,12 +137,18 @@ watch(session) # attach a live telemetry stream ### Configuration overrides ```elixir +# Use custom Anthropic models {:ok, result, run_id} = RLM.run(context, query, max_iterations: 10, max_depth: 3, - model_large: "claude-opus-4-6", + models: %{large: "anthropic:claude-opus-4-6", small: "anthropic:claude-haiku-4-5"}, eval_timeout: 60_000 ) + +# Use local Ollama models (no API key needed) +{:ok, result, run_id} = RLM.run(context, query, + models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"} +) ``` ### Deterministic replay @@ -356,9 +363,9 @@ RLM_COOKIE=secret # shared secret for node authentication RLM executes LLM-generated Elixir code via `Code.eval_string` with full access to the host filesystem, network, and shell. **Do not expose RLM to untrusted users or untrusted -LLM providers.** It is designed for local development, trusted API backends (Anthropic), -and controlled environments. There is no sandboxing beyond process-level isolation and -configurable timeouts. +LLM providers.** It is designed for local development, trusted API backends (Anthropic, +OpenAI, local Ollama), and controlled environments. 
There is no sandboxing beyond +process-level isolation and configurable timeouts. --- diff --git a/config/runtime.exs b/config/runtime.exs index 76d8b37..90a44b2 100644 --- a/config/runtime.exs +++ b/config/runtime.exs @@ -35,12 +35,15 @@ if config_env() == :prod do You can generate one by calling: mix phx.gen.secret """ - # CLAUDE_API_KEY is required in prod for LLM calls. - System.get_env("CLAUDE_API_KEY") || + # An API key is required in prod for LLM calls. + # ANTHROPIC_API_KEY is preferred; CLAUDE_API_KEY is accepted as a fallback. + unless System.get_env("ANTHROPIC_API_KEY") || System.get_env("CLAUDE_API_KEY") do raise """ - environment variable CLAUDE_API_KEY is missing. + environment variable ANTHROPIC_API_KEY is missing. Set it to your Anthropic API key to enable LLM functionality. + (CLAUDE_API_KEY is also accepted as a fallback.) """ + end host = System.get_env("PHX_HOST") || "example.com" diff --git a/lib/rlm.ex b/lib/rlm.ex index a061618..cd6cbb8 100644 --- a/lib/rlm.ex +++ b/lib/rlm.ex @@ -59,7 +59,7 @@ defmodule RLM do query: query, config: config, depth: 0, - model: config.model_large, + model_key: :large, caller: self() ] @@ -116,7 +116,7 @@ defmodule RLM do query: query, config: config, depth: 0, - model: config.model_large, + model_key: :large, caller: self() ] @@ -146,7 +146,7 @@ defmodule RLM do Options: - `:cwd` — working directory for tools (default: current dir) - - `:model` — override the model (default: config.model_large) + - `:model_key` — model key from config.models map (default: `:large`) - Plus any `RLM.Config` overrides Returns `{:ok, session_id}`. 
@@ -157,7 +157,7 @@ defmodule RLM do session_id = RLM.Span.generate_id() run_id = RLM.Span.generate_run_id() cwd = Keyword.get(opts, :cwd, File.cwd!()) - model = Keyword.get(opts, :model, config.model_large) + model_key = Keyword.get(opts, :model_key, :large) run_opts = [run_id: run_id, config: config, keep_alive: true] @@ -169,7 +169,7 @@ defmodule RLM do config: config, keep_alive: true, cwd: cwd, - model: model + model_key: model_key ] case RLM.Run.start_worker(run_pid, worker_opts) do diff --git a/lib/rlm/config.ex b/lib/rlm/config.ex index e627717..fe268ff 100644 --- a/lib/rlm/config.ex +++ b/lib/rlm/config.ex @@ -2,13 +2,42 @@ defmodule RLM.Config do @moduledoc """ Configuration struct for RLM engine. Loads defaults from application env, allows runtime overrides. + + ## Multi-Provider Model Map + + The `models` field maps symbolic keys to provider-prefixed model specs: + + %RLM.Config{ + models: %{ + large: "anthropic:claude-sonnet-4-6", + small: "anthropic:claude-haiku-4-5" + } + } + + Model specs follow the `req_llm` naming convention: `"provider:model-name"`. + For backward compatibility, bare model names without a provider prefix + are treated as Anthropic models. + + ## Supported Providers + + Any provider supported by `req_llm`: Anthropic, OpenAI, Ollama (local), + Google Gemini, Groq, and more. 
For local Ollama: + + RLM.run("data", "query", + models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}) """ + require Logger + + @default_context_window 128_000 + defstruct [ :api_base_url, :api_key, + # Legacy model fields — prefer `models` map :model_large, :model_small, + :models, :max_iterations, :max_depth, :max_concurrent_subcalls, @@ -19,10 +48,6 @@ defmodule RLM.Config do :eval_timeout, :llm_timeout, :subcall_timeout, - :cost_per_1k_prompt_tokens_large, - :cost_per_1k_prompt_tokens_small, - :cost_per_1k_completion_tokens_large, - :cost_per_1k_completion_tokens_small, :enable_otel, :enable_event_log, :event_log_capture_full_stdout, @@ -34,11 +59,20 @@ defmodule RLM.Config do @spec load(keyword()) :: t() def load(overrides \\ []) do + model_large = get(overrides, :model_large, "claude-sonnet-4-6") + model_small = get(overrides, :model_small, "claude-haiku-4-5") + + default_models = %{ + large: model_large, + small: model_small + } + %__MODULE__{ api_base_url: get(overrides, :api_base_url, "https://api.anthropic.com"), - api_key: get(overrides, :api_key, System.get_env("CLAUDE_API_KEY")), - model_large: get(overrides, :model_large, "claude-sonnet-4-6"), - model_small: get(overrides, :model_small, "claude-haiku-4-5"), + api_key: get(overrides, :api_key, resolve_api_key()), + model_large: model_large, + model_small: model_small, + models: get(overrides, :models, default_models), max_iterations: get(overrides, :max_iterations, 25), max_depth: get(overrides, :max_depth, 5), max_concurrent_subcalls: get(overrides, :max_concurrent_subcalls, 10), @@ -49,20 +83,56 @@ defmodule RLM.Config do eval_timeout: get(overrides, :eval_timeout, 300_000), llm_timeout: get(overrides, :llm_timeout, 120_000), subcall_timeout: get(overrides, :subcall_timeout, 600_000), - cost_per_1k_prompt_tokens_large: get(overrides, :cost_per_1k_prompt_tokens_large, 0.003), - cost_per_1k_prompt_tokens_small: get(overrides, :cost_per_1k_prompt_tokens_small, 0.0008), - 
cost_per_1k_completion_tokens_large: - get(overrides, :cost_per_1k_completion_tokens_large, 0.015), - cost_per_1k_completion_tokens_small: - get(overrides, :cost_per_1k_completion_tokens_small, 0.004), enable_otel: get(overrides, :enable_otel, false), enable_event_log: get(overrides, :enable_event_log, true), event_log_capture_full_stdout: get(overrides, :event_log_capture_full_stdout, false), enable_replay_recording: get(overrides, :enable_replay_recording, false), - llm_module: get(overrides, :llm_module, RLM.LLM) + llm_module: get(overrides, :llm_module, RLM.LLM.ReqLLM) } end + @doc """ + Resolve a model key to its spec string. + + Returns `{:ok, spec}` or `{:error, reason}`. + + ## Examples + + iex> config = RLM.Config.load(models: %{large: "anthropic:claude-sonnet-4-6"}) + iex> RLM.Config.resolve_model(config, :large) + {:ok, "anthropic:claude-sonnet-4-6"} + + iex> config = RLM.Config.load() + iex> RLM.Config.resolve_model(config, :unknown) + {:error, "Unknown model key: unknown"} + """ + @spec resolve_model(t(), atom()) :: {:ok, String.t()} | {:error, String.t()} + def resolve_model(%__MODULE__{models: models}, key) when is_atom(key) do + case Map.fetch(models, key) do + {:ok, spec} when is_binary(spec) -> {:ok, spec} + :error -> {:error, "Unknown model key: #{key}"} + end + end + + @doc """ + Look up the context window size for a model key. + + Uses a two-tier strategy: + 1. Legacy `context_window_tokens_large/small` fields (for `:large`/`:small` keys) + 2. 
Default of #{@default_context_window} tokens for unknown models + """ + @spec context_window_for(t(), atom()) :: non_neg_integer() + def context_window_for(%__MODULE__{} = config, :large), do: config.context_window_tokens_large + def context_window_for(%__MODULE__{} = config, :small), do: config.context_window_tokens_small + def context_window_for(%__MODULE__{}, _key), do: @default_context_window + + defp resolve_api_key do + case System.get_env("ANTHROPIC_API_KEY") do + nil -> System.get_env("CLAUDE_API_KEY") + key -> key + end + end + defp get(overrides, key, default) do case Keyword.fetch(overrides, key) do {:ok, value} -> value diff --git a/lib/rlm/iex.ex b/lib/rlm/iex.ex index cf87456..b36161d 100644 --- a/lib/rlm/iex.ex +++ b/lib/rlm/iex.ex @@ -26,7 +26,7 @@ defmodule RLM.IEx do Start a new interactive session. Returns the `session_id`. Options: - - `:model` — override the model (default: config.model_large) + - `:model_key` — model key from config.models map (default: `:large`) - `:cwd` — working directory for tools (default: current dir) """ @spec start(keyword()) :: String.t() diff --git a/lib/rlm/llm.ex b/lib/rlm/llm.ex index 9dd69da..0b6e920 100644 --- a/lib/rlm/llm.ex +++ b/lib/rlm/llm.ex @@ -1,11 +1,18 @@ defmodule RLM.LLM do @moduledoc """ - Claude (Anthropic) Messages API client. - Returns response content alongside token usage metadata. + Behaviour for LLM backends and shared utilities for structured output parsing. - Uses structured output (`output_config` with JSON schema) to constrain - LLM responses to a `{"reasoning", "code"}` JSON object, eliminating - regex-based code extraction. + The default implementation is `RLM.LLM.ReqLLM` which supports multiple + providers via the `req_llm` package. The legacy hand-rolled Anthropic + client is available as `RLM.LLM.Anthropic`. 
+ + ## Implementations + + - `RLM.LLM.ReqLLM` — multi-provider backend (default) + - `RLM.LLM.Anthropic` — direct Anthropic Messages API client + - `RLM.Test.MockLLM` — deterministic test mock (ETS-based) + - `RLM.Replay.LLM` — replay from recorded tape + - `RLM.Replay.FallbackLLM` — replay with live fallback """ @type usage :: %{ @@ -16,6 +23,24 @@ defmodule RLM.LLM do cache_read_input_tokens: non_neg_integer() | nil } + @doc """ + Send a chat request to the LLM. + + ## Arguments + + * `messages` — list of message maps with `:role` and `:content` fields + * `model` — provider-prefixed model spec (e.g., `"anthropic:claude-sonnet-4-6"`) + * `config` — `RLM.Config.t()` struct + * `opts` — keyword list; supports `:schema` for structured output + + ## Returns + + * `{:ok, json_string, usage}` on success + * `{:error, reason}` on failure + """ + @callback chat([map()], String.t(), RLM.Config.t(), keyword()) :: + {:ok, String.t(), usage()} | {:error, String.t()} + @response_schema %{ "type" => "object", "properties" => %{ @@ -30,68 +55,6 @@ defmodule RLM.LLM do @spec response_schema() :: map() def response_schema, do: @response_schema - @callback chat([map()], String.t(), RLM.Config.t(), keyword()) :: - {:ok, String.t(), usage()} | {:error, String.t()} - - @spec chat([map()], String.t(), RLM.Config.t(), keyword()) :: - {:ok, String.t(), usage()} | {:error, String.t()} - def chat(messages, model, config, opts \\ []) do - url = String.trim_trailing(config.api_base_url, "/") <> "/v1/messages" - - {system_text, user_messages} = extract_system(messages) - - headers = [ - {"x-api-key", config.api_key || ""}, - {"anthropic-version", "2023-06-01"}, - {"content-type", "application/json"} - ] - - schema = Keyword.get(opts, :schema, @response_schema) - - body = %{ - model: model, - max_tokens: 4096, - cache_control: %{type: "ephemeral"}, - messages: format_messages(user_messages), - output_config: %{ - format: %{ - type: "json_schema", - schema: schema - } - } - } - - body = if 
system_text, do: Map.put(body, :system, system_text), else: body - - case Req.post(url, - json: body, - headers: headers, - receive_timeout: config.llm_timeout - ) do - {:ok, %{status: 200, body: resp_body}} -> - content = extract_content(resp_body) - usage = extract_usage(resp_body) - - if content do - {:ok, content, usage} - else - {:error, "No content in API response"} - end - - {:ok, %{status: status, body: resp_body}} -> - error_msg = - case resp_body do - %{"error" => %{"message" => msg}} -> msg - _ -> "HTTP #{status}" - end - - {:error, "API error: #{error_msg}"} - - {:error, reason} -> - {:error, "API request failed: #{inspect(reason)}"} - end - end - @doc """ Parse a structured JSON response from the LLM. @@ -113,43 +76,4 @@ defmodule RLM.LLM do {:error, "JSON parse failed: #{inspect(err)}"} end end - - defp extract_system(messages) do - case Enum.split_with(messages, fn m -> m.role == :system end) do - {[], rest} -> {nil, rest} - {system_msgs, rest} -> {Enum.map_join(system_msgs, "\n", & &1.content), rest} - end - end - - defp format_messages(messages) do - Enum.map(messages, fn msg -> - %{"role" => to_string(msg.role), "content" => msg.content} - end) - end - - defp extract_content(body) do - body - |> Map.get("content", []) - |> Enum.find_value(fn - %{"type" => "text", "text" => text} -> text - _ -> nil - end) - end - - defp extract_usage(body) do - usage = Map.get(body, "usage", %{}) - - input = Map.get(usage, "input_tokens") - output = Map.get(usage, "output_tokens") - cache_creation = Map.get(usage, "cache_creation_input_tokens") - cache_read = Map.get(usage, "cache_read_input_tokens") - - %{ - prompt_tokens: input, - completion_tokens: output, - total_tokens: if(input && output, do: input + output, else: nil), - cache_creation_input_tokens: cache_creation, - cache_read_input_tokens: cache_read - } - end end diff --git a/lib/rlm/llm/anthropic.ex b/lib/rlm/llm/anthropic.ex new file mode 100644 index 0000000..2fe0587 --- /dev/null +++ 
b/lib/rlm/llm/anthropic.ex @@ -0,0 +1,112 @@ +defmodule RLM.LLM.Anthropic do + @moduledoc """ + Hand-rolled Anthropic Messages API client. + + Preserved as a fallback for users who need direct control over + Anthropic-specific features (prompt caching, etc.). The default + backend is `RLM.LLM.ReqLLM`. + + Select this via config: + + RLM.Config.load(llm_module: RLM.LLM.Anthropic) + """ + + @behaviour RLM.LLM + + @impl true + def chat(messages, model, config, opts \\ []) do + url = String.trim_trailing(config.api_base_url, "/") <> "/v1/messages" + + {system_text, user_messages} = extract_system(messages) + + headers = [ + {"x-api-key", config.api_key || ""}, + {"anthropic-version", "2023-06-01"}, + {"content-type", "application/json"} + ] + + schema = Keyword.get(opts, :schema, RLM.LLM.response_schema()) + + body = %{ + model: model, + max_tokens: 4096, + cache_control: %{type: "ephemeral"}, + messages: format_messages(user_messages), + output_config: %{ + format: %{ + type: "json_schema", + schema: schema + } + } + } + + body = if system_text, do: Map.put(body, :system, system_text), else: body + + case Req.post(url, + json: body, + headers: headers, + receive_timeout: config.llm_timeout + ) do + {:ok, %{status: 200, body: resp_body}} -> + content = extract_content(resp_body) + usage = extract_usage(resp_body) + + if content do + {:ok, content, usage} + else + {:error, "No content in API response"} + end + + {:ok, %{status: status, body: resp_body}} -> + error_msg = + case resp_body do + %{"error" => %{"message" => msg}} -> msg + _ -> "HTTP #{status}" + end + + {:error, "API error: #{error_msg}"} + + {:error, reason} -> + {:error, "API request failed: #{inspect(reason)}"} + end + end + + defp extract_system(messages) do + case Enum.split_with(messages, fn m -> m.role == :system end) do + {[], rest} -> {nil, rest} + {system_msgs, rest} -> {Enum.map_join(system_msgs, "\n", & &1.content), rest} + end + end + + defp format_messages(messages) do + Enum.map(messages, fn 
msg -> + %{"role" => to_string(msg.role), "content" => msg.content} + end) + end + + defp extract_content(body) do + body + |> Map.get("content", []) + |> Enum.find_value(fn + %{"type" => "text", "text" => text} -> text + _ -> nil + end) + end + + defp extract_usage(body) do + usage = Map.get(body, "usage", %{}) + + input = Map.get(usage, "input_tokens") + output = Map.get(usage, "output_tokens") + cache_creation = Map.get(usage, "cache_creation_input_tokens") + cache_read = Map.get(usage, "cache_read_input_tokens") + + %{ + prompt_tokens: input, + completion_tokens: output, + total_tokens: if(input && output, do: input + output, else: nil), + cache_creation_input_tokens: cache_creation, + cache_read_input_tokens: cache_read + } + end +end diff --git a/lib/rlm/llm/req_llm.ex b/lib/rlm/llm/req_llm.ex new file mode 100644 index 0000000..a5866e0 --- /dev/null +++ b/lib/rlm/llm/req_llm.ex @@ -0,0 +1,130 @@ +defmodule RLM.LLM.ReqLLM do + @moduledoc """ + Multi-provider LLM backend using the `req_llm` package. + + Supports any provider that `req_llm` supports: Anthropic, OpenAI, + Ollama (local), Google Gemini, Groq, and more. Model specs follow + the `"provider:model-name"` convention. + + For backward compatibility, bare model names without a provider prefix + are treated as Anthropic models (e.g., `"claude-sonnet-4-6"` becomes + `"anthropic:claude-sonnet-4-6"`). 
+ + ## Provider-Specific Features + + - **Anthropic**: Prompt caching enabled automatically via `anthropic_prompt_cache: true` + - **Ollama**: No API key needed; uses `http://localhost:11434` by default + - **OpenAI**: Reads `OPENAI_API_KEY` from env + """ + + @behaviour RLM.LLM + + @impl true + def chat(messages, model, config, opts \\ []) do + model_spec = normalize_model_spec(model) + schema = Keyword.get(opts, :schema, RLM.LLM.response_schema()) + context = build_context(messages) + req_opts = build_opts(model_spec, config) + + case ReqLLM.generate_object(model_spec, context, schema, req_opts) do + {:ok, response} -> + content = encode_object(response) + usage = extract_usage(response) + {:ok, content, usage} + + {:error, reason} -> + {:error, format_error(reason)} + end + end + + # If model string already contains ":", it's in provider:model format. + # Otherwise, assume Anthropic for backward compatibility. + defp normalize_model_spec(model) when is_binary(model) do + if String.contains?(model, ":") do + model + else + "anthropic:#{model}" + end + end + + defp build_context(messages) do + {system_msgs, user_msgs} = + Enum.split_with(messages, fn m -> m.role == :system end) + + system_text = + case system_msgs do + [] -> nil + msgs -> Enum.map_join(msgs, "\n", & &1.content) + end + + req_messages = + Enum.map(user_msgs, fn msg -> + case msg.role do + :user -> ReqLLM.Context.user(msg.content) + :assistant -> ReqLLM.Context.assistant(msg.content) + end + end) + + all_messages = + if system_text do + [ReqLLM.Context.system(system_text) | req_messages] + else + req_messages + end + + ReqLLM.Context.new(all_messages) + end + + defp build_opts(model_spec, config) do + base_opts = [ + max_tokens: 4096, + receive_timeout: config.llm_timeout + ] + + base_opts + |> maybe_add_api_key(config) + |> maybe_add_anthropic_opts(model_spec) + end + + defp maybe_add_api_key(opts, config) do + if config.api_key do + Keyword.put(opts, :api_key, config.api_key) + else + opts + end + 
end + + defp maybe_add_anthropic_opts(opts, model_spec) do + if anthropic_model?(model_spec) do + Keyword.put(opts, :anthropic_prompt_cache, true) + else + opts + end + end + + defp anthropic_model?(spec), do: String.starts_with?(spec, "anthropic:") + + # Re-serialize the parsed object back to a JSON string to preserve + # the existing chat/4 contract (Worker expects a JSON string). + defp encode_object(response) do + case ReqLLM.Response.object(response) do + obj when is_map(obj) -> Jason.encode!(obj) + _ -> ReqLLM.Response.text(response) || "" + end + end + + defp extract_usage(response) do + raw = ReqLLM.Response.usage(response) || %{} + + %{ + prompt_tokens: Map.get(raw, :input_tokens), + completion_tokens: Map.get(raw, :output_tokens), + total_tokens: Map.get(raw, :total_tokens), + cache_creation_input_tokens: Map.get(raw, :cache_creation_input_tokens), + cache_read_input_tokens: Map.get(raw, :cache_read_input_tokens) + } + end + + defp format_error(%{__exception__: true} = error), do: Exception.message(error) + defp format_error(reason), do: "LLM request failed: #{inspect(reason)}" +end diff --git a/lib/rlm/replay.ex b/lib/rlm/replay.ex index 97d8fdf..053453e 100644 --- a/lib/rlm/replay.ex +++ b/lib/rlm/replay.ex @@ -63,7 +63,7 @@ defmodule RLM.Replay do query: query, config: config, depth: 0, - model: config.model_large, + model_key: :large, caller: self(), replay_tape: tape, replay_patches: patches, diff --git a/lib/rlm/worker.ex b/lib/rlm/worker.ex index 74c7ed2..1f1610d 100644 --- a/lib/rlm/worker.ex +++ b/lib/rlm/worker.ex @@ -41,6 +41,7 @@ defmodule RLM.Worker do # Tracks in-flight eval context (includes task_ref for supervised eval) :eval_context, # keep_alive mode fields + :model_key, :keep_alive, :cwd, :pending_from, @@ -77,7 +78,8 @@ defmodule RLM.Worker do context = Keyword.get(opts, :context, "") query = Keyword.get(opts, :query, context) depth = Keyword.get(opts, :depth, 0) - model = Keyword.get(opts, :model, config.model_large) + model_key = 
Keyword.get(opts, :model_key, :large) + model = Keyword.get_lazy(opts, :model, fn -> resolve_model!(config, model_key) end) parent_span_id = Keyword.get(opts, :parent_span_id) caller = Keyword.get(opts, :caller) keep_alive = Keyword.get(opts, :keep_alive, false) @@ -110,6 +112,7 @@ defmodule RLM.Worker do history: [system_msg], bindings: [final_answer: nil, compacted_history: ""], model: model, + model_key: model_key, config: config, status: :idle, result: nil, @@ -156,6 +159,7 @@ defmodule RLM.Worker do history: [system_msg, user_msg], bindings: bindings, model: model, + model_key: model_key, config: config, status: :running, result: nil, @@ -448,10 +452,7 @@ defmodule RLM.Worker do end def handle_call({:direct_query, text, model_size, schema}, from, state) do - model = - if model_size == :large, - do: state.config.model_large, - else: state.config.model_small + model = resolve_model!(state.config, model_size) if map_size(state.pending_subcalls) >= state.config.max_concurrent_subcalls do {:reply, @@ -497,10 +498,7 @@ defmodule RLM.Worker do end def handle_call({:spawn_subcall, text, model_size}, from, state) do - model = - if model_size == :large, - do: state.config.model_large, - else: state.config.model_small + model = resolve_model!(state.config, model_size) cond do state.depth >= state.config.max_depth -> @@ -526,6 +524,7 @@ defmodule RLM.Worker do context: text, query: text, model: model, + model_key: model_size, config: state.config, depth: state.depth + 1, parent_span_id: state.span_id, @@ -947,11 +946,7 @@ defmodule RLM.Worker do end defp context_window_for_model(state) do - if state.model == state.config.model_large do - state.config.context_window_tokens_large - else - state.config.context_window_tokens_small - end + RLM.Config.context_window_for(state.config, state.model_key) end defp serialize_history(messages) do @@ -963,6 +958,13 @@ defmodule RLM.Worker do defp join_compacted("", new), do: new defp join_compacted(existing, new), do: existing <> 
"\n===\n" <> new + defp resolve_model!(config, key) do + case RLM.Config.resolve_model(config, key) do + {:ok, spec} -> spec + {:error, _} -> Map.get(config.models, :large, "claude-sonnet-4-6") + end + end + defp emit_telemetry(event, measurements, state, extra_metadata) do base = %{ span_id: state.span_id, diff --git a/mix.exs b/mix.exs index dbb0703..e98a236 100644 --- a/mix.exs +++ b/mix.exs @@ -37,8 +37,9 @@ defmodule RLM.MixProject do defp deps do [ - # HTTP client + # HTTP / LLM client {:req, "~> 0.5"}, + {:req_llm, "~> 1.6"}, {:jason, "~> 1.4"}, # Telemetry diff --git a/mix.lock b/mix.lock index 8628def..cec4ee8 100644 --- a/mix.lock +++ b/mix.lock @@ -1,14 +1,18 @@ %{ + "abnf_parsec": {:hex, :abnf_parsec, "2.1.0", "c4e88d5d089f1698297c0daced12be1fb404e6e577ecf261313ebba5477941f9", [:mix], [{:nimble_parsec, "~> 1.4", [hex: :nimble_parsec, repo: "hexpm", optional: false]}], "hexpm", "e0ed6290c7cc7e5020c006d1003520390c9bdd20f7c3f776bd49bfe3c5cd362a"}, "bandit": {:hex, :bandit, "1.10.2", "d15ea32eb853b5b42b965b24221eb045462b2ba9aff9a0bda71157c06338cbff", [:mix], [{:hpax, "~> 1.0", [hex: :hpax, repo: "hexpm", optional: false]}, {:plug, "~> 1.18", [hex: :plug, repo: "hexpm", optional: false]}, {:telemetry, "~> 0.4 or ~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}, {:thousand_island, "~> 1.0", [hex: :thousand_island, repo: "hexpm", optional: false]}, {:websock, "~> 0.5", [hex: :websock, repo: "hexpm", optional: false]}], "hexpm", "27b2a61b647914b1726c2ced3601473be5f7aa6bb468564a688646a689b3ee45"}, "boundary": {:hex, :boundary, "0.10.4", "5fec5d2736c12f9bfe1720c3a2bd8c48c3547c24d6002ebf8e087570afd5bd2f", [:mix], [], "hexpm", "8baf6f23987afdb1483033ed0bde75c9c703613c22ed58d5f23bf948f203247c"}, "bunt": {:hex, :bunt, "1.0.0", "081c2c665f086849e6d57900292b3a161727ab40431219529f13c4ddcf3e7a44", [:mix], [], "hexpm", "dc5f86aa08a5f6fa6b8096f0735c4e76d54ae5c9fa2c143e5a1fc7c1cd9bb6b5"}, "cc_precompiler": {:hex, :cc_precompiler, "0.1.11", 
"8c844d0b9fb98a3edea067f94f616b3f6b29b959b6b3bf25fee94ffe34364768", [:mix], [{:elixir_make, "~> 0.7", [hex: :elixir_make, repo: "hexpm", optional: false]}], "hexpm", "3427232caf0835f94680e5bcf082408a70b48ad68a5f5c0b02a3bea9f3a075b9"}, "circular_buffer": {:hex, :circular_buffer, "1.0.0", "25c004da0cba7bd8bc1bdabded4f9a902d095e20600fd15faf1f2ffbaea18a07", [:mix], [], "hexpm", "c829ec31c13c7bafd1f546677263dff5bfb006e929f25635878ac3cfba8749e5"}, "credo": {:hex, :credo, "1.7.16", "a9f1389d13d19c631cb123c77a813dbf16449a2aebf602f590defa08953309d4", [:mix], [{:bunt, "~> 0.2.1 or ~> 1.0", [hex: :bunt, repo: "hexpm", optional: false]}, {:file_system, "~> 0.2 or ~> 1.0", [hex: :file_system, repo: "hexpm", optional: false]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: false]}], "hexpm", "d0562af33756b21f248f066a9119e3890722031b6d199f22e3cf95550e4f1579"}, + "deep_merge": {:hex, :deep_merge, "1.0.0", "b4aa1a0d1acac393bdf38b2291af38cb1d4a52806cf7a4906f718e1feb5ee961", [:mix], [], "hexpm", "ce708e5f094b9cd4e8f2be4f00d2f4250c4095be93f8cd6d018c753894885430"}, "dns_cluster": {:hex, :dns_cluster, "0.2.0", "aa8eb46e3bd0326bd67b84790c561733b25c5ba2fe3c7e36f28e88f384ebcb33", [:mix], [], "hexpm", "ba6f1893411c69c01b9e8e8f772062535a4cf70f3f35bcc964a324078d8c8240"}, + "dotenvy": {:hex, :dotenvy, "1.1.1", "00e318f3c51de9fafc4b48598447e386f19204dc18ca69886905bb8f8b08b667", [:mix], [], "hexpm", "c8269471b5701e9e56dc86509c1199ded2b33dce088c3471afcfef7839766d8e"}, "earmark_parser": {:hex, :earmark_parser, "1.4.44", "f20830dd6b5c77afe2b063777ddbbff09f9759396500cdbe7523efd58d7a339c", [:mix], [], "hexpm", "4778ac752b4701a5599215f7030989c989ffdc4f6df457c5f36938cc2d2a2750"}, "elixir_make": {:hex, :elixir_make, "0.9.0", "6484b3cd8c0cee58f09f05ecaf1a140a8c97670671a6a0e7ab4dc326c3109726", [:mix], [], "hexpm", "db23d4fd8b757462ad02f8aa73431a426fe6671c80b200d9710caf3d1dd0ffdb"}, "esbuild": {:hex, :esbuild, "0.10.0", "b0aa3388a1c23e727c5a3e7427c932d89ee791746b0081bbe56103e9ef3d291f", [:mix], 
[{:jason, "~> 1.4", [hex: :jason, repo: "hexpm", optional: false]}], "hexpm", "468489cda427b974a7cc9f03ace55368a83e1a7be12fba7e30969af78e5f8c70"}, + "ex_aws_auth": {:hex, :ex_aws_auth, "1.3.1", "3963992d6f7cb251b53573603c3615cec70c3f4d86199fdb865ff440295ef7a4", [:mix], [{:jason, "~> 1.4", [hex: :jason, repo: "hexpm", optional: true]}, {:req, "~> 0.5", [hex: :req, repo: "hexpm", optional: true]}], "hexpm", "025793aa08fa419aabdb652db60edbdb2e12346bd447988a1bb5854c4dd64903"}, "ex_doc": {:hex, :ex_doc, "0.40.1", "67542e4b6dde74811cfd580e2c0149b78010fd13001fda7cfeb2b2c2ffb1344d", [:mix], [{:earmark_parser, "~> 1.4.44", [hex: :earmark_parser, repo: "hexpm", optional: false]}, {:makeup_c, ">= 0.1.0", [hex: :makeup_c, repo: "hexpm", optional: true]}, {:makeup_elixir, "~> 0.14 or ~> 1.0", [hex: :makeup_elixir, repo: "hexpm", optional: false]}, {:makeup_erlang, "~> 0.1 or ~> 1.0", [hex: :makeup_erlang, repo: "hexpm", optional: false]}, {:makeup_html, ">= 0.1.0", [hex: :makeup_html, repo: "hexpm", optional: true]}], "hexpm", "bcef0e2d360d93ac19f01a85d58f91752d930c0a30e2681145feea6bd3516e00"}, "expo": {:hex, :expo, "1.1.1", "4202e1d2ca6e2b3b63e02f69cfe0a404f77702b041d02b58597c00992b601db5", [:mix], [], "hexpm", "5fb308b9cb359ae200b7e23d37c76978673aa1b06e2b3075d814ce12c5811640"}, "file_system": {:hex, :file_system, "1.1.1", "31864f4685b0148f25bd3fbef2b1228457c0c89024ad67f7a81a3ffbc0bbad3a", [:mix], [], "hexpm", "7a15ff97dfe526aeefb090a7a9d3d03aa907e100e262a0f8f7746b78f8f87a5d"}, @@ -17,8 +21,11 @@ "gettext": {:hex, :gettext, "1.0.2", "5457e1fd3f4abe47b0e13ff85086aabae760497a3497909b8473e0acee57673b", [:mix], [{:expo, "~> 0.5.1 or ~> 1.0", [hex: :expo, repo: "hexpm", optional: false]}], "hexpm", "eab805501886802071ad290714515c8c4a17196ea76e5afc9d06ca85fb1bfeb3"}, "heroicons": {:git, "https://github.com/tailwindlabs/heroicons.git", "0435d4ca364a608cc75e2f8683d374e55abbae26", [tag: "v2.2.0", sparse: "optimized", depth: 1]}, "hpax": {:hex, :hpax, "1.0.3", 
"ed67ef51ad4df91e75cc6a1494f851850c0bd98ebc0be6e81b026e765ee535aa", [:mix], [], "hexpm", "8eab6e1cfa8d5918c2ce4ba43588e894af35dbd8e91e6e55c817bca5847df34a"}, + "idna": {:hex, :idna, "6.1.1", "8a63070e9f7d0c62eb9d9fcb360a7de382448200fbbd1b106cc96d3d8099df8d", [:rebar3], [{:unicode_util_compat, "~> 0.7.0", [hex: :unicode_util_compat, repo: "hexpm", optional: false]}], "hexpm", "92376eb7894412ed19ac475e4a86f7b413c1b9fbb5bd16dccd57934157944cea"}, "jason": {:hex, :jason, "1.4.4", "b9226785a9aa77b6857ca22832cffa5d5011a667207eb2a0ad56adb5db443b8a", [:mix], [{:decimal, "~> 1.0 or ~> 2.0", [hex: :decimal, repo: "hexpm", optional: true]}], "hexpm", "c5eb0cab91f094599f94d55bc63409236a8ec69a21a67814529e8d5f6cc90b3b"}, + "jsv": {:hex, :jsv, "0.16.0", "b29e44da822db9f52010edf9db75b58f016434d9862bd76d18aec7a4712cf318", [:mix], [{:abnf_parsec, "~> 2.0", [hex: :abnf_parsec, repo: "hexpm", optional: false]}, {:decimal, "~> 2.0", [hex: :decimal, repo: "hexpm", optional: true]}, {:idna, "~> 6.1", [hex: :idna, repo: "hexpm", optional: false]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: true]}, {:nimble_options, "~> 1.0", [hex: :nimble_options, repo: "hexpm", optional: false]}, {:poison, ">= 3.0.0 and < 7.0.0", [hex: :poison, repo: "hexpm", optional: true]}, {:texture, "~> 0.3", [hex: :texture, repo: "hexpm", optional: false]}], "hexpm", "a4b2aaf5f62641640519da5de479e5704f6f7c8b6e323692bf71b4800d7b69ee"}, "lazy_html": {:hex, :lazy_html, "0.1.10", "ffe42a0b4e70859cf21a33e12a251e0c76c1dff76391609bd56702a0ef5bc429", [:make, :mix], [{:cc_precompiler, "~> 0.1", [hex: :cc_precompiler, repo: "hexpm", optional: false]}, {:elixir_make, "~> 0.9.0", [hex: :elixir_make, repo: "hexpm", optional: false]}, {:fine, "~> 0.1.0", [hex: :fine, repo: "hexpm", optional: false]}], "hexpm", "50f67e5faa09d45a99c1ddf3fac004f051997877dc8974c5797bb5ccd8e27058"}, + "llm_db": {:hex, :llm_db, "2026.3.0", "31c4235c52280cff46c166ffb19a2a53734e8fda44c82864f3f38521e7bc4c2d", [:mix], [{:deep_merge, "~> 
1.0", [hex: :deep_merge, repo: "hexpm", optional: false]}, {:dotenvy, "~> 1.1", [hex: :dotenvy, repo: "hexpm", optional: false]}, {:igniter, "~> 0.7", [hex: :igniter, repo: "hexpm", optional: true]}, {:jason, "~> 1.4", [hex: :jason, repo: "hexpm", optional: false]}, {:req, "~> 0.5", [hex: :req, repo: "hexpm", optional: false]}, {:toml, "~> 0.7", [hex: :toml, repo: "hexpm", optional: false]}, {:zoi, "~> 0.10", [hex: :zoi, repo: "hexpm", optional: false]}], "hexpm", "4c9cbc6f47eb6d62eb52bca296692f9171c963e3eb3af69f3a555e8c5cff391e"}, "makeup": {:hex, :makeup, "1.2.1", "e90ac1c65589ef354378def3ba19d401e739ee7ee06fb47f94c687016e3713d1", [:mix], [{:nimble_parsec, "~> 1.4", [hex: :nimble_parsec, repo: "hexpm", optional: false]}], "hexpm", "d36484867b0bae0fea568d10131197a4c2e47056a6fbe84922bf6ba71c8d17ce"}, "makeup_elixir": {:hex, :makeup_elixir, "1.0.1", "e928a4f984e795e41e3abd27bfc09f51db16ab8ba1aebdba2b3a575437efafc2", [:mix], [{:makeup, "~> 1.0", [hex: :makeup, repo: "hexpm", optional: false]}, {:nimble_parsec, "~> 1.2.3 or ~> 1.3", [hex: :nimble_parsec, repo: "hexpm", optional: false]}], "hexpm", "7284900d412a3e5cfd97fdaed4f5ed389b8f2b4cb49efc0eb3bd10e2febf9507"}, "makeup_erlang": {:hex, :makeup_erlang, "1.0.3", "4252d5d4098da7415c390e847c814bad3764c94a814a0b4245176215615e1035", [:mix], [{:makeup, "~> 1.0", [hex: :makeup, repo: "hexpm", optional: false]}], "hexpm", "953297c02582a33411ac6208f2c6e55f0e870df7f80da724ed613f10e6706afd"}, @@ -39,13 +46,21 @@ "plug": {:hex, :plug, "1.19.1", "09bac17ae7a001a68ae393658aa23c7e38782be5c5c00c80be82901262c394c0", [:mix], [{:mime, "~> 1.0 or ~> 2.0", [hex: :mime, repo: "hexpm", optional: false]}, {:plug_crypto, "~> 1.1.1 or ~> 1.2 or ~> 2.0", [hex: :plug_crypto, repo: "hexpm", optional: false]}, {:telemetry, "~> 0.4.3 or ~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}], "hexpm", "560a0017a8f6d5d30146916862aaf9300b7280063651dd7e532b8be168511e62"}, "plug_crypto": {:hex, :plug_crypto, "2.1.1", 
"19bda8184399cb24afa10be734f84a16ea0a2bc65054e23a62bb10f06bc89491", [:mix], [], "hexpm", "6470bce6ffe41c8bd497612ffde1a7e4af67f36a15eea5f921af71cf3e11247c"}, "req": {:hex, :req, "0.5.17", "0096ddd5b0ed6f576a03dde4b158a0c727215b15d2795e59e0916c6971066ede", [:mix], [{:brotli, "~> 0.3.1", [hex: :brotli, repo: "hexpm", optional: true]}, {:ezstd, "~> 1.0", [hex: :ezstd, repo: "hexpm", optional: true]}, {:finch, "~> 0.17", [hex: :finch, repo: "hexpm", optional: false]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: false]}, {:mime, "~> 2.0.6 or ~> 2.1", [hex: :mime, repo: "hexpm", optional: false]}, {:nimble_csv, "~> 1.0", [hex: :nimble_csv, repo: "hexpm", optional: true]}, {:plug, "~> 1.0", [hex: :plug, repo: "hexpm", optional: true]}], "hexpm", "0b8bc6ffdfebbc07968e59d3ff96d52f2202d0536f10fef4dc11dc02a2a43e39"}, + "req_llm": {:hex, :req_llm, "1.6.0", "9866726cb590848e4f68cfb0ed8030509219dd9e81880e395d1b0b273d2102e6", [:mix], [{:dotenvy, "~> 1.1", [hex: :dotenvy, repo: "hexpm", optional: false]}, {:ex_aws_auth, "~> 1.3", [hex: :ex_aws_auth, repo: "hexpm", optional: false]}, {:igniter, "~> 0.7", [hex: :igniter, repo: "hexpm", optional: true]}, {:jason, "~> 1.4", [hex: :jason, repo: "hexpm", optional: false]}, {:jsv, "~> 0.11", [hex: :jsv, repo: "hexpm", optional: false]}, {:llm_db, "~> 2026.1", [hex: :llm_db, repo: "hexpm", optional: false]}, {:nimble_options, "~> 1.1", [hex: :nimble_options, repo: "hexpm", optional: false]}, {:req, "~> 0.5", [hex: :req, repo: "hexpm", optional: false]}, {:server_sent_events, "~> 0.2", [hex: :server_sent_events, repo: "hexpm", optional: false]}, {:splode, "~> 0.3.0", [hex: :splode, repo: "hexpm", optional: false]}, {:uniq, "~> 0.6", [hex: :uniq, repo: "hexpm", optional: false]}, {:zoi, "~> 0.14", [hex: :zoi, repo: "hexpm", optional: false]}], "hexpm", "0711ae09fa297e1e842837b0259ea179f9212893420abd9cb93a020b6bf69348"}, + "server_sent_events": {:hex, :server_sent_events, "0.2.1", 
"f83b34f01241302a8bf451efc8dde3a36c533d5715463c31c653f3db8695f636", [:mix], [], "hexpm", "c8099ce4f9acd610eb7c8e0f89dba7d5d1c13300ea9884b0bd8662401d3cf96f"}, "sobelow": {:hex, :sobelow, "0.14.1", "2f81e8632f15574cba2402bcddff5497b413c01e6f094bc0ab94e83c2f74db81", [:mix], [{:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: false]}], "hexpm", "8fac9a2bd90fdc4b15d6fca6e1608efb7f7c600fa75800813b794ee9364c87f2"}, + "splode": {:hex, :splode, "0.3.0", "ff8effecc509a51245df2f864ec78d849248647c37a75886033e3b1a53ca9470", [:mix], [], "hexpm", "73cfd0892d7316d6f2c93e6e8784bd6e137b2aa38443de52fd0a25171d106d81"}, "tailwind": {:hex, :tailwind, "0.4.1", "e7bcc222fe96a1e55f948e76d13dd84a1a7653fb051d2a167135db3b4b08d3e9", [:mix], [], "hexpm", "6249d4f9819052911120dbdbe9e532e6bd64ea23476056adb7f730aa25c220d1"}, "telemetry": {:hex, :telemetry, "1.3.0", "fedebbae410d715cf8e7062c96a1ef32ec22e764197f70cda73d82778d61e7a2", [:rebar3], [], "hexpm", "7015fc8919dbe63764f4b4b87a95b7c0996bd539e0d499be6ec9d7f3875b79e6"}, "telemetry_metrics": {:hex, :telemetry_metrics, "1.1.0", "5bd5f3b5637e0abea0426b947e3ce5dd304f8b3bc6617039e2b5a008adc02f8f", [:mix], [{:telemetry, "~> 0.4 or ~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}], "hexpm", "e7b79e8ddfde70adb6db8a6623d1778ec66401f366e9a8f5dd0955c56bc8ce67"}, "telemetry_poller": {:hex, :telemetry_poller, "1.3.0", "d5c46420126b5ac2d72bc6580fb4f537d35e851cc0f8dbd571acf6d6e10f5ec7", [:rebar3], [{:telemetry, "~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}], "hexpm", "51f18bed7128544a50f75897db9974436ea9bfba560420b646af27a9a9b35211"}, + "texture": {:hex, :texture, "0.3.2", "ca68fc2804ce05ffe33cded85d69b5ebadb0828233227accfe3c574e34fd4e3f", [:mix], [{:abnf_parsec, "~> 2.0", [hex: :abnf_parsec, repo: "hexpm", optional: false]}], "hexpm", "43bb1069d9cf4309ed6f0ff65ade787a76f986b821ab29d1c96b5b5102cb769c"}, "thousand_island": {:hex, :thousand_island, "1.4.3", "2158209580f633be38d43ec4e3ce0a01079592b9657afff9080d5d8ca149a3af", 
[:mix], [{:telemetry, "~> 0.4 or ~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}], "hexpm", "6e4ce09b0fd761a58594d02814d40f77daff460c48a7354a15ab353bb998ea0b"}, "tidewave": {:hex, :tidewave, "0.5.5", "a125dfc87f99daf0e2280b3a9719b874c616ead5926cdf9cdfe4fcc19a020eff", [:mix], [{:circular_buffer, "~> 0.4 or ~> 1.0", [hex: :circular_buffer, repo: "hexpm", optional: false]}, {:igniter, "~> 0.6", [hex: :igniter, repo: "hexpm", optional: true]}, {:jason, "~> 1.4", [hex: :jason, repo: "hexpm", optional: false]}, {:phoenix_live_reload, ">= 1.6.1", [hex: :phoenix_live_reload, repo: "hexpm", optional: true]}, {:plug, "~> 1.17", [hex: :plug, repo: "hexpm", optional: false]}, {:req, "~> 0.5", [hex: :req, repo: "hexpm", optional: false]}], "hexpm", "825ebb4fa20de005785efa21e5a88c04d81c3f57552638d12ff3def2f203dbf7"}, + "toml": {:hex, :toml, "0.7.0", "fbcd773caa937d0c7a02c301a1feea25612720ac3fa1ccb8bfd9d30d822911de", [:mix], [], "hexpm", "0690246a2478c1defd100b0c9b89b4ea280a22be9a7b313a8a058a2408a2fa70"}, + "unicode_util_compat": {:hex, :unicode_util_compat, "0.7.1", "a48703a25c170eedadca83b11e88985af08d35f37c6f664d6dcfb106a97782fc", [:rebar3], [], "hexpm", "b3a917854ce3ae233619744ad1e0102e05673136776fb2fa76234f3e03b23642"}, + "uniq": {:hex, :uniq, "0.6.2", "51846518c037134c08bc5b773468007b155e543d53c8b39bafe95b0af487e406", [:mix], [{:ecto, "~> 3.0", [hex: :ecto, repo: "hexpm", optional: true]}], "hexpm", "95aa2a41ea331ef0a52d8ed12d3e730ef9af9dbc30f40646e6af334fbd7bc0fc"}, "websock": {:hex, :websock, "0.5.3", "2f69a6ebe810328555b6fe5c831a851f485e303a7c8ce6c5f675abeb20ebdadc", [:mix], [], "hexpm", "6105453d7fac22c712ad66fab1d45abdf049868f253cf719b625151460b8b453"}, "websock_adapter": {:hex, :websock_adapter, "0.5.9", "43dc3ba6d89ef5dec5b1d0a39698436a1e856d000d84bf31a3149862b01a287f", [:mix], [{:bandit, ">= 0.6.0", [hex: :bandit, repo: "hexpm", optional: true]}, {:plug, "~> 1.14", [hex: :plug, repo: "hexpm", optional: false]}, {:plug_cowboy, "~> 2.6", [hex: 
:plug_cowboy, repo: "hexpm", optional: true]}, {:websock, "~> 0.5", [hex: :websock, repo: "hexpm", optional: false]}], "hexpm", "5534d5c9adad3c18a0f58a9371220d75a803bf0b9a3d87e6fe072faaeed76a08"}, + "zoi": {:hex, :zoi, "0.17.1", "406aa87bb4181f41dee64336b75434367b7d3e88db813b0e6db0ae2d0f81f743", [:mix], [{:decimal, "~> 2.0", [hex: :decimal, repo: "hexpm", optional: true]}, {:phoenix_html, "~> 2.14.2 or ~> 3.0 or ~> 4.1", [hex: :phoenix_html, repo: "hexpm", optional: true]}], "hexpm", "3a11bf3bc9189f988ac74e81b5d7ca0c689b2a20eed220746a7043aa528e2aab"}, } From 9ee3b58d1b3ddf8bedcab027faa6ab7a46395041 Mon Sep 17 00:00:00 2001 From: errantsky Date: Thu, 5 Mar 2026 21:50:43 -0800 Subject: [PATCH 2/3] fix: address review issues from multi-provider LLM branch - Fix usage key mismatch: check both provider-specific and normalized key names for cache tokens (cache_creation_input_tokens vs cache_creation_tokens, cache_read_input_tokens vs cached_tokens) - Add catch-all clause for unsupported message roles in build_context - Make resolve_model!/2 actually raise on unknown keys (matching the ! 
convention) instead of silently falling back to :large - Add missing {:ok, non_string} clause in Config.resolve_model/2 to return a descriptive error instead of CaseClauseError - Fix FallbackLLM default from RLM.LLM (now behaviour-only) to RLM.LLM.ReqLLM - Strip "anthropic:" provider prefix in RLM.LLM.Anthropic before sending to the Anthropic API (models map stores prefixed specs) Co-Authored-By: Claude Opus 4.6 --- lib/rlm/config.ex | 10 ++++++++-- lib/rlm/llm/anthropic.ex | 5 +++++ lib/rlm/llm/req_llm.ex | 9 +++++++-- lib/rlm/replay/fallback_llm.ex | 2 +- lib/rlm/worker.ex | 9 +++++++-- 5 files changed, 28 insertions(+), 7 deletions(-) diff --git a/lib/rlm/config.ex b/lib/rlm/config.ex index fe268ff..1e2004a 100644 --- a/lib/rlm/config.ex +++ b/lib/rlm/config.ex @@ -109,8 +109,14 @@ defmodule RLM.Config do @spec resolve_model(t(), atom()) :: {:ok, String.t()} | {:error, String.t()} def resolve_model(%__MODULE__{models: models}, key) when is_atom(key) do case Map.fetch(models, key) do - {:ok, spec} when is_binary(spec) -> {:ok, spec} - :error -> {:error, "Unknown model key: #{key}"} + {:ok, spec} when is_binary(spec) -> + {:ok, spec} + + {:ok, other} -> + {:error, "Model key #{key} has invalid spec: #{inspect(other)} (expected a string)"} + + :error -> + {:error, "Unknown model key: #{key}"} end end diff --git a/lib/rlm/llm/anthropic.ex b/lib/rlm/llm/anthropic.ex index 2fe0587..9c51d6d 100644 --- a/lib/rlm/llm/anthropic.ex +++ b/lib/rlm/llm/anthropic.ex @@ -16,6 +16,7 @@ defmodule RLM.LLM.Anthropic do @impl true def chat(messages, model, config, opts \\ []) do url = String.trim_trailing(config.api_base_url, "/") <> "/v1/messages" + model = strip_provider_prefix(model) {system_text, user_messages} = extract_system(messages) @@ -71,6 +72,10 @@ defmodule RLM.LLM.Anthropic do end end + # Strip "anthropic:" prefix if present (models map stores provider-prefixed specs) + defp strip_provider_prefix("anthropic:" <> bare), do: bare + defp strip_provider_prefix(model), do: 
model + defp extract_system(messages) do case Enum.split_with(messages, fn m -> m.role == :system end) do {[], rest} -> {nil, rest} diff --git a/lib/rlm/llm/req_llm.ex b/lib/rlm/llm/req_llm.ex index a5866e0..ab372ed 100644 --- a/lib/rlm/llm/req_llm.ex +++ b/lib/rlm/llm/req_llm.ex @@ -62,6 +62,7 @@ defmodule RLM.LLM.ReqLLM do case msg.role do :user -> ReqLLM.Context.user(msg.content) :assistant -> ReqLLM.Context.assistant(msg.content) + other -> raise ArgumentError, "unsupported message role #{inspect(other)}" end end) @@ -120,8 +121,12 @@ defmodule RLM.LLM.ReqLLM do prompt_tokens: Map.get(raw, :input_tokens), completion_tokens: Map.get(raw, :output_tokens), total_tokens: Map.get(raw, :total_tokens), - cache_creation_input_tokens: Map.get(raw, :cache_creation_input_tokens), - cache_read_input_tokens: Map.get(raw, :cache_read_input_tokens) + # Anthropic provider includes :cache_creation_input_tokens; + # other providers use the normalized :cache_creation_tokens key + cache_creation_input_tokens: + Map.get(raw, :cache_creation_input_tokens) || Map.get(raw, :cache_creation_tokens), + cache_read_input_tokens: + Map.get(raw, :cache_read_input_tokens) || Map.get(raw, :cached_tokens) } end diff --git a/lib/rlm/replay/fallback_llm.ex b/lib/rlm/replay/fallback_llm.ex index 71ad642..aa0f80b 100644 --- a/lib/rlm/replay/fallback_llm.ex +++ b/lib/rlm/replay/fallback_llm.ex @@ -13,7 +13,7 @@ defmodule RLM.Replay.FallbackLLM do def chat(messages, model, config, opts \\ []) do case pop_entry() do nil -> - fallback_module = Process.get(:rlm_replay_fallback_module, RLM.LLM) + fallback_module = Process.get(:rlm_replay_fallback_module, RLM.LLM.ReqLLM) fallback_module.chat(messages, model, config, opts) entry -> diff --git a/lib/rlm/worker.ex b/lib/rlm/worker.ex index 1f1610d..fb90b2f 100644 --- a/lib/rlm/worker.ex +++ b/lib/rlm/worker.ex @@ -960,8 +960,13 @@ defmodule RLM.Worker do defp resolve_model!(config, key) do case RLM.Config.resolve_model(config, key) do - {:ok, spec} -> spec - 
{:error, _} -> Map.get(config.models, :large, "claude-sonnet-4-6") + {:ok, spec} -> + spec + + {:error, reason} -> + raise ArgumentError, + "Cannot resolve model key #{inspect(key)}: #{reason}. " <> + "Available keys: #{inspect(Map.keys(config.models))}" end end From ce8a6e84adffe6689f35213f1ccf4aa7142d55a5 Mon Sep 17 00:00:00 2001 From: errantsky Date: Fri, 6 Mar 2026 08:28:36 -0800 Subject: [PATCH 3/3] fix: address PR review findings and improve docs/examples MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Review agents identified several issues across 5 categories: Silent failures: - encode_object/1 returns {:error, :no_content} instead of empty string - Tape.get_events/1 catches :noproc specifically, logs unexpected exits - FallbackLLM logs when transitioning from tape to live LLM - extract_usage/1 warns when all usage fields are nil - context_window_for/2 warns on unknown model keys Documentation: - CLAUDE.md: remove stale cost config rows, fix env var references, fix models default to bare names, rewrite agent orientation section with key contracts, DI patterns, and common modification patterns - Fix moduledocs: Replay (default module), Anthropic (differentiator), Worker (provider-agnostic), LLM behaviour (bare model names) - Fix "Ollama (via vLLM)" → "Ollama (local models)" in config/req_llm Examples: - Update all examples from CLAUDE_API_KEY to ANTHROPIC_API_KEY - Add examples/local_models.exs for Ollama/local model usage - Update mix rlm.examples task with local_models entry Tests: - Add test/rlm/config_test.exs with 16 tests covering resolve_model/2, context_window_for/2, load/1 defaults, models map, and API key Config: - Remove unused top-level `require Logger` from config.ex Co-Authored-By: Claude Opus 4.6 --- CHANGELOG.md | 29 +++++- CLAUDE.md | 105 +++++++++++++++++--- examples/code_review.exs | 2 +- examples/local_models.exs | 164 +++++++++++++++++++++++++++++++ examples/map_reduce_analysis.exs | 2 +- 
examples/research_synthesis.exs | 9 +- examples/smoke_test.exs | 10 +- examples/web_fetch.exs | 2 +- lib/mix/tasks/rlm.examples.ex | 20 ++-- lib/rlm/config.ex | 25 +++-- lib/rlm/llm.ex | 2 +- lib/rlm/llm/anthropic.ex | 10 +- lib/rlm/llm/req_llm.ex | 35 +++++-- lib/rlm/replay.ex | 2 +- lib/rlm/replay/fallback_llm.ex | 7 ++ lib/rlm/replay/tape.ex | 18 +++- lib/rlm/worker.ex | 2 +- test/rlm/config_test.exs | 125 +++++++++++++++++++++++ 18 files changed, 510 insertions(+), 59 deletions(-) create mode 100644 examples/local_models.exs create mode 100644 test/rlm/config_test.exs diff --git a/CHANGELOG.md b/CHANGELOG.md index f3d9af5..85f15f9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -21,9 +21,9 @@ All notable changes to this project are documented here. - `RLM.LLM` refactored to a pure behaviour module + shared utilities (`extract_structured/1`, `response_schema/0`); no longer contains an implementation. - `models` config field — `%{atom() => String.t()}` map of symbolic keys to - provider-prefixed model specs. Default: `%{large: "anthropic:claude-sonnet-4-6", - small: "anthropic:claude-haiku-4-5"}`. Pass custom maps for Ollama/OpenAI: - `models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}` + model specs. Default: `%{large: "claude-sonnet-4-6", small: "claude-haiku-4-5"}`. + Bare names are auto-prefixed with `"anthropic:"` by `ReqLLM`. Pass custom maps + for Ollama/OpenAI: `models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}` - `RLM.Config.resolve_model/2` — looks up a model key in the `models` map - `RLM.Config.context_window_for/2` — resolves context window size for a model key (legacy fields for `:large`/`:small`, default 128k for custom keys) @@ -66,9 +66,32 @@ All notable changes to this project are documented here. 
then falls back to a live LLM module when the tape is exhausted - `:fallback` option on `RLM.replay/2` — `:error` (default) or `:live` to switch to live LLM calls when the tape runs out (e.g., because a patch caused extra iterations) +- `examples/local_models.exs` — new example demonstrating Ollama/local model usage + with no API key required. Registered as `mix rlm.examples local_models` +- `test/rlm/config_test.exs` — 16 new unit tests for `Config.load/1`, + `Config.resolve_model/2`, and `Config.context_window_for/2` - 17 tests covering recording, tape construction, replay LLM, replay orchestration, patching, fallback behavior, and the public API +### Fixed + +- `RLM.LLM.ReqLLM.encode_object/1` now returns an explicit error instead of silently + falling back to an empty string when the LLM response contains no usable content +- `RLM.LLM.ReqLLM.extract_usage/1` logs a warning when token usage extraction fails + (all fields nil despite non-empty response), preventing silent zero-cost reporting +- `RLM.Replay.Tape.get_events/1` now catches `:noproc` exits specifically and logs + a warning for unexpected exit reasons, instead of broadly swallowing all exits +- `RLM.Replay.FallbackLLM` now logs when switching from tape replay to live LLM calls +- `RLM.Config.context_window_for/2` now logs a warning when using the 128k default + for custom model keys, making it easier to diagnose compaction behavior +- `RLM.Replay` moduledoc corrected: fallback default is `RLM.LLM.ReqLLM` (not `RLM.LLM`) +- `RLM.Worker` moduledoc updated to be provider-agnostic (no longer references "Claude's + output_config" specifically) +- `CLAUDE.md` — removed stale `cost_per_1k_*` config fields; fixed `models` default to + match actual bare-name defaults; updated env var references to `ANTHROPIC_API_KEY` +- All examples updated from `CLAUDE_API_KEY` to `ANTHROPIC_API_KEY`; smoke test checks + both env vars + **Distributed Erlang node support** - `RLM.Node` — lightweight wrapper for OTP 
distribution with three public functions: diff --git a/CLAUDE.md b/CLAUDE.md index dba0805..7ebc6a0 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -86,10 +86,10 @@ mix test # Run tests with trace output mix test --trace -# Run live API tests (requires CLAUDE_API_KEY env var) +# Run live API tests (requires ANTHROPIC_API_KEY or CLAUDE_API_KEY env var) mix test --include live_api -# Live smoke test (requires CLAUDE_API_KEY env var) +# Live smoke test (requires ANTHROPIC_API_KEY or CLAUDE_API_KEY env var) mix rlm.smoke # Interactive shell @@ -178,16 +178,16 @@ LLM responses use structured output (JSON schema) to constrain responses to `{"reasoning": "...", "code": "..."}` objects. Feedback messages after eval are also structured JSON. -The `models` config field maps symbolic keys to provider-prefixed specs: +The `models` config field maps symbolic keys to model specs: ```elixir RLM.run(context, query, models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}) ``` -Default models: -- Large: `anthropic:claude-sonnet-4-6` -- Small: `anthropic:claude-haiku-4-5` +Default models (bare names; `ReqLLM` auto-prefixes with `"anthropic:"`): +- Large: `claude-sonnet-4-6` +- Small: `claude-haiku-4-5` ## Module Map @@ -264,7 +264,7 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`. |---|---|---| | `api_base_url` | `"https://api.anthropic.com"` | Anthropic API base URL | | `api_key` | `ANTHROPIC_API_KEY` env var | API key for LLM requests (falls back to `CLAUDE_API_KEY`) | -| `models` | `%{large: "anthropic:claude-sonnet-4-6", small: "anthropic:claude-haiku-4-5"}` | Named model map; keys are atoms, values are `"provider:model"` specs | +| `models` | `%{large: "claude-sonnet-4-6", small: "claude-haiku-4-5"}` | Named model map; keys are atoms, values are model specs. 
Bare names are auto-prefixed with `"anthropic:"` by `ReqLLM` | | `model_large` | `claude-sonnet-4-6` | Legacy; used to build default `models` map | | `model_small` | `claude-haiku-4-5` | Legacy; used to build default `models` map | | `max_iterations` | `25` | Per-worker LLM turn limit | @@ -277,10 +277,6 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`. | `eval_timeout` | `300_000` | ms per eval (5 min) | | `llm_timeout` | `120_000` | ms per LLM request (2 min) | | `subcall_timeout` | `600_000` | ms per subcall (10 min) | -| `cost_per_1k_prompt_tokens_large` | `0.003` | Cost tracking for large model input | -| `cost_per_1k_prompt_tokens_small` | `0.0008` | Cost tracking for small model input | -| `cost_per_1k_completion_tokens_large` | `0.015` | Cost tracking for large model output | -| `cost_per_1k_completion_tokens_small` | `0.004` | Cost tracking for small model output | | `enable_otel` | `false` | Enable OpenTelemetry integration | | `enable_event_log` | `true` | Enable per-run EventLog trace agents | | `event_log_capture_full_stdout` | `false` | Store full stdout in traces (vs truncated) | @@ -293,7 +289,7 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`. - Worker/keep_alive tests run `async: false` since MockLLM uses global state - Tool tests and sandbox tests can run `async: true` (no global state) - Live API tests tagged with `@moduletag :live_api` and excluded by default -- `mix test --include live_api` requires `CLAUDE_API_KEY` env var +- `mix test --include live_api` requires `ANTHROPIC_API_KEY` (or `CLAUDE_API_KEY`) env var - Test support files in `test/support/` - Tool tests use a per-test temp directory (created in `setup`, cleaned in `on_exit`) - Worker concurrency/depth tests use `RLM.Test.Helpers.start_test_run/1` to create a Run, then spawn Workers via `RLM.Run.start_worker/2` @@ -304,7 +300,7 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`. 
- Workers use `restart: :temporary` — they terminate normally after completion - The `llm_module` config field enables dependency injection for testing - Bash tool uses `Task.async` + `Task.yield/2` (not `System.cmd` — it has no `:timeout` option) -- `.env` file with `CLAUDE_API_KEY` should exist at project root but must not be committed +- `.env` file with `ANTHROPIC_API_KEY` (or `CLAUDE_API_KEY`) should exist at project root but must not be committed - `RLM.run/3` monitors the Worker with `Process.monitor` so crashes return `{:error, reason}` rather than hanging indefinitely @@ -327,6 +323,8 @@ The dashboard is a Phoenix 1.8 LiveView application. Key conventions: ## Orientation for Coding Agents +### Getting Started + When starting a task, read these files in order: 1. **`CLAUDE.md`** (this file) — architecture, invariants, module map @@ -334,13 +332,90 @@ When starting a task, read these files in order: 3. The specific module(s) relevant to your task (see Module Map above) 4. The corresponding test file to understand expected behaviour -Key invariants **never to break**: +### Key Invariants (Never Break These) + - Raw input data must not enter any LLM context window (use `preview/2` or metadata only) - Workers are `:temporary` — do not change their restart strategy - The async-eval pattern in `RLM.Worker` is intentional; do not make eval synchronous - All session tests must use `async: false` (MockLLM is global ETS state) +- Run → Worker communication is always `send/2`, never `GenServer.call` (deadlock prevention) + +### Key Contracts & Interfaces + +**LLM Behaviour** (`RLM.LLM`): +```elixir +@callback chat(messages :: [map()], model :: String.t(), config :: RLM.Config.t(), opts :: keyword()) :: + {:ok, json_string :: String.t(), usage :: usage()} | {:error, String.t()} +``` +All LLM modules (`ReqLLM`, `Anthropic`, `MockLLM`, `Replay.LLM`, `Replay.FallbackLLM`) implement +this same callback. 
The `json_string` return is always a JSON-encoded string, never a parsed map. + +**Usage type**: `%{prompt_tokens: integer | nil, completion_tokens: integer | nil, total_tokens: integer | nil, cache_creation_input_tokens: integer | nil, cache_read_input_tokens: integer | nil}` + +**Model resolution**: Use `RLM.Config.resolve_model(config, :large | :small | atom())` → `{:ok, "provider:model-name"}` or `{:error, reason}`. In Worker, use `resolve_model!/2` (raises on unknown keys). + +**Tool Behaviour** (`RLM.Tool`): +```elixir +@callback name() :: String.t() +@callback description() :: String.t() +@callback execute(map()) :: {:ok, String.t()} | {:error, String.t()} +``` + +### Dependency Injection Pattern + +The `llm_module` config field is the primary injection point: +- **Production**: `RLM.LLM.ReqLLM` (default) — multi-provider via `req_llm` +- **Testing**: `RLM.Test.MockLLM` — ETS-based response queue, set in `config/test.exs` +- **Legacy**: `RLM.LLM.Anthropic` — direct Anthropic HTTP client +- **Replay**: `RLM.Replay.LLM` / `RLM.Replay.FallbackLLM` — tape-based, set by `RLM.Replay` + +When adding a new LLM feature, implement it in the behaviour callback — the Worker +calls `config.llm_module.chat(...)` and is provider-agnostic. + +### Testing Patterns + +**MockLLM usage** — queue expected responses before running Workers: +```elixir +RLM.Test.MockLLM.enqueue(%{ + "reasoning" => "I'll count the lines", + "code" => ~s(final_answer = 4) +}) +``` +MockLLM is global ETS state. Tests using it must be `async: false`. + +**Creating a test Run** — use `RLM.Test.Helpers.start_test_run/1`: +```elixir +{run_pid, run_id} = RLM.Test.Helpers.start_test_run(config) +{:ok, worker_pid, span_id} = RLM.Run.start_worker(run_pid, worker_opts) +``` + +**Tool tests** — use per-test temp dirs (created in `setup`, cleaned in `on_exit`); +these can run `async: true` since tools have no global state. + +### Common Modification Patterns + +**Adding a new config field:** +1. 
Add to `defstruct` in `config.ex` +2. Add to `load/1` with `get(overrides, :key, default)` +3. Add row to CLAUDE.md Config Fields table +4. Add to CHANGELOG.md + +**Adding a new tool:** +1. Create `lib/rlm/tools/my_tool.ex` implementing `RLM.Tool` +2. Add to `RLM.ToolRegistry.all/0` +3. Add wrapper function to `RLM.Sandbox` +4. Add to system prompt in `priv/system_prompt.md` +5. Add row to CLAUDE.md Module Map (Filesystem Tools section) + +**Adding a new LLM behaviour implementation:** +1. Create module with `@behaviour RLM.LLM` +2. Implement `chat/4` returning `{:ok, json_string, usage}` or `{:error, string}` +3. Users select it via `llm_module:` config override +4. Add row to CLAUDE.md Module Map + +### Before Committing -Before committing, always run: +Always run: ```bash mix compile --warnings-as-errors mix test diff --git a/examples/code_review.exs b/examples/code_review.exs index 67c39cd..674bd65 100644 --- a/examples/code_review.exs +++ b/examples/code_review.exs @@ -11,7 +11,7 @@ # - Filesystem tool usage visible in code blocks # # Usage: -# export CLAUDE_API_KEY=sk-ant-... +# export ANTHROPIC_API_KEY=sk-ant-... # mix run examples/code_review.exs # # Or via the Mix task: diff --git a/examples/local_models.exs b/examples/local_models.exs new file mode 100644 index 0000000..94bc6b2 --- /dev/null +++ b/examples/local_models.exs @@ -0,0 +1,164 @@ +# examples/local_models.exs +# +# Local Model Usage — demonstrates running RLM with local Ollama models +# instead of Anthropic's cloud API. No API key required. +# +# Prerequisites: +# 1. Install Ollama: https://ollama.com +# 2. Pull a model: +# ollama pull qwen3:8b +# Or any other model you prefer (llama3.2, mistral, etc.) +# 3. 
Ollama server must be running (it starts automatically after install) +# +# Usage: +# mix run examples/local_models.exs +# +# Or via the Mix task: +# mix rlm.examples local_models +# +# Configuration options shown: +# - models: %{large: "ollama:model", small: "ollama:model"} +# - Using the same model for both large and small +# - Using different models for parent vs subcall workers +# - Overriding max_iterations for faster local runs + +defmodule RLM.Examples.LocalModels do + @moduledoc false + + # Change these to match the models you have pulled locally. + # Run `ollama list` to see available models. + @default_model "ollama:qwen3:8b" + + def run do + check_ollama!() + + model = System.get_env("RLM_LOCAL_MODEL", @default_model) + IO.puts("\nRLM Local Models Example") + IO.puts("========================") + IO.puts("Model: #{model}") + IO.puts("") + + results = + [ + {&test_basic_run/1, "Basic run"}, + {&test_multi_step/1, "Multi-step"}, + {&test_custom_model_map/1, "Custom model map"} + ] + |> Enum.map(fn {test_fn, name} -> + IO.write(" Running: #{name}... ") + + case test_fn.(model) do + {:ok, detail} -> + IO.puts("PASS — #{detail}") + {name, :pass} + + {:error, reason} -> + IO.puts("FAIL — #{reason}") + {name, :fail} + end + end) + + passes = Enum.count(results, fn {_, s} -> s == :pass end) + fails = Enum.count(results, fn {_, s} -> s == :fail end) + IO.puts("\n#{passes} passed, #{fails} failed out of #{length(results)} tests") + + if fails > 0, do: System.halt(1) + end + + # --------------------------------------------------------------------------- + # Tests + # --------------------------------------------------------------------------- + + defp test_basic_run(model) do + # Simplest usage: override the models map to use a local model + case RLM.run( + "Elixir, Rust, Python", + "Count the programming languages. 
Return the count as an integer.", + models: %{large: model, small: model}, + max_iterations: 10 + ) do + {:ok, 3, _run_id} -> + {:ok, "counted 3 languages"} + + {:ok, other, _} -> + {:error, "expected 3, got #{inspect(other)}"} + + {:error, reason} -> + {:error, inspect(reason)} + end + end + + defp test_multi_step(model) do + # Multi-step reasoning with a local model + case RLM.run( + "apple, banana, cherry", + "First store the number of items in a variable called count. " <> + "Then set final_answer to count * 100.", + models: %{large: model, small: model}, + max_iterations: 10 + ) do + {:ok, 300, _run_id} -> + {:ok, "3 * 100 = 300"} + + {:ok, other, _} -> + {:error, "expected 300, got #{inspect(other)}"} + + {:error, reason} -> + {:error, inspect(reason)} + end + end + + defp test_custom_model_map(model) do + # Demonstrate using different models for different roles. + # In practice, you might use a larger model for the parent worker + # and a smaller one for subcalls: + # + # models: %{ + # large: "ollama:qwen3:32b", + # small: "ollama:qwen3:8b", + # fast: "ollama:qwen3:1.7b" + # } + # + # For this test, we use the same model for both to keep it simple. + case RLM.run( + "Hello", + "Set final_answer to the string \"hello from local model\"", + models: %{large: model, small: model}, + max_iterations: 5 + ) do + {:ok, result, _run_id} when is_binary(result) -> + {:ok, "got: #{String.slice(result, 0, 50)}"} + + {:ok, other, _} -> + {:error, "expected string, got #{inspect(other)}"} + + {:error, reason} -> + {:error, inspect(reason)} + end + end + + # --------------------------------------------------------------------------- + # Helpers + # --------------------------------------------------------------------------- + + defp check_ollama! do + case System.cmd("which", ["ollama"], stderr_to_stdout: true) do + {_, 0} -> + :ok + + _ -> + IO.puts(""" + + Ollama not found. 
Install it from https://ollama.com + + After installing, pull a model: + ollama pull qwen3:8b + + """) + + System.halt(1) + end + end +end + +RLM.Examples.LocalModels.run() diff --git a/examples/map_reduce_analysis.exs b/examples/map_reduce_analysis.exs index 9398a4c..8395cb5 100644 --- a/examples/map_reduce_analysis.exs +++ b/examples/map_reduce_analysis.exs @@ -9,7 +9,7 @@ # - Bindings growing across iterations # # Usage: -# export CLAUDE_API_KEY=sk-ant-... +# export ANTHROPIC_API_KEY=sk-ant-... # mix run examples/map_reduce_analysis.exs # # Or via the Mix task: diff --git a/examples/research_synthesis.exs b/examples/research_synthesis.exs index 77cd91e..cf7414c 100644 --- a/examples/research_synthesis.exs +++ b/examples/research_synthesis.exs @@ -11,7 +11,7 @@ # - Mix of direct queries and full subcalls visible as different node types # # Usage: -# export CLAUDE_API_KEY=sk-ant-... +# export ANTHROPIC_API_KEY=sk-ant-... # mix run examples/research_synthesis.exs # # Or via the Mix task: @@ -215,7 +215,12 @@ defmodule RLM.Examples.ResearchSynthesis do if narrative = result["narrative"] do IO.puts(" Narrative preview:") - narrative |> String.slice(0, 400) |> String.split("\n") |> Enum.each(&IO.puts(" #{&1}")) + + narrative + |> String.slice(0, 400) + |> String.split("\n") + |> Enum.each(&IO.puts(" #{&1}")) + IO.puts(" ...") end end diff --git a/examples/smoke_test.exs b/examples/smoke_test.exs index abf53f6..a0aaa1f 100644 --- a/examples/smoke_test.exs +++ b/examples/smoke_test.exs @@ -1,9 +1,9 @@ # examples/smoke_test.exs # -# Live smoke test for the RLM engine. Requires CLAUDE_API_KEY. +# Live smoke test for the RLM engine. Requires ANTHROPIC_API_KEY (or CLAUDE_API_KEY). # # Usage: -# export CLAUDE_API_KEY=sk-ant-... +# export ANTHROPIC_API_KEY=sk-ant-... # mix run examples/smoke_test.exs # # Or via the Mix task: @@ -152,9 +152,11 @@ defmodule RLM.SmokeTest do # --------------------------------------------------------------------------- defp check_api_key! 
do - case System.get_env("CLAUDE_API_KEY") do + key = System.get_env("ANTHROPIC_API_KEY") || System.get_env("CLAUDE_API_KEY") + + case key do nil -> - IO.puts("\n ERROR: CLAUDE_API_KEY not set. Export it before running.\n") + IO.puts("\n ERROR: ANTHROPIC_API_KEY not set. Export it before running.\n") System.halt(1) key -> diff --git a/examples/web_fetch.exs b/examples/web_fetch.exs index c5a2129..1b80edf 100644 --- a/examples/web_fetch.exs +++ b/examples/web_fetch.exs @@ -11,7 +11,7 @@ # Uses the public GitHub API (no auth required, 60 requests/hour limit). # # Usage: -# export CLAUDE_API_KEY=sk-ant-... +# export ANTHROPIC_API_KEY=sk-ant-... # mix run examples/web_fetch.exs # # Or via the Mix task: diff --git a/lib/mix/tasks/rlm.examples.ex b/lib/mix/tasks/rlm.examples.ex index a7dd370..2a95d4b 100644 --- a/lib/mix/tasks/rlm.examples.ex +++ b/lib/mix/tasks/rlm.examples.ex @@ -1,6 +1,6 @@ defmodule Mix.Tasks.Rlm.Examples do use Boundary, classify_to: RLM - @shortdoc "Run RLM example scenarios against the live Anthropic API" + @shortdoc "Run RLM example scenarios against live LLM providers" @moduledoc """ Runs RLM example scenarios that exercise multi-iteration, subcall depth, parallel queries, schema-mode extraction, and filesystem tools. @@ -8,17 +8,18 @@ defmodule Mix.Tasks.Rlm.Examples do These produce rich execution traces viewable in the web dashboard (`mix phx.server` → http://localhost:4000). - Requires the `CLAUDE_API_KEY` environment variable to be set. + Cloud examples require `ANTHROPIC_API_KEY` (or `CLAUDE_API_KEY`). + The `local_models` example uses Ollama and requires no API key. 
## Usage - # Run all examples + # Run all cloud examples mix rlm.examples # Run a specific example mix rlm.examples map_reduce mix rlm.examples code_review - mix rlm.examples research_synthesis + mix rlm.examples local_models # List available examples mix rlm.examples --list @@ -45,6 +46,11 @@ defmodule Mix.Tasks.Rlm.Examples do "examples/web_fetch.exs", "RLM.Examples.WebFetch", "Web Fetch & JSON Processing — curl + jq via bash tool" + }, + "local_models" => { + "examples/local_models.exs", + "RLM.Examples.LocalModels", + "Local Models — Ollama/vLLM usage (no API key required)" } } @@ -131,9 +137,11 @@ defmodule Mix.Tasks.Rlm.Examples do end defp check_api_key! do - case System.get_env("CLAUDE_API_KEY") do + key = System.get_env("ANTHROPIC_API_KEY") || System.get_env("CLAUDE_API_KEY") + + case key do nil -> - IO.puts("\n ERROR: CLAUDE_API_KEY not set. Export it before running.\n") + IO.puts("\n ERROR: ANTHROPIC_API_KEY not set. Export it before running.\n") System.halt(1) key -> diff --git a/lib/rlm/config.ex b/lib/rlm/config.ex index 1e2004a..95bc459 100644 --- a/lib/rlm/config.ex +++ b/lib/rlm/config.ex @@ -9,32 +9,31 @@ defmodule RLM.Config do %RLM.Config{ models: %{ - large: "anthropic:claude-sonnet-4-6", - small: "anthropic:claude-haiku-4-5" + large: "claude-sonnet-4-6", + small: "claude-haiku-4-5" } } Model specs follow the `req_llm` naming convention: `"provider:model-name"`. For backward compatibility, bare model names without a provider prefix - are treated as Anthropic models. + are treated as Anthropic models by `RLM.LLM.ReqLLM` (which prepends + `"anthropic:"` automatically). ## Supported Providers - Any provider supported by `req_llm`: Anthropic, OpenAI, Ollama (via vLLM), + Any provider supported by `req_llm`: Anthropic, OpenAI, Ollama (local models), Google Gemini, Groq, and more. 
For local Ollama: RLM.run("data", "query", models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}) """ - require Logger - @default_context_window 128_000 defstruct [ :api_base_url, :api_key, - # Legacy model fields — prefer `models` map + # Used to build default `models` map; prefer passing `models` directly :model_large, :model_small, :models, @@ -127,10 +126,20 @@ defmodule RLM.Config do 1. Legacy `context_window_tokens_large/small` fields (for `:large`/`:small` keys) 2. Default of #{@default_context_window} tokens for unknown models """ + require Logger + @spec context_window_for(t(), atom()) :: non_neg_integer() def context_window_for(%__MODULE__{} = config, :large), do: config.context_window_tokens_large def context_window_for(%__MODULE__{} = config, :small), do: config.context_window_tokens_small - def context_window_for(%__MODULE__{}, _key), do: @default_context_window + + def context_window_for(%__MODULE__{}, key) do + Logger.warning( + "No context window configured for model key #{inspect(key)}, " <> + "using default of #{@default_context_window} tokens" + ) + + @default_context_window + end defp resolve_api_key do case System.get_env("ANTHROPIC_API_KEY") do diff --git a/lib/rlm/llm.ex b/lib/rlm/llm.ex index 0b6e920..82e599d 100644 --- a/lib/rlm/llm.ex +++ b/lib/rlm/llm.ex @@ -29,7 +29,7 @@ defmodule RLM.LLM do ## Arguments * `messages` — list of message maps with `:role` and `:content` fields - * `model` — provider-prefixed model spec (e.g., `"anthropic:claude-sonnet-4-6"`) + * `model` — model spec, optionally provider-prefixed (e.g., `"anthropic:claude-sonnet-4-6"` or bare `"claude-sonnet-4-6"`). Implementations should handle both formats. 
* `config` — `RLM.Config.t()` struct * `opts` — keyword list; supports `:schema` for structured output diff --git a/lib/rlm/llm/anthropic.ex b/lib/rlm/llm/anthropic.ex index 9c51d6d..388ce00 100644 --- a/lib/rlm/llm/anthropic.ex +++ b/lib/rlm/llm/anthropic.ex @@ -2,13 +2,13 @@ defmodule RLM.LLM.Anthropic do @moduledoc """ Hand-rolled Anthropic Messages API client. - Preserved as a fallback for users who need direct control over - Anthropic-specific features (prompt caching, etc.). The default - backend is `RLM.LLM.ReqLLM`. + Preserved for users who prefer a dependency-free Anthropic-only client + or need to customize Anthropic API request bodies directly (e.g., custom + headers, non-standard API versions). The default backend is `RLM.LLM.ReqLLM`. - Select this via config: + Select this at call time: - RLM.Config.load(llm_module: RLM.LLM.Anthropic) + RLM.run(context, query, llm_module: RLM.LLM.Anthropic) """ @behaviour RLM.LLM diff --git a/lib/rlm/llm/req_llm.ex b/lib/rlm/llm/req_llm.ex index ab372ed..1269b91 100644 --- a/lib/rlm/llm/req_llm.ex +++ b/lib/rlm/llm/req_llm.ex @@ -3,7 +3,7 @@ defmodule RLM.LLM.ReqLLM do Multi-provider LLM backend using the `req_llm` package. Supports any provider that `req_llm` supports: Anthropic, OpenAI, - Ollama (via vLLM), Google Gemini, Groq, and more. Model specs follow + Ollama (local models), Google Gemini, Groq, and more. Model specs follow the `"provider:model-name"` convention. 
For backward compatibility, bare model names without a provider prefix @@ -28,9 +28,14 @@ defmodule RLM.LLM.ReqLLM do case ReqLLM.generate_object(model_spec, context, schema, req_opts) do {:ok, response} -> - content = encode_object(response) - usage = extract_usage(response) - {:ok, content, usage} + case encode_object(response) do + {:error, :no_content} -> + {:error, "LLM response contained no usable content (no structured object or text)"} + + content when is_binary(content) -> + usage = extract_usage(response) + {:ok, content, usage} + end {:error, reason} -> {:error, format_error(reason)} @@ -109,15 +114,21 @@ defmodule RLM.LLM.ReqLLM do # the existing chat/4 contract (Worker expects a JSON string). defp encode_object(response) do case ReqLLM.Response.object(response) do - obj when is_map(obj) -> Jason.encode!(obj) - _ -> ReqLLM.Response.text(response) || "" + obj when is_map(obj) -> + Jason.encode!(obj) + + _ -> + case ReqLLM.Response.text(response) do + text when is_binary(text) and text != "" -> text + _ -> {:error, :no_content} + end end end defp extract_usage(response) do raw = ReqLLM.Response.usage(response) || %{} - %{ + usage = %{ prompt_tokens: Map.get(raw, :input_tokens), completion_tokens: Map.get(raw, :output_tokens), total_tokens: Map.get(raw, :total_tokens), @@ -128,6 +139,16 @@ defmodule RLM.LLM.ReqLLM do cache_read_input_tokens: Map.get(raw, :cache_read_input_tokens) || Map.get(raw, :cached_tokens) } + + if raw != %{} and is_nil(usage.prompt_tokens) and is_nil(usage.completion_tokens) do + require Logger + + Logger.warning( + "Could not extract token usage from LLM response. 
Raw usage keys: #{inspect(Map.keys(raw))}" + ) + end + + usage end defp format_error(%{__exception__: true} = error), do: Exception.message(error) diff --git a/lib/rlm/replay.ex b/lib/rlm/replay.ex index 053453e..69f6130 100644 --- a/lib/rlm/replay.ex +++ b/lib/rlm/replay.ex @@ -15,7 +15,7 @@ defmodule RLM.Replay do - `:live` — switch to live LLM calls for remaining iterations * `:config` — config overrides applied to the replay run. When using `fallback: :live`, set `llm_module` here to control which module - handles the live calls (defaults to `RLM.LLM`). + handles the live calls (defaults to `RLM.LLM.ReqLLM`). """ @spec replay(String.t(), keyword()) :: {:ok, any(), String.t()} | {:error, any()} diff --git a/lib/rlm/replay/fallback_llm.ex b/lib/rlm/replay/fallback_llm.ex index aa0f80b..664763c 100644 --- a/lib/rlm/replay/fallback_llm.ex +++ b/lib/rlm/replay/fallback_llm.ex @@ -9,11 +9,18 @@ defmodule RLM.Replay.FallbackLLM do @behaviour RLM.LLM + require Logger + @impl true def chat(messages, model, config, opts \\ []) do case pop_entry() do nil -> fallback_module = Process.get(:rlm_replay_fallback_module, RLM.LLM.ReqLLM) + + Logger.info( + "Replay tape exhausted, falling back to live LLM (#{inspect(fallback_module)})" + ) + fallback_module.chat(messages, model, config, opts) entry -> diff --git a/lib/rlm/replay/tape.ex b/lib/rlm/replay/tape.ex index 7e21333..53dba28 100644 --- a/lib/rlm/replay/tape.ex +++ b/lib/rlm/replay/tape.ex @@ -52,14 +52,26 @@ defmodule RLM.Replay.Tape do end end - # EventLog.get_events/1 raises an exit when no Agent exists for the run_id. - # Catch that and fall back to TraceStore. + # EventLog.get_events/1 raises an exit when no Agent exists for the run_id + # (e.g., swept by the GC). Catch that and fall back to TraceStore. 
defp get_events(run_id) do case RLM.EventLog.get_events(run_id) do [] -> RLM.EventLog.get_events_from_store(run_id) events -> events end catch - :exit, _ -> RLM.EventLog.get_events_from_store(run_id) + :exit, {:noproc, _} -> + # Agent was swept — expected, fall back to persisted store + RLM.EventLog.get_events_from_store(run_id) + + :exit, reason -> + require Logger + + Logger.warning( + "EventLog.get_events failed for run #{run_id}: #{inspect(reason)}, " <> + "falling back to TraceStore" + ) + + RLM.EventLog.get_events_from_store(run_id) end end diff --git a/lib/rlm/worker.ex b/lib/rlm/worker.ex index fb90b2f..7c16214 100644 --- a/lib/rlm/worker.ex +++ b/lib/rlm/worker.ex @@ -16,7 +16,7 @@ defmodule RLM.Worker do ## Structured Output LLM responses are JSON objects with `reasoning` and `code` fields, - constrained via Claude's `output_config` JSON schema. Feedback messages + constrained via a JSON schema (provider-specific structured output). Feedback messages after eval are also structured JSON. 
""" use GenServer, restart: :temporary diff --git a/test/rlm/config_test.exs b/test/rlm/config_test.exs new file mode 100644 index 0000000..196598d --- /dev/null +++ b/test/rlm/config_test.exs @@ -0,0 +1,125 @@ +defmodule RLM.ConfigTest do + use ExUnit.Case, async: true + + alias RLM.Config + + describe "load/1" do + test "returns a Config struct with expected defaults" do + config = Config.load() + + assert %Config{} = config + assert config.llm_module == RLM.LLM.ReqLLM + assert config.max_iterations == 25 + assert config.max_depth == 5 + end + + test "builds default models map from model_large/model_small" do + config = Config.load() + + assert config.models == %{ + large: "claude-sonnet-4-6", + small: "claude-haiku-4-5" + } + end + + test "overrides models map when provided" do + custom_models = %{large: "ollama:llama3", small: "ollama:llama3:8b"} + config = Config.load(models: custom_models) + + assert config.models == custom_models + end + + test "legacy model_large/model_small flow into default models map" do + config = Config.load(model_large: "custom-large", model_small: "custom-small") + + assert config.models == %{large: "custom-large", small: "custom-small"} + assert config.model_large == "custom-large" + assert config.model_small == "custom-small" + end + + test "explicit models override takes precedence over legacy fields" do + config = + Config.load( + model_large: "ignored-large", + models: %{large: "winner-large", small: "winner-small"} + ) + + assert config.models == %{large: "winner-large", small: "winner-small"} + end + + test "llm_module defaults to RLM.LLM.ReqLLM" do + config = Config.load() + assert config.llm_module == RLM.LLM.ReqLLM + end + + test "llm_module can be overridden" do + config = Config.load(llm_module: RLM.LLM.Anthropic) + assert config.llm_module == RLM.LLM.Anthropic + end + end + + describe "resolve_model/2" do + test "returns {:ok, spec} for a valid key" do + config = Config.load(models: %{large: 
"anthropic:claude-sonnet-4-6"}) + + assert {:ok, "anthropic:claude-sonnet-4-6"} = Config.resolve_model(config, :large) + end + + test "returns {:ok, spec} for bare model name" do + config = Config.load() + + assert {:ok, "claude-sonnet-4-6"} = Config.resolve_model(config, :large) + assert {:ok, "claude-haiku-4-5"} = Config.resolve_model(config, :small) + end + + test "returns {:error, _} for unknown key" do + config = Config.load() + + assert {:error, message} = Config.resolve_model(config, :unknown) + assert message =~ "Unknown model key: unknown" + end + + test "returns {:error, _} for non-string value in models map" do + config = Config.load(models: %{large: 42, small: "valid"}) + + assert {:error, message} = Config.resolve_model(config, :large) + assert message =~ "invalid spec" + assert message =~ "42" + end + + test "works with custom model keys" do + config = Config.load(models: %{large: "a", small: "b", medium: "ollama:qwen3:14b"}) + + assert {:ok, "ollama:qwen3:14b"} = Config.resolve_model(config, :medium) + end + end + + describe "context_window_for/2" do + test "returns context_window_tokens_large for :large" do + config = Config.load(context_window_tokens_large: 200_000) + + assert Config.context_window_for(config, :large) == 200_000 + end + + test "returns context_window_tokens_small for :small" do + config = Config.load(context_window_tokens_small: 100_000) + + assert Config.context_window_for(config, :small) == 100_000 + end + + test "returns default 128_000 for unknown model keys" do + config = Config.load() + + import ExUnit.CaptureLog + log = capture_log(fn -> assert Config.context_window_for(config, :medium) == 128_000 end) + assert log =~ "No context window configured for model key :medium" + end + end + + describe "resolve_api_key (via load)" do + test "api_key can be explicitly set" do + config = Config.load(api_key: "test-key-123") + assert config.api_key == "test-key-123" + end + end +end