From 23dd0a12844c456f76be6273691049ea4896015d Mon Sep 17 00:00:00 2001 From: errantsky Date: Thu, 5 Mar 2026 21:44:56 -0800 Subject: [PATCH 1/3] feat: add multi-provider LLM support via req_llm MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace the monolithic RLM.LLM with a behaviour + two implementations: - RLM.LLM.ReqLLM (new default) — delegates to req_llm v1.6, supports Anthropic, OpenAI, Ollama, Gemini, Groq via "provider:model" specs - RLM.LLM.Anthropic — preserved legacy hand-rolled Anthropic client Add a named model map (config.models) with Config.resolve_model/2 and Config.context_window_for/2. Workers use model_key atoms (:large, :small, or custom) instead of inline config.model_large/model_small lookups. API key resolution now checks ANTHROPIC_API_KEY first, falls back to CLAUDE_API_KEY. All 162 tests pass unchanged (MockLLM unaffected). Co-Authored-By: Claude Opus 4.6 --- CHANGELOG.md | 33 ++++++++++ CLAUDE.md | 46 +++++++++---- README.md | 19 ++++-- config/runtime.exs | 9 ++- lib/rlm.ex | 10 +-- lib/rlm/config.ex | 98 ++++++++++++++++++++++++---- lib/rlm/iex.ex | 2 +- lib/rlm/llm.ex | 136 +++++++++------------------------------ lib/rlm/llm/anthropic.ex | 112 ++++++++++++++++++++++++++++++++ lib/rlm/llm/req_llm.ex | 130 +++++++++++++++++++++++++++++++++++++ lib/rlm/replay.ex | 2 +- lib/rlm/worker.ex | 30 +++++---- mix.exs | 3 +- mix.lock | 15 +++++ 14 files changed, 480 insertions(+), 165 deletions(-) create mode 100644 lib/rlm/llm/anthropic.ex create mode 100644 lib/rlm/llm/req_llm.ex diff --git a/CHANGELOG.md b/CHANGELOG.md index e09713c..f3d9af5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,39 @@ All notable changes to this project are documented here. 
### Added +**Multi-provider LLM support via req_llm** + +- `RLM.LLM.ReqLLM` — new default LLM backend that delegates to `req_llm` v1.6, + supporting Anthropic, OpenAI, Ollama (local models), Google Gemini, Groq, and any + other provider that `req_llm` supports. Model specs use the `"provider:model-name"` + convention (e.g., `"anthropic:claude-sonnet-4-6"`, `"ollama:qwen3.5:35b"`). Bare + model names without a provider prefix are treated as Anthropic for backward compat. +- `RLM.LLM.Anthropic` — the previous hand-rolled Anthropic Messages API client, + preserved as a fallback for users who need direct Anthropic-specific control. + Select via `llm_module: RLM.LLM.Anthropic`. +- `RLM.LLM` refactored to a pure behaviour module + shared utilities + (`extract_structured/1`, `response_schema/0`); no longer contains an implementation. +- `models` config field — `%{atom() => String.t()}` map of symbolic keys to + provider-prefixed model specs. Default: `%{large: "anthropic:claude-sonnet-4-6", + small: "anthropic:claude-haiku-4-5"}`. 
Pass custom maps for Ollama/OpenAI: + `models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}` +- `RLM.Config.resolve_model/2` — looks up a model key in the `models` map +- `RLM.Config.context_window_for/2` — resolves context window size for a model key + (legacy fields for `:large`/`:small`, default 128k for custom keys) +- `model_key` option on Workers — replaces inline `config.model_large`/`config.model_small` + lookups with named model map resolution + +### Changed + +- Default `llm_module` changed from `RLM.LLM` (which was the implementation) to + `RLM.LLM.ReqLLM` (the new multi-provider adapter) +- API key resolution now checks `ANTHROPIC_API_KEY` first, falls back to `CLAUDE_API_KEY` +- `RLM.Worker` uses `model_key` (`:large`, `:small`, or custom atom) to resolve model + specs via `Config.resolve_model/2` instead of reading `config.model_large`/`model_small` +- `RLM.run/3`, `RLM.run_async/3`, `RLM.start_session/1`, `RLM.Replay.replay/2` pass + `model_key:` instead of `model:` in worker opts +- `req_llm` (`~> 1.6`) added as a dependency + **Deterministic replay** - `RLM.Replay` — replay orchestrator that re-executes a previously recorded run using diff --git a/CLAUDE.md b/CLAUDE.md index 467c03d..dba0805 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -13,7 +13,10 @@ rlm/ │ │ ├── run.ex # Per-run coordinator GenServer │ │ ├── worker.ex # RLM GenServer (iterate loop + keep_alive) │ │ ├── eval.ex # Sandboxed Code.eval_string -│ │ ├── llm.ex # Anthropic Messages API client +│ │ ├── llm.ex # LLM behaviour + shared utilities +│ │ ├── llm/ +│ │ │ ├── req_llm.ex # Multi-provider backend via req_llm (default) +│ │ │ └── anthropic.ex # Direct Anthropic API client (legacy fallback) │ │ ├── helpers.ex # chunks/2, grep/2, preview/2, list_bindings/0 │ │ ├── sandbox.ex # Eval sandbox: helpers + LLM calls + tool wrappers │ │ ├── prompt.ex # System prompt + message formatting @@ -162,17 +165,29 @@ retrieve the execution trace via `RLM.EventLog`. 
On failure it returns `{:error, A `Process.monitor` on the Worker ensures crashes surface as errors rather than hangs. ### LLM Client -Uses the Anthropic Messages API (not OpenAI format). System messages are -extracted and sent as the top-level `system` field. Requires `CLAUDE_API_KEY` env var. +The default backend is `RLM.LLM.ReqLLM`, which delegates to the `req_llm` package +and supports any provider: Anthropic, OpenAI, Ollama (local), Gemini, Groq, etc. +Model specs use the `"provider:model-name"` convention (e.g., `"anthropic:claude-sonnet-4-6"`, +`"ollama:qwen3.5:35b"`). Bare names without a prefix are treated as Anthropic for +backward compatibility. Requires `ANTHROPIC_API_KEY` (or `CLAUDE_API_KEY` as fallback). -LLM responses use structured output (`output_config` with `json_schema`) to constrain -responses to `{"reasoning": "...", "code": "..."}` JSON objects. This eliminates regex-based -code extraction and provides clean separation of reasoning from executable code. Feedback -messages after eval are also structured JSON. +The legacy hand-rolled Anthropic client is preserved as `RLM.LLM.Anthropic` and can +be selected via `llm_module: RLM.LLM.Anthropic`. + +LLM responses use structured output (JSON schema) to constrain responses to +`{"reasoning": "...", "code": "..."}` objects. Feedback messages after eval are also +structured JSON. 
+ +The `models` config field maps symbolic keys to provider-prefixed specs: + +```elixir +RLM.run(context, query, + models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}) +``` Default models: -- Large: `claude-sonnet-4-6` -- Small: `claude-haiku-4-5` +- Large: `anthropic:claude-sonnet-4-6` +- Small: `anthropic:claude-haiku-4-5` ## Module Map @@ -186,7 +201,9 @@ Default models: | `RLM.Worker` | GenServer per execution node; iterate loop + keep_alive mode; delegates spawning to Run | | `RLM.Eval` | Sandboxed `Code.eval_string` with async IO capture + cwd injection | | `RLM.Sandbox` | Functions injected into eval'd code (helpers + LLM calls + tool wrappers) | -| `RLM.LLM` | Anthropic Messages API client with structured output (`extract_structured/1`) | +| `RLM.LLM` | LLM behaviour + shared utilities (`extract_structured/1`, `response_schema/0`) | +| `RLM.LLM.ReqLLM` | Multi-provider LLM backend via `req_llm` (default) | +| `RLM.LLM.Anthropic` | Direct Anthropic Messages API client (legacy fallback) | | `RLM.Prompt` | System prompt loading + structured JSON feedback message formatting | | `RLM.Helpers` | `chunks/2`, `grep/2`, `preview/2`, `list_bindings/0` | | `RLM.Truncate` | Head+tail string truncation for stdout overflow | @@ -246,9 +263,10 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`. 
| Field | Default | Notes | |---|---|---| | `api_base_url` | `"https://api.anthropic.com"` | Anthropic API base URL | -| `api_key` | `CLAUDE_API_KEY` env var | API key for LLM requests | -| `model_large` | `claude-sonnet-4-6` | Used for parent workers | -| `model_small` | `claude-haiku-4-5` | Used for subcalls | +| `api_key` | `ANTHROPIC_API_KEY` env var | API key for LLM requests (falls back to `CLAUDE_API_KEY`) | +| `models` | `%{large: "anthropic:claude-sonnet-4-6", small: "anthropic:claude-haiku-4-5"}` | Named model map; keys are atoms, values are `"provider:model"` specs | +| `model_large` | `claude-sonnet-4-6` | Legacy; used to build default `models` map | +| `model_small` | `claude-haiku-4-5` | Legacy; used to build default `models` map | | `max_iterations` | `25` | Per-worker LLM turn limit | | `max_depth` | `5` | Recursive subcall depth limit | | `max_concurrent_subcalls` | `10` | Parallel subcall limit per worker | @@ -267,7 +285,7 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`. | `enable_event_log` | `true` | Enable per-run EventLog trace agents | | `event_log_capture_full_stdout` | `false` | Store full stdout in traces (vs truncated) | | `enable_replay_recording` | `false` | Record full LLM responses for deterministic replay | -| `llm_module` | `RLM.LLM` | Swappable for `RLM.Test.MockLLM` | +| `llm_module` | `RLM.LLM.ReqLLM` | Default LLM backend; swap to `RLM.LLM.Anthropic` or `RLM.Test.MockLLM` | ## Testing Conventions diff --git a/README.md b/README.md index 8a98374..cc3e274 100644 --- a/README.md +++ b/README.md @@ -9,8 +9,9 @@ I wanted to take further, and the design philosophy behind [pi](https://github.com/badlogic/pi-mono/) — a coding agent that keeps things simple and transparent. This is very much a learning project, but it works and it's been fun to build. 
-A single Phoenix application: an AI execution engine where Claude writes Elixir code that +A single Phoenix application: an AI execution engine where LLMs write Elixir code that runs in a persistent REPL, with recursive sub-agent spawning and built-in filesystem tools. +Supports multiple LLM providers via `req_llm`: Anthropic, OpenAI, Ollama (local), Gemini, and more. **One engine, two modes:** 1. **One-shot** — `RLM.run/3` processes data and returns a result @@ -81,7 +82,7 @@ Three invariants the engine enforces: Requires Elixir ≥ 1.19 / OTP 27 and an [Anthropic API key](https://console.anthropic.com/). ```bash -export CLAUDE_API_KEY=sk-ant-... +export ANTHROPIC_API_KEY=sk-ant-... # or CLAUDE_API_KEY as fallback mix deps.get && mix compile mix test # excludes live API tests iex -S mix # interactive shell @@ -136,12 +137,18 @@ watch(session) # attach a live telemetry stream ### Configuration overrides ```elixir +# Use custom Anthropic models {:ok, result, run_id} = RLM.run(context, query, max_iterations: 10, max_depth: 3, - model_large: "claude-opus-4-6", + models: %{large: "anthropic:claude-opus-4-6", small: "anthropic:claude-haiku-4-5"}, eval_timeout: 60_000 ) + +# Use local Ollama models (no API key needed) +{:ok, result, run_id} = RLM.run(context, query, + models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"} +) ``` ### Deterministic replay @@ -356,9 +363,9 @@ RLM_COOKIE=secret # shared secret for node authentication RLM executes LLM-generated Elixir code via `Code.eval_string` with full access to the host filesystem, network, and shell. **Do not expose RLM to untrusted users or untrusted -LLM providers.** It is designed for local development, trusted API backends (Anthropic), -and controlled environments. There is no sandboxing beyond process-level isolation and -configurable timeouts. +LLM providers.** It is designed for local development, trusted API backends (Anthropic, +OpenAI, local Ollama), and controlled environments. 
There is no sandboxing beyond +process-level isolation and configurable timeouts. --- diff --git a/config/runtime.exs b/config/runtime.exs index 76d8b37..90a44b2 100644 --- a/config/runtime.exs +++ b/config/runtime.exs @@ -35,12 +35,15 @@ if config_env() == :prod do You can generate one by calling: mix phx.gen.secret """ - # CLAUDE_API_KEY is required in prod for LLM calls. - System.get_env("CLAUDE_API_KEY") || + # An API key is required in prod for LLM calls. + # ANTHROPIC_API_KEY is preferred; CLAUDE_API_KEY is accepted as a fallback. + unless System.get_env("ANTHROPIC_API_KEY") || System.get_env("CLAUDE_API_KEY") do raise """ - environment variable CLAUDE_API_KEY is missing. + environment variable ANTHROPIC_API_KEY is missing. Set it to your Anthropic API key to enable LLM functionality. + (CLAUDE_API_KEY is also accepted as a fallback.) """ + end host = System.get_env("PHX_HOST") || "example.com" diff --git a/lib/rlm.ex b/lib/rlm.ex index a061618..cd6cbb8 100644 --- a/lib/rlm.ex +++ b/lib/rlm.ex @@ -59,7 +59,7 @@ defmodule RLM do query: query, config: config, depth: 0, - model: config.model_large, + model_key: :large, caller: self() ] @@ -116,7 +116,7 @@ defmodule RLM do query: query, config: config, depth: 0, - model: config.model_large, + model_key: :large, caller: self() ] @@ -146,7 +146,7 @@ defmodule RLM do Options: - `:cwd` — working directory for tools (default: current dir) - - `:model` — override the model (default: config.model_large) + - `:model_key` — model key from config.models map (default: `:large`) - Plus any `RLM.Config` overrides Returns `{:ok, session_id}`. 
@@ -157,7 +157,7 @@ defmodule RLM do session_id = RLM.Span.generate_id() run_id = RLM.Span.generate_run_id() cwd = Keyword.get(opts, :cwd, File.cwd!()) - model = Keyword.get(opts, :model, config.model_large) + model_key = Keyword.get(opts, :model_key, :large) run_opts = [run_id: run_id, config: config, keep_alive: true] @@ -169,7 +169,7 @@ defmodule RLM do config: config, keep_alive: true, cwd: cwd, - model: model + model_key: model_key ] case RLM.Run.start_worker(run_pid, worker_opts) do diff --git a/lib/rlm/config.ex b/lib/rlm/config.ex index e627717..fe268ff 100644 --- a/lib/rlm/config.ex +++ b/lib/rlm/config.ex @@ -2,13 +2,42 @@ defmodule RLM.Config do @moduledoc """ Configuration struct for RLM engine. Loads defaults from application env, allows runtime overrides. + + ## Multi-Provider Model Map + + The `models` field maps symbolic keys to provider-prefixed model specs: + + %RLM.Config{ + models: %{ + large: "anthropic:claude-sonnet-4-6", + small: "anthropic:claude-haiku-4-5" + } + } + + Model specs follow the `req_llm` naming convention: `"provider:model-name"`. + For backward compatibility, bare model names without a provider prefix + are treated as Anthropic models. + + ## Supported Providers + + Any provider supported by `req_llm`: Anthropic, OpenAI, Ollama (local), + Google Gemini, Groq, and more. 
For local Ollama: + + RLM.run("data", "query", + models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}) """ + require Logger + + @default_context_window 128_000 + defstruct [ :api_base_url, :api_key, + # Legacy model fields — prefer `models` map :model_large, :model_small, + :models, :max_iterations, :max_depth, :max_concurrent_subcalls, @@ -19,10 +48,6 @@ defmodule RLM.Config do :eval_timeout, :llm_timeout, :subcall_timeout, - :cost_per_1k_prompt_tokens_large, - :cost_per_1k_prompt_tokens_small, - :cost_per_1k_completion_tokens_large, - :cost_per_1k_completion_tokens_small, :enable_otel, :enable_event_log, :event_log_capture_full_stdout, @@ -34,11 +59,20 @@ defmodule RLM.Config do @spec load(keyword()) :: t() def load(overrides \\ []) do + model_large = get(overrides, :model_large, "claude-sonnet-4-6") + model_small = get(overrides, :model_small, "claude-haiku-4-5") + + default_models = %{ + large: model_large, + small: model_small + } + %__MODULE__{ api_base_url: get(overrides, :api_base_url, "https://api.anthropic.com"), - api_key: get(overrides, :api_key, System.get_env("CLAUDE_API_KEY")), - model_large: get(overrides, :model_large, "claude-sonnet-4-6"), - model_small: get(overrides, :model_small, "claude-haiku-4-5"), + api_key: get(overrides, :api_key, resolve_api_key()), + model_large: model_large, + model_small: model_small, + models: get(overrides, :models, default_models), max_iterations: get(overrides, :max_iterations, 25), max_depth: get(overrides, :max_depth, 5), max_concurrent_subcalls: get(overrides, :max_concurrent_subcalls, 10), @@ -49,20 +83,56 @@ defmodule RLM.Config do eval_timeout: get(overrides, :eval_timeout, 300_000), llm_timeout: get(overrides, :llm_timeout, 120_000), subcall_timeout: get(overrides, :subcall_timeout, 600_000), - cost_per_1k_prompt_tokens_large: get(overrides, :cost_per_1k_prompt_tokens_large, 0.003), - cost_per_1k_prompt_tokens_small: get(overrides, :cost_per_1k_prompt_tokens_small, 0.0008), - 
cost_per_1k_completion_tokens_large: - get(overrides, :cost_per_1k_completion_tokens_large, 0.015), - cost_per_1k_completion_tokens_small: - get(overrides, :cost_per_1k_completion_tokens_small, 0.004), enable_otel: get(overrides, :enable_otel, false), enable_event_log: get(overrides, :enable_event_log, true), event_log_capture_full_stdout: get(overrides, :event_log_capture_full_stdout, false), enable_replay_recording: get(overrides, :enable_replay_recording, false), - llm_module: get(overrides, :llm_module, RLM.LLM) + llm_module: get(overrides, :llm_module, RLM.LLM.ReqLLM) } end + @doc """ + Resolve a model key to its spec string. + + Returns `{:ok, spec}` or `{:error, reason}`. + + ## Examples + + iex> config = RLM.Config.load(models: %{large: "anthropic:claude-sonnet-4-6"}) + iex> RLM.Config.resolve_model(config, :large) + {:ok, "anthropic:claude-sonnet-4-6"} + + iex> config = RLM.Config.load() + iex> RLM.Config.resolve_model(config, :unknown) + {:error, "Unknown model key: unknown"} + """ + @spec resolve_model(t(), atom()) :: {:ok, String.t()} | {:error, String.t()} + def resolve_model(%__MODULE__{models: models}, key) when is_atom(key) do + case Map.fetch(models, key) do + {:ok, spec} when is_binary(spec) -> {:ok, spec} + :error -> {:error, "Unknown model key: #{key}"} + end + end + + @doc """ + Look up the context window size for a model key. + + Uses a two-tier strategy: + 1. Legacy `context_window_tokens_large/small` fields (for `:large`/`:small` keys) + 2. 
Default of #{@default_context_window} tokens for unknown models + """ + @spec context_window_for(t(), atom()) :: non_neg_integer() + def context_window_for(%__MODULE__{} = config, :large), do: config.context_window_tokens_large + def context_window_for(%__MODULE__{} = config, :small), do: config.context_window_tokens_small + def context_window_for(%__MODULE__{}, _key), do: @default_context_window + + defp resolve_api_key do + case System.get_env("ANTHROPIC_API_KEY") do + nil -> System.get_env("CLAUDE_API_KEY") + key -> key + end + end + defp get(overrides, key, default) do case Keyword.fetch(overrides, key) do {:ok, value} -> value diff --git a/lib/rlm/iex.ex b/lib/rlm/iex.ex index cf87456..b36161d 100644 --- a/lib/rlm/iex.ex +++ b/lib/rlm/iex.ex @@ -26,7 +26,7 @@ defmodule RLM.IEx do Start a new interactive session. Returns the `session_id`. Options: - - `:model` — override the model (default: config.model_large) + - `:model_key` — model key from config.models map (default: `:large`) - `:cwd` — working directory for tools (default: current dir) """ @spec start(keyword()) :: String.t() diff --git a/lib/rlm/llm.ex b/lib/rlm/llm.ex index 9dd69da..0b6e920 100644 --- a/lib/rlm/llm.ex +++ b/lib/rlm/llm.ex @@ -1,11 +1,18 @@ defmodule RLM.LLM do @moduledoc """ - Claude (Anthropic) Messages API client. - Returns response content alongside token usage metadata. + Behaviour for LLM backends and shared utilities for structured output parsing. - Uses structured output (`output_config` with JSON schema) to constrain - LLM responses to a `{"reasoning", "code"}` JSON object, eliminating - regex-based code extraction. + The default implementation is `RLM.LLM.ReqLLM` which supports multiple + providers via the `req_llm` package. The legacy hand-rolled Anthropic + client is available as `RLM.LLM.Anthropic`. 
+ + ## Implementations + + - `RLM.LLM.ReqLLM` — multi-provider backend (default) + - `RLM.LLM.Anthropic` — direct Anthropic Messages API client + - `RLM.Test.MockLLM` — deterministic test mock (ETS-based) + - `RLM.Replay.LLM` — replay from recorded tape + - `RLM.Replay.FallbackLLM` — replay with live fallback """ @type usage :: %{ @@ -16,6 +23,24 @@ defmodule RLM.LLM do cache_read_input_tokens: non_neg_integer() | nil } + @doc """ + Send a chat request to the LLM. + + ## Arguments + + * `messages` — list of message maps with `:role` and `:content` fields + * `model` — provider-prefixed model spec (e.g., `"anthropic:claude-sonnet-4-6"`) + * `config` — `RLM.Config.t()` struct + * `opts` — keyword list; supports `:schema` for structured output + + ## Returns + + * `{:ok, json_string, usage}` on success + * `{:error, reason}` on failure + """ + @callback chat([map()], String.t(), RLM.Config.t(), keyword()) :: + {:ok, String.t(), usage()} | {:error, String.t()} + @response_schema %{ "type" => "object", "properties" => %{ @@ -30,68 +55,6 @@ defmodule RLM.LLM do @spec response_schema() :: map() def response_schema, do: @response_schema - @callback chat([map()], String.t(), RLM.Config.t(), keyword()) :: - {:ok, String.t(), usage()} | {:error, String.t()} - - @spec chat([map()], String.t(), RLM.Config.t(), keyword()) :: - {:ok, String.t(), usage()} | {:error, String.t()} - def chat(messages, model, config, opts \\ []) do - url = String.trim_trailing(config.api_base_url, "/") <> "/v1/messages" - - {system_text, user_messages} = extract_system(messages) - - headers = [ - {"x-api-key", config.api_key || ""}, - {"anthropic-version", "2023-06-01"}, - {"content-type", "application/json"} - ] - - schema = Keyword.get(opts, :schema, @response_schema) - - body = %{ - model: model, - max_tokens: 4096, - cache_control: %{type: "ephemeral"}, - messages: format_messages(user_messages), - output_config: %{ - format: %{ - type: "json_schema", - schema: schema - } - } - } - - body = if 
system_text, do: Map.put(body, :system, system_text), else: body - - case Req.post(url, - json: body, - headers: headers, - receive_timeout: config.llm_timeout - ) do - {:ok, %{status: 200, body: resp_body}} -> - content = extract_content(resp_body) - usage = extract_usage(resp_body) - - if content do - {:ok, content, usage} - else - {:error, "No content in API response"} - end - - {:ok, %{status: status, body: resp_body}} -> - error_msg = - case resp_body do - %{"error" => %{"message" => msg}} -> msg - _ -> "HTTP #{status}" - end - - {:error, "API error: #{error_msg}"} - - {:error, reason} -> - {:error, "API request failed: #{inspect(reason)}"} - end - end - @doc """ Parse a structured JSON response from the LLM. @@ -113,43 +76,4 @@ defmodule RLM.LLM do {:error, "JSON parse failed: #{inspect(err)}"} end end - - defp extract_system(messages) do - case Enum.split_with(messages, fn m -> m.role == :system end) do - {[], rest} -> {nil, rest} - {system_msgs, rest} -> {Enum.map_join(system_msgs, "\n", & &1.content), rest} - end - end - - defp format_messages(messages) do - Enum.map(messages, fn msg -> - %{"role" => to_string(msg.role), "content" => msg.content} - end) - end - - defp extract_content(body) do - body - |> Map.get("content", []) - |> Enum.find_value(fn - %{"type" => "text", "text" => text} -> text - _ -> nil - end) - end - - defp extract_usage(body) do - usage = Map.get(body, "usage", %{}) - - input = Map.get(usage, "input_tokens") - output = Map.get(usage, "output_tokens") - cache_creation = Map.get(usage, "cache_creation_input_tokens") - cache_read = Map.get(usage, "cache_read_input_tokens") - - %{ - prompt_tokens: input, - completion_tokens: output, - total_tokens: if(input && output, do: input + output, else: nil), - cache_creation_input_tokens: cache_creation, - cache_read_input_tokens: cache_read - } - end end diff --git a/lib/rlm/llm/anthropic.ex b/lib/rlm/llm/anthropic.ex new file mode 100644 index 0000000..2fe0587 --- /dev/null +++ 
b/lib/rlm/llm/anthropic.ex @@ -0,0 +1,112 @@ +defmodule RLM.LLM.Anthropic do + @moduledoc """ + Hand-rolled Anthropic Messages API client. + + Preserved as a fallback for users who need direct control over + Anthropic-specific features (prompt caching, etc.). The default + backend is `RLM.LLM.ReqLLM`. + + Select this via config: + + RLM.Config.load(llm_module: RLM.LLM.Anthropic) + """ + + @behaviour RLM.LLM + + @impl true + def chat(messages, model, config, opts \\ []) do + url = String.trim_trailing(config.api_base_url, "/") <> "/v1/messages" + + {system_text, user_messages} = extract_system(messages) + + headers = [ + {"x-api-key", config.api_key || ""}, + {"anthropic-version", "2023-06-01"}, + {"content-type", "application/json"} + ] + + schema = Keyword.get(opts, :schema, RLM.LLM.response_schema()) + + body = %{ + model: model, + max_tokens: 4096, + cache_control: %{type: "ephemeral"}, + messages: format_messages(user_messages), + output_config: %{ + format: %{ + type: "json_schema", + schema: schema + } + } + } + + body = if system_text, do: Map.put(body, :system, system_text), else: body + + case Req.post(url, + json: body, + headers: headers, + receive_timeout: config.llm_timeout + ) do + {:ok, %{status: 200, body: resp_body}} -> + content = extract_content(resp_body) + usage = extract_usage(resp_body) + + if content do + {:ok, content, usage} + else + {:error, "No content in API response"} + end + + {:ok, %{status: status, body: resp_body}} -> + error_msg = + case resp_body do + %{"error" => %{"message" => msg}} -> msg + _ -> "HTTP #{status}" + end + + {:error, "API error: #{error_msg}"} + + {:error, reason} -> + {:error, "API request failed: #{inspect(reason)}"} + end + end + + defp extract_system(messages) do + case Enum.split_with(messages, fn m -> m.role == :system end) do + {[], rest} -> {nil, rest} + {system_msgs, rest} -> {Enum.map_join(system_msgs, "\n", & &1.content), rest} + end + end + + defp format_messages(messages) do + Enum.map(messages, fn 
msg -> + %{"role" => to_string(msg.role), "content" => msg.content} + end) + end + + defp extract_content(body) do + body + |> Map.get("content", []) + |> Enum.find_value(fn + %{"type" => "text", "text" => text} -> text + _ -> nil + end) + end + + defp extract_usage(body) do + usage = Map.get(body, "usage", %{}) + + input = Map.get(usage, "input_tokens") + output = Map.get(usage, "output_tokens") + cache_creation = Map.get(usage, "cache_creation_input_tokens") + cache_read = Map.get(usage, "cache_read_input_tokens") + + %{ + prompt_tokens: input, + completion_tokens: output, + total_tokens: if(input && output, do: input + output, else: nil), + cache_creation_input_tokens: cache_creation, + cache_read_input_tokens: cache_read + } + end +end diff --git a/lib/rlm/llm/req_llm.ex b/lib/rlm/llm/req_llm.ex new file mode 100644 index 0000000..a5866e0 --- /dev/null +++ b/lib/rlm/llm/req_llm.ex @@ -0,0 +1,130 @@ +defmodule RLM.LLM.ReqLLM do + @moduledoc """ + Multi-provider LLM backend using the `req_llm` package. + + Supports any provider that `req_llm` supports: Anthropic, OpenAI, + Ollama (local), Google Gemini, Groq, and more. Model specs follow + the `"provider:model-name"` convention. + + For backward compatibility, bare model names without a provider prefix + are treated as Anthropic models (e.g., `"claude-sonnet-4-6"` becomes + `"anthropic:claude-sonnet-4-6"`). 
+ + ## Provider-Specific Features + + - **Anthropic**: Prompt caching enabled automatically via `anthropic_prompt_cache: true` + - **Ollama**: No API key needed; uses `http://localhost:11434` by default + - **OpenAI**: Reads `OPENAI_API_KEY` from env + """ + + @behaviour RLM.LLM + + @impl true + def chat(messages, model, config, opts \\ []) do + model_spec = normalize_model_spec(model) + schema = Keyword.get(opts, :schema, RLM.LLM.response_schema()) + context = build_context(messages) + req_opts = build_opts(model_spec, config) + + case ReqLLM.generate_object(model_spec, context, schema, req_opts) do + {:ok, response} -> + content = encode_object(response) + usage = extract_usage(response) + {:ok, content, usage} + + {:error, reason} -> + {:error, format_error(reason)} + end + end + + # If model string already contains ":", it's in provider:model format. + # Otherwise, assume Anthropic for backward compatibility. + defp normalize_model_spec(model) when is_binary(model) do + if String.contains?(model, ":") do + model + else + "anthropic:#{model}" + end + end + + defp build_context(messages) do + {system_msgs, user_msgs} = + Enum.split_with(messages, fn m -> m.role == :system end) + + system_text = + case system_msgs do + [] -> nil + msgs -> Enum.map_join(msgs, "\n", & &1.content) + end + + req_messages = + Enum.map(user_msgs, fn msg -> + case msg.role do + :user -> ReqLLM.Context.user(msg.content) + :assistant -> ReqLLM.Context.assistant(msg.content) + end + end) + + all_messages = + if system_text do + [ReqLLM.Context.system(system_text) | req_messages] + else + req_messages + end + + ReqLLM.Context.new(all_messages) + end + + defp build_opts(model_spec, config) do + base_opts = [ + max_tokens: 4096, + receive_timeout: config.llm_timeout + ] + + base_opts + |> maybe_add_api_key(config) + |> maybe_add_anthropic_opts(model_spec) + end + + defp maybe_add_api_key(opts, config) do + if config.api_key do + Keyword.put(opts, :api_key, config.api_key) + else + opts + end + 
end + + defp maybe_add_anthropic_opts(opts, model_spec) do + if anthropic_model?(model_spec) do + Keyword.put(opts, :anthropic_prompt_cache, true) + else + opts + end + end + + defp anthropic_model?(spec), do: String.starts_with?(spec, "anthropic:") + + # Re-serialize the parsed object back to a JSON string to preserve + # the existing chat/4 contract (Worker expects a JSON string). + defp encode_object(response) do + case ReqLLM.Response.object(response) do + obj when is_map(obj) -> Jason.encode!(obj) + _ -> ReqLLM.Response.text(response) || "" + end + end + + defp extract_usage(response) do + raw = ReqLLM.Response.usage(response) || %{} + + %{ + prompt_tokens: Map.get(raw, :input_tokens), + completion_tokens: Map.get(raw, :output_tokens), + total_tokens: Map.get(raw, :total_tokens), + cache_creation_input_tokens: Map.get(raw, :cache_creation_input_tokens), + cache_read_input_tokens: Map.get(raw, :cache_read_input_tokens) + } + end + + defp format_error(%{__exception__: true} = error), do: Exception.message(error) + defp format_error(reason), do: "LLM request failed: #{inspect(reason)}" +end diff --git a/lib/rlm/replay.ex b/lib/rlm/replay.ex index 97d8fdf..053453e 100644 --- a/lib/rlm/replay.ex +++ b/lib/rlm/replay.ex @@ -63,7 +63,7 @@ defmodule RLM.Replay do query: query, config: config, depth: 0, - model: config.model_large, + model_key: :large, caller: self(), replay_tape: tape, replay_patches: patches, diff --git a/lib/rlm/worker.ex b/lib/rlm/worker.ex index 74c7ed2..1f1610d 100644 --- a/lib/rlm/worker.ex +++ b/lib/rlm/worker.ex @@ -41,6 +41,7 @@ defmodule RLM.Worker do # Tracks in-flight eval context (includes task_ref for supervised eval) :eval_context, # keep_alive mode fields + :model_key, :keep_alive, :cwd, :pending_from, @@ -77,7 +78,8 @@ defmodule RLM.Worker do context = Keyword.get(opts, :context, "") query = Keyword.get(opts, :query, context) depth = Keyword.get(opts, :depth, 0) - model = Keyword.get(opts, :model, config.model_large) + model_key = 
Keyword.get(opts, :model_key, :large) + model = Keyword.get_lazy(opts, :model, fn -> resolve_model!(config, model_key) end) parent_span_id = Keyword.get(opts, :parent_span_id) caller = Keyword.get(opts, :caller) keep_alive = Keyword.get(opts, :keep_alive, false) @@ -110,6 +112,7 @@ defmodule RLM.Worker do history: [system_msg], bindings: [final_answer: nil, compacted_history: ""], model: model, + model_key: model_key, config: config, status: :idle, result: nil, @@ -156,6 +159,7 @@ defmodule RLM.Worker do history: [system_msg, user_msg], bindings: bindings, model: model, + model_key: model_key, config: config, status: :running, result: nil, @@ -448,10 +452,7 @@ defmodule RLM.Worker do end def handle_call({:direct_query, text, model_size, schema}, from, state) do - model = - if model_size == :large, - do: state.config.model_large, - else: state.config.model_small + model = resolve_model!(state.config, model_size) if map_size(state.pending_subcalls) >= state.config.max_concurrent_subcalls do {:reply, @@ -497,10 +498,7 @@ defmodule RLM.Worker do end def handle_call({:spawn_subcall, text, model_size}, from, state) do - model = - if model_size == :large, - do: state.config.model_large, - else: state.config.model_small + model = resolve_model!(state.config, model_size) cond do state.depth >= state.config.max_depth -> @@ -526,6 +524,7 @@ defmodule RLM.Worker do context: text, query: text, model: model, + model_key: model_size, config: state.config, depth: state.depth + 1, parent_span_id: state.span_id, @@ -947,11 +946,7 @@ defmodule RLM.Worker do end defp context_window_for_model(state) do - if state.model == state.config.model_large do - state.config.context_window_tokens_large - else - state.config.context_window_tokens_small - end + RLM.Config.context_window_for(state.config, state.model_key) end defp serialize_history(messages) do @@ -963,6 +958,13 @@ defmodule RLM.Worker do defp join_compacted("", new), do: new defp join_compacted(existing, new), do: existing <> 
"\n===\n" <> new + defp resolve_model!(config, key) do + case RLM.Config.resolve_model(config, key) do + {:ok, spec} -> spec + {:error, _} -> Map.get(config.models, :large, "claude-sonnet-4-6") + end + end + defp emit_telemetry(event, measurements, state, extra_metadata) do base = %{ span_id: state.span_id, diff --git a/mix.exs b/mix.exs index dbb0703..e98a236 100644 --- a/mix.exs +++ b/mix.exs @@ -37,8 +37,9 @@ defmodule RLM.MixProject do defp deps do [ - # HTTP client + # HTTP / LLM client {:req, "~> 0.5"}, + {:req_llm, "~> 1.6"}, {:jason, "~> 1.4"}, # Telemetry diff --git a/mix.lock b/mix.lock index 8628def..cec4ee8 100644 --- a/mix.lock +++ b/mix.lock @@ -1,14 +1,18 @@ %{ + "abnf_parsec": {:hex, :abnf_parsec, "2.1.0", "c4e88d5d089f1698297c0daced12be1fb404e6e577ecf261313ebba5477941f9", [:mix], [{:nimble_parsec, "~> 1.4", [hex: :nimble_parsec, repo: "hexpm", optional: false]}], "hexpm", "e0ed6290c7cc7e5020c006d1003520390c9bdd20f7c3f776bd49bfe3c5cd362a"}, "bandit": {:hex, :bandit, "1.10.2", "d15ea32eb853b5b42b965b24221eb045462b2ba9aff9a0bda71157c06338cbff", [:mix], [{:hpax, "~> 1.0", [hex: :hpax, repo: "hexpm", optional: false]}, {:plug, "~> 1.18", [hex: :plug, repo: "hexpm", optional: false]}, {:telemetry, "~> 0.4 or ~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}, {:thousand_island, "~> 1.0", [hex: :thousand_island, repo: "hexpm", optional: false]}, {:websock, "~> 0.5", [hex: :websock, repo: "hexpm", optional: false]}], "hexpm", "27b2a61b647914b1726c2ced3601473be5f7aa6bb468564a688646a689b3ee45"}, "boundary": {:hex, :boundary, "0.10.4", "5fec5d2736c12f9bfe1720c3a2bd8c48c3547c24d6002ebf8e087570afd5bd2f", [:mix], [], "hexpm", "8baf6f23987afdb1483033ed0bde75c9c703613c22ed58d5f23bf948f203247c"}, "bunt": {:hex, :bunt, "1.0.0", "081c2c665f086849e6d57900292b3a161727ab40431219529f13c4ddcf3e7a44", [:mix], [], "hexpm", "dc5f86aa08a5f6fa6b8096f0735c4e76d54ae5c9fa2c143e5a1fc7c1cd9bb6b5"}, "cc_precompiler": {:hex, :cc_precompiler, "0.1.11", 
"8c844d0b9fb98a3edea067f94f616b3f6b29b959b6b3bf25fee94ffe34364768", [:mix], [{:elixir_make, "~> 0.7", [hex: :elixir_make, repo: "hexpm", optional: false]}], "hexpm", "3427232caf0835f94680e5bcf082408a70b48ad68a5f5c0b02a3bea9f3a075b9"}, "circular_buffer": {:hex, :circular_buffer, "1.0.0", "25c004da0cba7bd8bc1bdabded4f9a902d095e20600fd15faf1f2ffbaea18a07", [:mix], [], "hexpm", "c829ec31c13c7bafd1f546677263dff5bfb006e929f25635878ac3cfba8749e5"}, "credo": {:hex, :credo, "1.7.16", "a9f1389d13d19c631cb123c77a813dbf16449a2aebf602f590defa08953309d4", [:mix], [{:bunt, "~> 0.2.1 or ~> 1.0", [hex: :bunt, repo: "hexpm", optional: false]}, {:file_system, "~> 0.2 or ~> 1.0", [hex: :file_system, repo: "hexpm", optional: false]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: false]}], "hexpm", "d0562af33756b21f248f066a9119e3890722031b6d199f22e3cf95550e4f1579"}, + "deep_merge": {:hex, :deep_merge, "1.0.0", "b4aa1a0d1acac393bdf38b2291af38cb1d4a52806cf7a4906f718e1feb5ee961", [:mix], [], "hexpm", "ce708e5f094b9cd4e8f2be4f00d2f4250c4095be93f8cd6d018c753894885430"}, "dns_cluster": {:hex, :dns_cluster, "0.2.0", "aa8eb46e3bd0326bd67b84790c561733b25c5ba2fe3c7e36f28e88f384ebcb33", [:mix], [], "hexpm", "ba6f1893411c69c01b9e8e8f772062535a4cf70f3f35bcc964a324078d8c8240"}, + "dotenvy": {:hex, :dotenvy, "1.1.1", "00e318f3c51de9fafc4b48598447e386f19204dc18ca69886905bb8f8b08b667", [:mix], [], "hexpm", "c8269471b5701e9e56dc86509c1199ded2b33dce088c3471afcfef7839766d8e"}, "earmark_parser": {:hex, :earmark_parser, "1.4.44", "f20830dd6b5c77afe2b063777ddbbff09f9759396500cdbe7523efd58d7a339c", [:mix], [], "hexpm", "4778ac752b4701a5599215f7030989c989ffdc4f6df457c5f36938cc2d2a2750"}, "elixir_make": {:hex, :elixir_make, "0.9.0", "6484b3cd8c0cee58f09f05ecaf1a140a8c97670671a6a0e7ab4dc326c3109726", [:mix], [], "hexpm", "db23d4fd8b757462ad02f8aa73431a426fe6671c80b200d9710caf3d1dd0ffdb"}, "esbuild": {:hex, :esbuild, "0.10.0", "b0aa3388a1c23e727c5a3e7427c932d89ee791746b0081bbe56103e9ef3d291f", [:mix], 
[{:jason, "~> 1.4", [hex: :jason, repo: "hexpm", optional: false]}], "hexpm", "468489cda427b974a7cc9f03ace55368a83e1a7be12fba7e30969af78e5f8c70"}, + "ex_aws_auth": {:hex, :ex_aws_auth, "1.3.1", "3963992d6f7cb251b53573603c3615cec70c3f4d86199fdb865ff440295ef7a4", [:mix], [{:jason, "~> 1.4", [hex: :jason, repo: "hexpm", optional: true]}, {:req, "~> 0.5", [hex: :req, repo: "hexpm", optional: true]}], "hexpm", "025793aa08fa419aabdb652db60edbdb2e12346bd447988a1bb5854c4dd64903"}, "ex_doc": {:hex, :ex_doc, "0.40.1", "67542e4b6dde74811cfd580e2c0149b78010fd13001fda7cfeb2b2c2ffb1344d", [:mix], [{:earmark_parser, "~> 1.4.44", [hex: :earmark_parser, repo: "hexpm", optional: false]}, {:makeup_c, ">= 0.1.0", [hex: :makeup_c, repo: "hexpm", optional: true]}, {:makeup_elixir, "~> 0.14 or ~> 1.0", [hex: :makeup_elixir, repo: "hexpm", optional: false]}, {:makeup_erlang, "~> 0.1 or ~> 1.0", [hex: :makeup_erlang, repo: "hexpm", optional: false]}, {:makeup_html, ">= 0.1.0", [hex: :makeup_html, repo: "hexpm", optional: true]}], "hexpm", "bcef0e2d360d93ac19f01a85d58f91752d930c0a30e2681145feea6bd3516e00"}, "expo": {:hex, :expo, "1.1.1", "4202e1d2ca6e2b3b63e02f69cfe0a404f77702b041d02b58597c00992b601db5", [:mix], [], "hexpm", "5fb308b9cb359ae200b7e23d37c76978673aa1b06e2b3075d814ce12c5811640"}, "file_system": {:hex, :file_system, "1.1.1", "31864f4685b0148f25bd3fbef2b1228457c0c89024ad67f7a81a3ffbc0bbad3a", [:mix], [], "hexpm", "7a15ff97dfe526aeefb090a7a9d3d03aa907e100e262a0f8f7746b78f8f87a5d"}, @@ -17,8 +21,11 @@ "gettext": {:hex, :gettext, "1.0.2", "5457e1fd3f4abe47b0e13ff85086aabae760497a3497909b8473e0acee57673b", [:mix], [{:expo, "~> 0.5.1 or ~> 1.0", [hex: :expo, repo: "hexpm", optional: false]}], "hexpm", "eab805501886802071ad290714515c8c4a17196ea76e5afc9d06ca85fb1bfeb3"}, "heroicons": {:git, "https://github.com/tailwindlabs/heroicons.git", "0435d4ca364a608cc75e2f8683d374e55abbae26", [tag: "v2.2.0", sparse: "optimized", depth: 1]}, "hpax": {:hex, :hpax, "1.0.3", 
"ed67ef51ad4df91e75cc6a1494f851850c0bd98ebc0be6e81b026e765ee535aa", [:mix], [], "hexpm", "8eab6e1cfa8d5918c2ce4ba43588e894af35dbd8e91e6e55c817bca5847df34a"}, + "idna": {:hex, :idna, "6.1.1", "8a63070e9f7d0c62eb9d9fcb360a7de382448200fbbd1b106cc96d3d8099df8d", [:rebar3], [{:unicode_util_compat, "~> 0.7.0", [hex: :unicode_util_compat, repo: "hexpm", optional: false]}], "hexpm", "92376eb7894412ed19ac475e4a86f7b413c1b9fbb5bd16dccd57934157944cea"}, "jason": {:hex, :jason, "1.4.4", "b9226785a9aa77b6857ca22832cffa5d5011a667207eb2a0ad56adb5db443b8a", [:mix], [{:decimal, "~> 1.0 or ~> 2.0", [hex: :decimal, repo: "hexpm", optional: true]}], "hexpm", "c5eb0cab91f094599f94d55bc63409236a8ec69a21a67814529e8d5f6cc90b3b"}, + "jsv": {:hex, :jsv, "0.16.0", "b29e44da822db9f52010edf9db75b58f016434d9862bd76d18aec7a4712cf318", [:mix], [{:abnf_parsec, "~> 2.0", [hex: :abnf_parsec, repo: "hexpm", optional: false]}, {:decimal, "~> 2.0", [hex: :decimal, repo: "hexpm", optional: true]}, {:idna, "~> 6.1", [hex: :idna, repo: "hexpm", optional: false]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: true]}, {:nimble_options, "~> 1.0", [hex: :nimble_options, repo: "hexpm", optional: false]}, {:poison, ">= 3.0.0 and < 7.0.0", [hex: :poison, repo: "hexpm", optional: true]}, {:texture, "~> 0.3", [hex: :texture, repo: "hexpm", optional: false]}], "hexpm", "a4b2aaf5f62641640519da5de479e5704f6f7c8b6e323692bf71b4800d7b69ee"}, "lazy_html": {:hex, :lazy_html, "0.1.10", "ffe42a0b4e70859cf21a33e12a251e0c76c1dff76391609bd56702a0ef5bc429", [:make, :mix], [{:cc_precompiler, "~> 0.1", [hex: :cc_precompiler, repo: "hexpm", optional: false]}, {:elixir_make, "~> 0.9.0", [hex: :elixir_make, repo: "hexpm", optional: false]}, {:fine, "~> 0.1.0", [hex: :fine, repo: "hexpm", optional: false]}], "hexpm", "50f67e5faa09d45a99c1ddf3fac004f051997877dc8974c5797bb5ccd8e27058"}, + "llm_db": {:hex, :llm_db, "2026.3.0", "31c4235c52280cff46c166ffb19a2a53734e8fda44c82864f3f38521e7bc4c2d", [:mix], [{:deep_merge, "~> 
1.0", [hex: :deep_merge, repo: "hexpm", optional: false]}, {:dotenvy, "~> 1.1", [hex: :dotenvy, repo: "hexpm", optional: false]}, {:igniter, "~> 0.7", [hex: :igniter, repo: "hexpm", optional: true]}, {:jason, "~> 1.4", [hex: :jason, repo: "hexpm", optional: false]}, {:req, "~> 0.5", [hex: :req, repo: "hexpm", optional: false]}, {:toml, "~> 0.7", [hex: :toml, repo: "hexpm", optional: false]}, {:zoi, "~> 0.10", [hex: :zoi, repo: "hexpm", optional: false]}], "hexpm", "4c9cbc6f47eb6d62eb52bca296692f9171c963e3eb3af69f3a555e8c5cff391e"}, "makeup": {:hex, :makeup, "1.2.1", "e90ac1c65589ef354378def3ba19d401e739ee7ee06fb47f94c687016e3713d1", [:mix], [{:nimble_parsec, "~> 1.4", [hex: :nimble_parsec, repo: "hexpm", optional: false]}], "hexpm", "d36484867b0bae0fea568d10131197a4c2e47056a6fbe84922bf6ba71c8d17ce"}, "makeup_elixir": {:hex, :makeup_elixir, "1.0.1", "e928a4f984e795e41e3abd27bfc09f51db16ab8ba1aebdba2b3a575437efafc2", [:mix], [{:makeup, "~> 1.0", [hex: :makeup, repo: "hexpm", optional: false]}, {:nimble_parsec, "~> 1.2.3 or ~> 1.3", [hex: :nimble_parsec, repo: "hexpm", optional: false]}], "hexpm", "7284900d412a3e5cfd97fdaed4f5ed389b8f2b4cb49efc0eb3bd10e2febf9507"}, "makeup_erlang": {:hex, :makeup_erlang, "1.0.3", "4252d5d4098da7415c390e847c814bad3764c94a814a0b4245176215615e1035", [:mix], [{:makeup, "~> 1.0", [hex: :makeup, repo: "hexpm", optional: false]}], "hexpm", "953297c02582a33411ac6208f2c6e55f0e870df7f80da724ed613f10e6706afd"}, @@ -39,13 +46,21 @@ "plug": {:hex, :plug, "1.19.1", "09bac17ae7a001a68ae393658aa23c7e38782be5c5c00c80be82901262c394c0", [:mix], [{:mime, "~> 1.0 or ~> 2.0", [hex: :mime, repo: "hexpm", optional: false]}, {:plug_crypto, "~> 1.1.1 or ~> 1.2 or ~> 2.0", [hex: :plug_crypto, repo: "hexpm", optional: false]}, {:telemetry, "~> 0.4.3 or ~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}], "hexpm", "560a0017a8f6d5d30146916862aaf9300b7280063651dd7e532b8be168511e62"}, "plug_crypto": {:hex, :plug_crypto, "2.1.1", 
"19bda8184399cb24afa10be734f84a16ea0a2bc65054e23a62bb10f06bc89491", [:mix], [], "hexpm", "6470bce6ffe41c8bd497612ffde1a7e4af67f36a15eea5f921af71cf3e11247c"}, "req": {:hex, :req, "0.5.17", "0096ddd5b0ed6f576a03dde4b158a0c727215b15d2795e59e0916c6971066ede", [:mix], [{:brotli, "~> 0.3.1", [hex: :brotli, repo: "hexpm", optional: true]}, {:ezstd, "~> 1.0", [hex: :ezstd, repo: "hexpm", optional: true]}, {:finch, "~> 0.17", [hex: :finch, repo: "hexpm", optional: false]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: false]}, {:mime, "~> 2.0.6 or ~> 2.1", [hex: :mime, repo: "hexpm", optional: false]}, {:nimble_csv, "~> 1.0", [hex: :nimble_csv, repo: "hexpm", optional: true]}, {:plug, "~> 1.0", [hex: :plug, repo: "hexpm", optional: true]}], "hexpm", "0b8bc6ffdfebbc07968e59d3ff96d52f2202d0536f10fef4dc11dc02a2a43e39"}, + "req_llm": {:hex, :req_llm, "1.6.0", "9866726cb590848e4f68cfb0ed8030509219dd9e81880e395d1b0b273d2102e6", [:mix], [{:dotenvy, "~> 1.1", [hex: :dotenvy, repo: "hexpm", optional: false]}, {:ex_aws_auth, "~> 1.3", [hex: :ex_aws_auth, repo: "hexpm", optional: false]}, {:igniter, "~> 0.7", [hex: :igniter, repo: "hexpm", optional: true]}, {:jason, "~> 1.4", [hex: :jason, repo: "hexpm", optional: false]}, {:jsv, "~> 0.11", [hex: :jsv, repo: "hexpm", optional: false]}, {:llm_db, "~> 2026.1", [hex: :llm_db, repo: "hexpm", optional: false]}, {:nimble_options, "~> 1.1", [hex: :nimble_options, repo: "hexpm", optional: false]}, {:req, "~> 0.5", [hex: :req, repo: "hexpm", optional: false]}, {:server_sent_events, "~> 0.2", [hex: :server_sent_events, repo: "hexpm", optional: false]}, {:splode, "~> 0.3.0", [hex: :splode, repo: "hexpm", optional: false]}, {:uniq, "~> 0.6", [hex: :uniq, repo: "hexpm", optional: false]}, {:zoi, "~> 0.14", [hex: :zoi, repo: "hexpm", optional: false]}], "hexpm", "0711ae09fa297e1e842837b0259ea179f9212893420abd9cb93a020b6bf69348"}, + "server_sent_events": {:hex, :server_sent_events, "0.2.1", 
"f83b34f01241302a8bf451efc8dde3a36c533d5715463c31c653f3db8695f636", [:mix], [], "hexpm", "c8099ce4f9acd610eb7c8e0f89dba7d5d1c13300ea9884b0bd8662401d3cf96f"}, "sobelow": {:hex, :sobelow, "0.14.1", "2f81e8632f15574cba2402bcddff5497b413c01e6f094bc0ab94e83c2f74db81", [:mix], [{:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: false]}], "hexpm", "8fac9a2bd90fdc4b15d6fca6e1608efb7f7c600fa75800813b794ee9364c87f2"}, + "splode": {:hex, :splode, "0.3.0", "ff8effecc509a51245df2f864ec78d849248647c37a75886033e3b1a53ca9470", [:mix], [], "hexpm", "73cfd0892d7316d6f2c93e6e8784bd6e137b2aa38443de52fd0a25171d106d81"}, "tailwind": {:hex, :tailwind, "0.4.1", "e7bcc222fe96a1e55f948e76d13dd84a1a7653fb051d2a167135db3b4b08d3e9", [:mix], [], "hexpm", "6249d4f9819052911120dbdbe9e532e6bd64ea23476056adb7f730aa25c220d1"}, "telemetry": {:hex, :telemetry, "1.3.0", "fedebbae410d715cf8e7062c96a1ef32ec22e764197f70cda73d82778d61e7a2", [:rebar3], [], "hexpm", "7015fc8919dbe63764f4b4b87a95b7c0996bd539e0d499be6ec9d7f3875b79e6"}, "telemetry_metrics": {:hex, :telemetry_metrics, "1.1.0", "5bd5f3b5637e0abea0426b947e3ce5dd304f8b3bc6617039e2b5a008adc02f8f", [:mix], [{:telemetry, "~> 0.4 or ~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}], "hexpm", "e7b79e8ddfde70adb6db8a6623d1778ec66401f366e9a8f5dd0955c56bc8ce67"}, "telemetry_poller": {:hex, :telemetry_poller, "1.3.0", "d5c46420126b5ac2d72bc6580fb4f537d35e851cc0f8dbd571acf6d6e10f5ec7", [:rebar3], [{:telemetry, "~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}], "hexpm", "51f18bed7128544a50f75897db9974436ea9bfba560420b646af27a9a9b35211"}, + "texture": {:hex, :texture, "0.3.2", "ca68fc2804ce05ffe33cded85d69b5ebadb0828233227accfe3c574e34fd4e3f", [:mix], [{:abnf_parsec, "~> 2.0", [hex: :abnf_parsec, repo: "hexpm", optional: false]}], "hexpm", "43bb1069d9cf4309ed6f0ff65ade787a76f986b821ab29d1c96b5b5102cb769c"}, "thousand_island": {:hex, :thousand_island, "1.4.3", "2158209580f633be38d43ec4e3ce0a01079592b9657afff9080d5d8ca149a3af", 
[:mix], [{:telemetry, "~> 0.4 or ~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}], "hexpm", "6e4ce09b0fd761a58594d02814d40f77daff460c48a7354a15ab353bb998ea0b"}, "tidewave": {:hex, :tidewave, "0.5.5", "a125dfc87f99daf0e2280b3a9719b874c616ead5926cdf9cdfe4fcc19a020eff", [:mix], [{:circular_buffer, "~> 0.4 or ~> 1.0", [hex: :circular_buffer, repo: "hexpm", optional: false]}, {:igniter, "~> 0.6", [hex: :igniter, repo: "hexpm", optional: true]}, {:jason, "~> 1.4", [hex: :jason, repo: "hexpm", optional: false]}, {:phoenix_live_reload, ">= 1.6.1", [hex: :phoenix_live_reload, repo: "hexpm", optional: true]}, {:plug, "~> 1.17", [hex: :plug, repo: "hexpm", optional: false]}, {:req, "~> 0.5", [hex: :req, repo: "hexpm", optional: false]}], "hexpm", "825ebb4fa20de005785efa21e5a88c04d81c3f57552638d12ff3def2f203dbf7"}, + "toml": {:hex, :toml, "0.7.0", "fbcd773caa937d0c7a02c301a1feea25612720ac3fa1ccb8bfd9d30d822911de", [:mix], [], "hexpm", "0690246a2478c1defd100b0c9b89b4ea280a22be9a7b313a8a058a2408a2fa70"}, + "unicode_util_compat": {:hex, :unicode_util_compat, "0.7.1", "a48703a25c170eedadca83b11e88985af08d35f37c6f664d6dcfb106a97782fc", [:rebar3], [], "hexpm", "b3a917854ce3ae233619744ad1e0102e05673136776fb2fa76234f3e03b23642"}, + "uniq": {:hex, :uniq, "0.6.2", "51846518c037134c08bc5b773468007b155e543d53c8b39bafe95b0af487e406", [:mix], [{:ecto, "~> 3.0", [hex: :ecto, repo: "hexpm", optional: true]}], "hexpm", "95aa2a41ea331ef0a52d8ed12d3e730ef9af9dbc30f40646e6af334fbd7bc0fc"}, "websock": {:hex, :websock, "0.5.3", "2f69a6ebe810328555b6fe5c831a851f485e303a7c8ce6c5f675abeb20ebdadc", [:mix], [], "hexpm", "6105453d7fac22c712ad66fab1d45abdf049868f253cf719b625151460b8b453"}, "websock_adapter": {:hex, :websock_adapter, "0.5.9", "43dc3ba6d89ef5dec5b1d0a39698436a1e856d000d84bf31a3149862b01a287f", [:mix], [{:bandit, ">= 0.6.0", [hex: :bandit, repo: "hexpm", optional: true]}, {:plug, "~> 1.14", [hex: :plug, repo: "hexpm", optional: false]}, {:plug_cowboy, "~> 2.6", [hex: 
:plug_cowboy, repo: "hexpm", optional: true]}, {:websock, "~> 0.5", [hex: :websock, repo: "hexpm", optional: false]}], "hexpm", "5534d5c9adad3c18a0f58a9371220d75a803bf0b9a3d87e6fe072faaeed76a08"}, + "zoi": {:hex, :zoi, "0.17.1", "406aa87bb4181f41dee64336b75434367b7d3e88db813b0e6db0ae2d0f81f743", [:mix], [{:decimal, "~> 2.0", [hex: :decimal, repo: "hexpm", optional: true]}, {:phoenix_html, "~> 2.14.2 or ~> 3.0 or ~> 4.1", [hex: :phoenix_html, repo: "hexpm", optional: true]}], "hexpm", "3a11bf3bc9189f988ac74e81b5d7ca0c689b2a20eed220746a7043aa528e2aab"}, } From 9ee3b58d1b3ddf8bedcab027faa6ab7a46395041 Mon Sep 17 00:00:00 2001 From: errantsky Date: Thu, 5 Mar 2026 21:50:43 -0800 Subject: [PATCH 2/3] fix: address review issues from multi-provider LLM branch - Fix usage key mismatch: check both provider-specific and normalized key names for cache tokens (cache_creation_input_tokens vs cache_creation_tokens, cache_read_input_tokens vs cached_tokens) - Add catch-all clause for unsupported message roles in build_context - Make resolve_model!/2 actually raise on unknown keys (matching the ! 
convention) instead of silently falling back to :large - Add missing {:ok, non_string} clause in Config.resolve_model/2 to return a descriptive error instead of CaseClauseError - Fix FallbackLLM default from RLM.LLM (now behaviour-only) to RLM.LLM.ReqLLM - Strip "anthropic:" provider prefix in RLM.LLM.Anthropic before sending to the Anthropic API (models map stores prefixed specs) Co-Authored-By: Claude Opus 4.6 --- lib/rlm/config.ex | 10 ++++++++-- lib/rlm/llm/anthropic.ex | 5 +++++ lib/rlm/llm/req_llm.ex | 9 +++++++-- lib/rlm/replay/fallback_llm.ex | 2 +- lib/rlm/worker.ex | 9 +++++++-- 5 files changed, 28 insertions(+), 7 deletions(-) diff --git a/lib/rlm/config.ex b/lib/rlm/config.ex index fe268ff..1e2004a 100644 --- a/lib/rlm/config.ex +++ b/lib/rlm/config.ex @@ -109,8 +109,14 @@ defmodule RLM.Config do @spec resolve_model(t(), atom()) :: {:ok, String.t()} | {:error, String.t()} def resolve_model(%__MODULE__{models: models}, key) when is_atom(key) do case Map.fetch(models, key) do - {:ok, spec} when is_binary(spec) -> {:ok, spec} - :error -> {:error, "Unknown model key: #{key}"} + {:ok, spec} when is_binary(spec) -> + {:ok, spec} + + {:ok, other} -> + {:error, "Model key #{key} has invalid spec: #{inspect(other)} (expected a string)"} + + :error -> + {:error, "Unknown model key: #{key}"} end end diff --git a/lib/rlm/llm/anthropic.ex b/lib/rlm/llm/anthropic.ex index 2fe0587..9c51d6d 100644 --- a/lib/rlm/llm/anthropic.ex +++ b/lib/rlm/llm/anthropic.ex @@ -16,6 +16,7 @@ defmodule RLM.LLM.Anthropic do @impl true def chat(messages, model, config, opts \\ []) do url = String.trim_trailing(config.api_base_url, "/") <> "/v1/messages" + model = strip_provider_prefix(model) {system_text, user_messages} = extract_system(messages) @@ -71,6 +72,10 @@ defmodule RLM.LLM.Anthropic do end end + # Strip "anthropic:" prefix if present (models map stores provider-prefixed specs) + defp strip_provider_prefix("anthropic:" <> bare), do: bare + defp strip_provider_prefix(model), do: 
model + defp extract_system(messages) do case Enum.split_with(messages, fn m -> m.role == :system end) do {[], rest} -> {nil, rest} diff --git a/lib/rlm/llm/req_llm.ex b/lib/rlm/llm/req_llm.ex index a5866e0..ab372ed 100644 --- a/lib/rlm/llm/req_llm.ex +++ b/lib/rlm/llm/req_llm.ex @@ -62,6 +62,7 @@ defmodule RLM.LLM.ReqLLM do case msg.role do :user -> ReqLLM.Context.user(msg.content) :assistant -> ReqLLM.Context.assistant(msg.content) + other -> raise ArgumentError, "unsupported message role #{inspect(other)}" end end) @@ -120,8 +121,12 @@ defmodule RLM.LLM.ReqLLM do prompt_tokens: Map.get(raw, :input_tokens), completion_tokens: Map.get(raw, :output_tokens), total_tokens: Map.get(raw, :total_tokens), - cache_creation_input_tokens: Map.get(raw, :cache_creation_input_tokens), - cache_read_input_tokens: Map.get(raw, :cache_read_input_tokens) + # Anthropic provider includes :cache_creation_input_tokens; + # other providers use the normalized :cache_creation_tokens key + cache_creation_input_tokens: + Map.get(raw, :cache_creation_input_tokens) || Map.get(raw, :cache_creation_tokens), + cache_read_input_tokens: + Map.get(raw, :cache_read_input_tokens) || Map.get(raw, :cached_tokens) } end diff --git a/lib/rlm/replay/fallback_llm.ex b/lib/rlm/replay/fallback_llm.ex index 71ad642..aa0f80b 100644 --- a/lib/rlm/replay/fallback_llm.ex +++ b/lib/rlm/replay/fallback_llm.ex @@ -13,7 +13,7 @@ defmodule RLM.Replay.FallbackLLM do def chat(messages, model, config, opts \\ []) do case pop_entry() do nil -> - fallback_module = Process.get(:rlm_replay_fallback_module, RLM.LLM) + fallback_module = Process.get(:rlm_replay_fallback_module, RLM.LLM.ReqLLM) fallback_module.chat(messages, model, config, opts) entry -> diff --git a/lib/rlm/worker.ex b/lib/rlm/worker.ex index 1f1610d..fb90b2f 100644 --- a/lib/rlm/worker.ex +++ b/lib/rlm/worker.ex @@ -960,8 +960,13 @@ defmodule RLM.Worker do defp resolve_model!(config, key) do case RLM.Config.resolve_model(config, key) do - {:ok, spec} -> spec - 
{:error, _} -> Map.get(config.models, :large, "claude-sonnet-4-6") + {:ok, spec} -> + spec + + {:error, reason} -> + raise ArgumentError, + "Cannot resolve model key #{inspect(key)}: #{reason}. " <> + "Available keys: #{inspect(Map.keys(config.models))}" end end From ce8a6e84adffe6689f35213f1ccf4aa7142d55a5 Mon Sep 17 00:00:00 2001 From: errantsky Date: Fri, 6 Mar 2026 08:28:36 -0800 Subject: [PATCH 3/3] fix: address PR review findings and improve docs/examples MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Review agents identified several issues across 5 categories: Silent failures: - encode_object/1 returns {:error, :no_content} instead of empty string - Tape.get_events/1 catches :noproc specifically, logs unexpected exits - FallbackLLM logs when transitioning from tape to live LLM - extract_usage/1 warns when all usage fields are nil - context_window_for/2 warns on unknown model keys Documentation: - CLAUDE.md: remove stale cost config rows, fix env var references, fix models default to bare names, rewrite agent orientation section with key contracts, DI patterns, and common modification patterns - Fix moduledocs: Replay (default module), Anthropic (differentiator), Worker (provider-agnostic), LLM behaviour (bare model names) - Fix "Ollama (via vLLM)" → "Ollama (local models)" in config/req_llm Examples: - Update all examples from CLAUDE_API_KEY to ANTHROPIC_API_KEY - Add examples/local_models.exs for Ollama/local model usage - Update mix rlm.examples task with local_models entry Tests: - Add test/rlm/config_test.exs with 16 tests covering resolve_model/2, context_window_for/2, load/1 defaults, models map, and API key Config: - Remove unused top-level `require Logger` from config.ex Co-Authored-By: Claude Opus 4.6 --- CHANGELOG.md | 29 +++++- CLAUDE.md | 105 +++++++++++++++++--- examples/code_review.exs | 2 +- examples/local_models.exs | 164 +++++++++++++++++++++++++++++++ examples/map_reduce_analysis.exs | 2 +- 
examples/research_synthesis.exs | 9 +- examples/smoke_test.exs | 10 +- examples/web_fetch.exs | 2 +- lib/mix/tasks/rlm.examples.ex | 20 ++-- lib/rlm/config.ex | 25 +++-- lib/rlm/llm.ex | 2 +- lib/rlm/llm/anthropic.ex | 10 +- lib/rlm/llm/req_llm.ex | 35 +++++-- lib/rlm/replay.ex | 2 +- lib/rlm/replay/fallback_llm.ex | 7 ++ lib/rlm/replay/tape.ex | 18 +++- lib/rlm/worker.ex | 2 +- test/rlm/config_test.exs | 125 +++++++++++++++++++++++ 18 files changed, 510 insertions(+), 59 deletions(-) create mode 100644 examples/local_models.exs create mode 100644 test/rlm/config_test.exs diff --git a/CHANGELOG.md b/CHANGELOG.md index f3d9af5..85f15f9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -21,9 +21,9 @@ All notable changes to this project are documented here. - `RLM.LLM` refactored to a pure behaviour module + shared utilities (`extract_structured/1`, `response_schema/0`); no longer contains an implementation. - `models` config field — `%{atom() => String.t()}` map of symbolic keys to - provider-prefixed model specs. Default: `%{large: "anthropic:claude-sonnet-4-6", - small: "anthropic:claude-haiku-4-5"}`. Pass custom maps for Ollama/OpenAI: - `models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}` + model specs. Default: `%{large: "claude-sonnet-4-6", small: "claude-haiku-4-5"}`. + Bare names are auto-prefixed with `"anthropic:"` by `ReqLLM`. Pass custom maps + for Ollama/OpenAI: `models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}` - `RLM.Config.resolve_model/2` — looks up a model key in the `models` map - `RLM.Config.context_window_for/2` — resolves context window size for a model key (legacy fields for `:large`/`:small`, default 128k for custom keys) @@ -66,9 +66,32 @@ All notable changes to this project are documented here. 
then falls back to a live LLM module when the tape is exhausted - `:fallback` option on `RLM.replay/2` — `:error` (default) or `:live` to switch to live LLM calls when the tape runs out (e.g., because a patch caused extra iterations) +- `examples/local_models.exs` — new example demonstrating Ollama/local model usage + with no API key required. Registered as `mix rlm.examples local_models` +- `test/rlm/config_test.exs` — 16 new unit tests for `Config.load/1`, + `Config.resolve_model/2`, and `Config.context_window_for/2` - 17 tests covering recording, tape construction, replay LLM, replay orchestration, patching, fallback behavior, and the public API +### Fixed + +- `RLM.LLM.ReqLLM.encode_object/1` now returns an explicit error instead of silently + falling back to an empty string when the LLM response contains no usable content +- `RLM.LLM.ReqLLM.extract_usage/1` logs a warning when token usage extraction fails + (all fields nil despite non-empty response), preventing silent zero-cost reporting +- `RLM.Replay.Tape.get_events/1` now catches `:noproc` exits specifically and logs + a warning for unexpected exit reasons, instead of broadly swallowing all exits +- `RLM.Replay.FallbackLLM` now logs when switching from tape replay to live LLM calls +- `RLM.Config.context_window_for/2` now logs a warning when using the 128k default + for custom model keys, making it easier to diagnose compaction behavior +- `RLM.Replay` moduledoc corrected: fallback default is `RLM.LLM.ReqLLM` (not `RLM.LLM`) +- `RLM.Worker` moduledoc updated to be provider-agnostic (no longer references "Claude's + output_config" specifically) +- `CLAUDE.md` — removed stale `cost_per_1k_*` config fields; fixed `models` default to + match actual bare-name defaults; updated env var references to `ANTHROPIC_API_KEY` +- All examples updated from `CLAUDE_API_KEY` to `ANTHROPIC_API_KEY`; smoke test checks + both env vars + **Distributed Erlang node support** - `RLM.Node` — lightweight wrapper for OTP 
distribution with three public functions: diff --git a/CLAUDE.md b/CLAUDE.md index dba0805..7ebc6a0 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -86,10 +86,10 @@ mix test # Run tests with trace output mix test --trace -# Run live API tests (requires CLAUDE_API_KEY env var) +# Run live API tests (requires ANTHROPIC_API_KEY or CLAUDE_API_KEY env var) mix test --include live_api -# Live smoke test (requires CLAUDE_API_KEY env var) +# Live smoke test (requires ANTHROPIC_API_KEY or CLAUDE_API_KEY env var) mix rlm.smoke # Interactive shell @@ -178,16 +178,16 @@ LLM responses use structured output (JSON schema) to constrain responses to `{"reasoning": "...", "code": "..."}` objects. Feedback messages after eval are also structured JSON. -The `models` config field maps symbolic keys to provider-prefixed specs: +The `models` config field maps symbolic keys to model specs: ```elixir RLM.run(context, query, models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}) ``` -Default models: -- Large: `anthropic:claude-sonnet-4-6` -- Small: `anthropic:claude-haiku-4-5` +Default models (bare names; `ReqLLM` auto-prefixes with `"anthropic:"`): +- Large: `claude-sonnet-4-6` +- Small: `claude-haiku-4-5` ## Module Map @@ -264,7 +264,7 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`. |---|---|---| | `api_base_url` | `"https://api.anthropic.com"` | Anthropic API base URL | | `api_key` | `ANTHROPIC_API_KEY` env var | API key for LLM requests (falls back to `CLAUDE_API_KEY`) | -| `models` | `%{large: "anthropic:claude-sonnet-4-6", small: "anthropic:claude-haiku-4-5"}` | Named model map; keys are atoms, values are `"provider:model"` specs | +| `models` | `%{large: "claude-sonnet-4-6", small: "claude-haiku-4-5"}` | Named model map; keys are atoms, values are model specs. 
Bare names are auto-prefixed with `"anthropic:"` by `ReqLLM` | | `model_large` | `claude-sonnet-4-6` | Legacy; used to build default `models` map | | `model_small` | `claude-haiku-4-5` | Legacy; used to build default `models` map | | `max_iterations` | `25` | Per-worker LLM turn limit | @@ -277,10 +277,6 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`. | `eval_timeout` | `300_000` | ms per eval (5 min) | | `llm_timeout` | `120_000` | ms per LLM request (2 min) | | `subcall_timeout` | `600_000` | ms per subcall (10 min) | -| `cost_per_1k_prompt_tokens_large` | `0.003` | Cost tracking for large model input | -| `cost_per_1k_prompt_tokens_small` | `0.0008` | Cost tracking for small model input | -| `cost_per_1k_completion_tokens_large` | `0.015` | Cost tracking for large model output | -| `cost_per_1k_completion_tokens_small` | `0.004` | Cost tracking for small model output | | `enable_otel` | `false` | Enable OpenTelemetry integration | | `enable_event_log` | `true` | Enable per-run EventLog trace agents | | `event_log_capture_full_stdout` | `false` | Store full stdout in traces (vs truncated) | @@ -293,7 +289,7 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`. - Worker/keep_alive tests run `async: false` since MockLLM uses global state - Tool tests and sandbox tests can run `async: true` (no global state) - Live API tests tagged with `@moduletag :live_api` and excluded by default -- `mix test --include live_api` requires `CLAUDE_API_KEY` env var +- `mix test --include live_api` requires `ANTHROPIC_API_KEY` (or `CLAUDE_API_KEY`) env var - Test support files in `test/support/` - Tool tests use a per-test temp directory (created in `setup`, cleaned in `on_exit`) - Worker concurrency/depth tests use `RLM.Test.Helpers.start_test_run/1` to create a Run, then spawn Workers via `RLM.Run.start_worker/2` @@ -304,7 +300,7 @@ Read-only Phoenix LiveView dashboard. Serves on `http://localhost:4000`. 
- Workers use `restart: :temporary` — they terminate normally after completion - The `llm_module` config field enables dependency injection for testing - Bash tool uses `Task.async` + `Task.yield/2` (not `System.cmd` — it has no `:timeout` option) -- `.env` file with `CLAUDE_API_KEY` should exist at project root but must not be committed +- `.env` file with `ANTHROPIC_API_KEY` (or `CLAUDE_API_KEY`) should exist at project root but must not be committed - `RLM.run/3` monitors the Worker with `Process.monitor` so crashes return `{:error, reason}` rather than hanging indefinitely @@ -327,6 +323,8 @@ The dashboard is a Phoenix 1.8 LiveView application. Key conventions: ## Orientation for Coding Agents +### Getting Started + When starting a task, read these files in order: 1. **`CLAUDE.md`** (this file) — architecture, invariants, module map @@ -334,13 +332,90 @@ When starting a task, read these files in order: 3. The specific module(s) relevant to your task (see Module Map above) 4. The corresponding test file to understand expected behaviour -Key invariants **never to break**: +### Key Invariants (Never Break These) + - Raw input data must not enter any LLM context window (use `preview/2` or metadata only) - Workers are `:temporary` — do not change their restart strategy - The async-eval pattern in `RLM.Worker` is intentional; do not make eval synchronous - All session tests must use `async: false` (MockLLM is global ETS state) +- Run → Worker communication is always `send/2`, never `GenServer.call` (deadlock prevention) + +### Key Contracts & Interfaces + +**LLM Behaviour** (`RLM.LLM`): +```elixir +@callback chat(messages :: [map()], model :: String.t(), config :: RLM.Config.t(), opts :: keyword()) :: + {:ok, json_string :: String.t(), usage :: usage()} | {:error, String.t()} +``` +All LLM modules (`ReqLLM`, `Anthropic`, `MockLLM`, `Replay.LLM`, `Replay.FallbackLLM`) implement +this same callback. 
The `json_string` return is always a JSON-encoded string, never a parsed map. + +**Usage type**: `%{prompt_tokens: integer | nil, completion_tokens: integer | nil, total_tokens: integer | nil, cache_creation_input_tokens: integer | nil, cache_read_input_tokens: integer | nil}` + +**Model resolution**: Use `RLM.Config.resolve_model(config, :large | :small | atom())` → `{:ok, "provider:model-name"}` or `{:error, reason}`. In Worker, use `resolve_model!/2` (raises on unknown keys). + +**Tool Behaviour** (`RLM.Tool`): +```elixir +@callback name() :: String.t() +@callback description() :: String.t() +@callback execute(map()) :: {:ok, String.t()} | {:error, String.t()} +``` + +### Dependency Injection Pattern + +The `llm_module` config field is the primary injection point: +- **Production**: `RLM.LLM.ReqLLM` (default) — multi-provider via `req_llm` +- **Testing**: `RLM.Test.MockLLM` — ETS-based response queue, set in `config/test.exs` +- **Legacy**: `RLM.LLM.Anthropic` — direct Anthropic HTTP client +- **Replay**: `RLM.Replay.LLM` / `RLM.Replay.FallbackLLM` — tape-based, set by `RLM.Replay` + +When adding a new LLM feature, implement it in the behaviour callback — the Worker +calls `config.llm_module.chat(...)` and is provider-agnostic. + +### Testing Patterns + +**MockLLM usage** — queue expected responses before running Workers: +```elixir +RLM.Test.MockLLM.enqueue(%{ + "reasoning" => "I'll count the lines", + "code" => ~s(final_answer = 4) +}) +``` +MockLLM is global ETS state. Tests using it must be `async: false`. + +**Creating a test Run** — use `RLM.Test.Helpers.start_test_run/1`: +```elixir +{run_pid, run_id} = RLM.Test.Helpers.start_test_run(config) +{:ok, worker_pid, span_id} = RLM.Run.start_worker(run_pid, worker_opts) +``` + +**Tool tests** — use per-test temp dirs (created in `setup`, cleaned in `on_exit`); +these can run `async: true` since tools have no global state. + +### Common Modification Patterns + +**Adding a new config field:** +1. 
Add to `defstruct` in `config.ex` +2. Add to `load/1` with `get(overrides, :key, default)` +3. Add row to CLAUDE.md Config Fields table +4. Add to CHANGELOG.md + +**Adding a new tool:** +1. Create `lib/rlm/tools/my_tool.ex` implementing `RLM.Tool` +2. Add to `RLM.ToolRegistry.all/0` +3. Add wrapper function to `RLM.Sandbox` +4. Add to system prompt in `priv/system_prompt.md` +5. Add row to CLAUDE.md Module Map (Filesystem Tools section) + +**Adding a new LLM behaviour implementation:** +1. Create module with `@behaviour RLM.LLM` +2. Implement `chat/4` returning `{:ok, json_string, usage}` or `{:error, string}` +3. Users select it via `llm_module:` config override +4. Add row to CLAUDE.md Module Map + +### Before Committing -Before committing, always run: +Always run: ```bash mix compile --warnings-as-errors mix test diff --git a/examples/code_review.exs b/examples/code_review.exs index 67c39cd..674bd65 100644 --- a/examples/code_review.exs +++ b/examples/code_review.exs @@ -11,7 +11,7 @@ # - Filesystem tool usage visible in code blocks # # Usage: -# export CLAUDE_API_KEY=sk-ant-... +# export ANTHROPIC_API_KEY=sk-ant-... # mix run examples/code_review.exs # # Or via the Mix task: diff --git a/examples/local_models.exs b/examples/local_models.exs new file mode 100644 index 0000000..94bc6b2 --- /dev/null +++ b/examples/local_models.exs @@ -0,0 +1,164 @@ +# examples/local_models.exs +# +# Local Model Usage — demonstrates running RLM with local Ollama models +# instead of Anthropic's cloud API. No API key required. +# +# Prerequisites: +# 1. Install Ollama: https://ollama.com +# 2. Pull a model: +# ollama pull qwen3:8b +# Or any other model you prefer (llama3.2, mistral, etc.) +# 3. 
Ollama server must be running (it starts automatically after install) +# +# Usage: +# mix run examples/local_models.exs +# +# Or via the Mix task: +# mix rlm.examples local_models +# +# Configuration options shown: +# - models: %{large: "ollama:model", small: "ollama:model"} +# - Using the same model for both large and small +# - Using different models for parent vs subcall workers +# - Overriding max_iterations for faster local runs + +defmodule RLM.Examples.LocalModels do + @moduledoc false + + # Change these to match the models you have pulled locally. + # Run `ollama list` to see available models. + @default_model "ollama:qwen3:8b" + + def run do + check_ollama!() + + model = System.get_env("RLM_LOCAL_MODEL", @default_model) + IO.puts("\nRLM Local Models Example") + IO.puts("========================") + IO.puts("Model: #{model}") + IO.puts("") + + results = + [ + {&test_basic_run/1, "Basic run"}, + {&test_multi_step/1, "Multi-step"}, + {&test_custom_model_map/1, "Custom model map"} + ] + |> Enum.map(fn {test_fn, name} -> + IO.write(" Running: #{name}... ") + + case test_fn.(model) do + {:ok, detail} -> + IO.puts("PASS — #{detail}") + {name, :pass} + + {:error, reason} -> + IO.puts("FAIL — #{reason}") + {name, :fail} + end + end) + + passes = Enum.count(results, fn {_, s} -> s == :pass end) + fails = Enum.count(results, fn {_, s} -> s == :fail end) + IO.puts("\n#{passes} passed, #{fails} failed out of #{length(results)} tests") + + if fails > 0, do: System.halt(1) + end + + # --------------------------------------------------------------------------- + # Tests + # --------------------------------------------------------------------------- + + defp test_basic_run(model) do + # Simplest usage: override the models map to use a local model + case RLM.run( + "Elixir, Rust, Python", + "Count the programming languages. 
Return the count as an integer.", + models: %{large: model, small: model}, + max_iterations: 10 + ) do + {:ok, 3, _run_id} -> + {:ok, "counted 3 languages"} + + {:ok, other, _} -> + {:error, "expected 3, got #{inspect(other)}"} + + {:error, reason} -> + {:error, inspect(reason)} + end + end + + defp test_multi_step(model) do + # Multi-step reasoning with a local model + case RLM.run( + "apple, banana, cherry", + "First store the number of items in a variable called count. " <> + "Then set final_answer to count * 100.", + models: %{large: model, small: model}, + max_iterations: 10 + ) do + {:ok, 300, _run_id} -> + {:ok, "3 * 100 = 300"} + + {:ok, other, _} -> + {:error, "expected 300, got #{inspect(other)}"} + + {:error, reason} -> + {:error, inspect(reason)} + end + end + + defp test_custom_model_map(model) do + # Demonstrate using different models for different roles. + # In practice, you might use a larger model for the parent worker + # and a smaller one for subcalls: + # + # models: %{ + # large: "ollama:qwen3:32b", + # small: "ollama:qwen3:8b", + # fast: "ollama:qwen3:1.7b" + # } + # + # For this test, we use the same model for both to keep it simple. + case RLM.run( + "Hello", + "Set final_answer to the string \"hello from local model\"", + models: %{large: model, small: model}, + max_iterations: 5 + ) do + {:ok, result, _run_id} when is_binary(result) -> + {:ok, "got: #{String.slice(result, 0, 50)}"} + + {:ok, other, _} -> + {:error, "expected string, got #{inspect(other)}"} + + {:error, reason} -> + {:error, inspect(reason)} + end + end + + # --------------------------------------------------------------------------- + # Helpers + # --------------------------------------------------------------------------- + + defp check_ollama! do + case System.cmd("which", ["ollama"], stderr_to_stdout: true) do + {_, 0} -> + :ok + + _ -> + IO.puts(""" + + Ollama not found. 
Install it from https://ollama.com + + After installing, pull a model: + ollama pull qwen3:8b + + """) + + System.halt(1) + end + end +end + +RLM.Examples.LocalModels.run() diff --git a/examples/map_reduce_analysis.exs b/examples/map_reduce_analysis.exs index 9398a4c..8395cb5 100644 --- a/examples/map_reduce_analysis.exs +++ b/examples/map_reduce_analysis.exs @@ -9,7 +9,7 @@ # - Bindings growing across iterations # # Usage: -# export CLAUDE_API_KEY=sk-ant-... +# export ANTHROPIC_API_KEY=sk-ant-... # mix run examples/map_reduce_analysis.exs # # Or via the Mix task: diff --git a/examples/research_synthesis.exs b/examples/research_synthesis.exs index 77cd91e..cf7414c 100644 --- a/examples/research_synthesis.exs +++ b/examples/research_synthesis.exs @@ -11,7 +11,7 @@ # - Mix of direct queries and full subcalls visible as different node types # # Usage: -# export CLAUDE_API_KEY=sk-ant-... +# export ANTHROPIC_API_KEY=sk-ant-... # mix run examples/research_synthesis.exs # # Or via the Mix task: @@ -215,7 +215,12 @@ defmodule RLM.Examples.ResearchSynthesis do if narrative = result["narrative"] do IO.puts(" Narrative preview:") - narrative |> String.slice(0, 400) |> String.split("\n") |> Enum.each(&IO.puts(" #{&1}")) + + narrative + |> String.slice(0, 400) + |> String.split("\n") + |> Enum.each(&IO.puts(" #{&1}")) + IO.puts(" ...") end end diff --git a/examples/smoke_test.exs b/examples/smoke_test.exs index abf53f6..a0aaa1f 100644 --- a/examples/smoke_test.exs +++ b/examples/smoke_test.exs @@ -1,9 +1,9 @@ # examples/smoke_test.exs # -# Live smoke test for the RLM engine. Requires CLAUDE_API_KEY. +# Live smoke test for the RLM engine. Requires ANTHROPIC_API_KEY (or CLAUDE_API_KEY). # # Usage: -# export CLAUDE_API_KEY=sk-ant-... +# export ANTHROPIC_API_KEY=sk-ant-... # mix run examples/smoke_test.exs # # Or via the Mix task: @@ -152,9 +152,11 @@ defmodule RLM.SmokeTest do # --------------------------------------------------------------------------- defp check_api_key! 
do - case System.get_env("CLAUDE_API_KEY") do + key = System.get_env("ANTHROPIC_API_KEY") || System.get_env("CLAUDE_API_KEY") + + case key do nil -> - IO.puts("\n ERROR: CLAUDE_API_KEY not set. Export it before running.\n") + IO.puts("\n ERROR: ANTHROPIC_API_KEY not set. Export it before running.\n") System.halt(1) key -> diff --git a/examples/web_fetch.exs b/examples/web_fetch.exs index c5a2129..1b80edf 100644 --- a/examples/web_fetch.exs +++ b/examples/web_fetch.exs @@ -11,7 +11,7 @@ # Uses the public GitHub API (no auth required, 60 requests/hour limit). # # Usage: -# export CLAUDE_API_KEY=sk-ant-... +# export ANTHROPIC_API_KEY=sk-ant-... # mix run examples/web_fetch.exs # # Or via the Mix task: diff --git a/lib/mix/tasks/rlm.examples.ex b/lib/mix/tasks/rlm.examples.ex index a7dd370..2a95d4b 100644 --- a/lib/mix/tasks/rlm.examples.ex +++ b/lib/mix/tasks/rlm.examples.ex @@ -1,6 +1,6 @@ defmodule Mix.Tasks.Rlm.Examples do use Boundary, classify_to: RLM - @shortdoc "Run RLM example scenarios against the live Anthropic API" + @shortdoc "Run RLM example scenarios against live LLM providers" @moduledoc """ Runs RLM example scenarios that exercise multi-iteration, subcall depth, parallel queries, schema-mode extraction, and filesystem tools. @@ -8,17 +8,18 @@ defmodule Mix.Tasks.Rlm.Examples do These produce rich execution traces viewable in the web dashboard (`mix phx.server` → http://localhost:4000). - Requires the `CLAUDE_API_KEY` environment variable to be set. + Cloud examples require `ANTHROPIC_API_KEY` (or `CLAUDE_API_KEY`). + The `local_models` example uses Ollama and requires no API key. 
## Usage - # Run all examples + # Run all cloud examples mix rlm.examples # Run a specific example mix rlm.examples map_reduce mix rlm.examples code_review - mix rlm.examples research_synthesis + mix rlm.examples local_models # List available examples mix rlm.examples --list @@ -45,6 +46,11 @@ defmodule Mix.Tasks.Rlm.Examples do "examples/web_fetch.exs", "RLM.Examples.WebFetch", "Web Fetch & JSON Processing — curl + jq via bash tool" + }, + "local_models" => { + "examples/local_models.exs", + "RLM.Examples.LocalModels", + "Local Models — Ollama/vLLM usage (no API key required)" } } @@ -131,9 +137,11 @@ defmodule Mix.Tasks.Rlm.Examples do end defp check_api_key! do - case System.get_env("CLAUDE_API_KEY") do + key = System.get_env("ANTHROPIC_API_KEY") || System.get_env("CLAUDE_API_KEY") + + case key do nil -> - IO.puts("\n ERROR: CLAUDE_API_KEY not set. Export it before running.\n") + IO.puts("\n ERROR: ANTHROPIC_API_KEY not set. Export it before running.\n") System.halt(1) key -> diff --git a/lib/rlm/config.ex b/lib/rlm/config.ex index 1e2004a..95bc459 100644 --- a/lib/rlm/config.ex +++ b/lib/rlm/config.ex @@ -9,32 +9,31 @@ defmodule RLM.Config do %RLM.Config{ models: %{ - large: "anthropic:claude-sonnet-4-6", - small: "anthropic:claude-haiku-4-5" + large: "claude-sonnet-4-6", + small: "claude-haiku-4-5" } } Model specs follow the `req_llm` naming convention: `"provider:model-name"`. For backward compatibility, bare model names without a provider prefix - are treated as Anthropic models. + are treated as Anthropic models by `RLM.LLM.ReqLLM` (which prepends + `"anthropic:"` automatically). ## Supported Providers - Any provider supported by `req_llm`: Anthropic, OpenAI, Ollama (via vLLM), + Any provider supported by `req_llm`: Anthropic, OpenAI, Ollama (local models), Google Gemini, Groq, and more. 
For local Ollama: RLM.run("data", "query", models: %{large: "ollama:qwen3.5:35b", small: "ollama:qwen3.5:9b"}) """ - require Logger - @default_context_window 128_000 defstruct [ :api_base_url, :api_key, - # Legacy model fields — prefer `models` map + # Used to build default `models` map; prefer passing `models` directly :model_large, :model_small, :models, @@ -127,10 +126,20 @@ defmodule RLM.Config do 1. Legacy `context_window_tokens_large/small` fields (for `:large`/`:small` keys) 2. Default of #{@default_context_window} tokens for unknown models """ + require Logger + @spec context_window_for(t(), atom()) :: non_neg_integer() def context_window_for(%__MODULE__{} = config, :large), do: config.context_window_tokens_large def context_window_for(%__MODULE__{} = config, :small), do: config.context_window_tokens_small - def context_window_for(%__MODULE__{}, _key), do: @default_context_window + + def context_window_for(%__MODULE__{}, key) do + Logger.warning( + "No context window configured for model key #{inspect(key)}, " <> + "using default of #{@default_context_window} tokens" + ) + + @default_context_window + end defp resolve_api_key do case System.get_env("ANTHROPIC_API_KEY") do diff --git a/lib/rlm/llm.ex b/lib/rlm/llm.ex index 0b6e920..82e599d 100644 --- a/lib/rlm/llm.ex +++ b/lib/rlm/llm.ex @@ -29,7 +29,7 @@ defmodule RLM.LLM do ## Arguments * `messages` — list of message maps with `:role` and `:content` fields - * `model` — provider-prefixed model spec (e.g., `"anthropic:claude-sonnet-4-6"`) + * `model` — model spec, optionally provider-prefixed (e.g., `"anthropic:claude-sonnet-4-6"` or bare `"claude-sonnet-4-6"`). Implementations should handle both formats. 
* `config` — `RLM.Config.t()` struct * `opts` — keyword list; supports `:schema` for structured output diff --git a/lib/rlm/llm/anthropic.ex b/lib/rlm/llm/anthropic.ex index 9c51d6d..388ce00 100644 --- a/lib/rlm/llm/anthropic.ex +++ b/lib/rlm/llm/anthropic.ex @@ -2,13 +2,13 @@ defmodule RLM.LLM.Anthropic do @moduledoc """ Hand-rolled Anthropic Messages API client. - Preserved as a fallback for users who need direct control over - Anthropic-specific features (prompt caching, etc.). The default - backend is `RLM.LLM.ReqLLM`. + Preserved for users who prefer a dependency-free Anthropic-only client + or need to customize Anthropic API request bodies directly (e.g., custom + headers, non-standard API versions). The default backend is `RLM.LLM.ReqLLM`. - Select this via config: + Select this at call time: - RLM.Config.load(llm_module: RLM.LLM.Anthropic) + RLM.run(context, query, llm_module: RLM.LLM.Anthropic) """ @behaviour RLM.LLM diff --git a/lib/rlm/llm/req_llm.ex b/lib/rlm/llm/req_llm.ex index ab372ed..1269b91 100644 --- a/lib/rlm/llm/req_llm.ex +++ b/lib/rlm/llm/req_llm.ex @@ -3,7 +3,7 @@ defmodule RLM.LLM.ReqLLM do Multi-provider LLM backend using the `req_llm` package. Supports any provider that `req_llm` supports: Anthropic, OpenAI, - Ollama (via vLLM), Google Gemini, Groq, and more. Model specs follow + Ollama (local models), Google Gemini, Groq, and more. Model specs follow the `"provider:model-name"` convention. 
For backward compatibility, bare model names without a provider prefix @@ -28,9 +28,14 @@ defmodule RLM.LLM.ReqLLM do case ReqLLM.generate_object(model_spec, context, schema, req_opts) do {:ok, response} -> - content = encode_object(response) - usage = extract_usage(response) - {:ok, content, usage} + case encode_object(response) do + {:error, :no_content} -> + {:error, "LLM response contained no usable content (no structured object or text)"} + + content when is_binary(content) -> + usage = extract_usage(response) + {:ok, content, usage} + end {:error, reason} -> {:error, format_error(reason)} @@ -109,15 +114,21 @@ defmodule RLM.LLM.ReqLLM do # the existing chat/4 contract (Worker expects a JSON string). defp encode_object(response) do case ReqLLM.Response.object(response) do - obj when is_map(obj) -> Jason.encode!(obj) - _ -> ReqLLM.Response.text(response) || "" + obj when is_map(obj) -> + Jason.encode!(obj) + + _ -> + case ReqLLM.Response.text(response) do + text when is_binary(text) and text != "" -> text + _ -> {:error, :no_content} + end end end defp extract_usage(response) do raw = ReqLLM.Response.usage(response) || %{} - %{ + usage = %{ prompt_tokens: Map.get(raw, :input_tokens), completion_tokens: Map.get(raw, :output_tokens), total_tokens: Map.get(raw, :total_tokens), @@ -128,6 +139,16 @@ defmodule RLM.LLM.ReqLLM do cache_read_input_tokens: Map.get(raw, :cache_read_input_tokens) || Map.get(raw, :cached_tokens) } + + if raw != %{} and is_nil(usage.prompt_tokens) and is_nil(usage.completion_tokens) do + require Logger + + Logger.warning( + "Could not extract token usage from LLM response. 
Raw usage keys: #{inspect(Map.keys(raw))}" + ) + end + + usage end defp format_error(%{__exception__: true} = error), do: Exception.message(error) diff --git a/lib/rlm/replay.ex b/lib/rlm/replay.ex index 053453e..69f6130 100644 --- a/lib/rlm/replay.ex +++ b/lib/rlm/replay.ex @@ -15,7 +15,7 @@ defmodule RLM.Replay do - `:live` — switch to live LLM calls for remaining iterations * `:config` — config overrides applied to the replay run. When using `fallback: :live`, set `llm_module` here to control which module - handles the live calls (defaults to `RLM.LLM`). + handles the live calls (defaults to `RLM.LLM.ReqLLM`). """ @spec replay(String.t(), keyword()) :: {:ok, any(), String.t()} | {:error, any()} diff --git a/lib/rlm/replay/fallback_llm.ex b/lib/rlm/replay/fallback_llm.ex index aa0f80b..664763c 100644 --- a/lib/rlm/replay/fallback_llm.ex +++ b/lib/rlm/replay/fallback_llm.ex @@ -9,11 +9,18 @@ defmodule RLM.Replay.FallbackLLM do @behaviour RLM.LLM + require Logger + @impl true def chat(messages, model, config, opts \\ []) do case pop_entry() do nil -> fallback_module = Process.get(:rlm_replay_fallback_module, RLM.LLM.ReqLLM) + + Logger.info( + "Replay tape exhausted, falling back to live LLM (#{inspect(fallback_module)})" + ) + fallback_module.chat(messages, model, config, opts) entry -> diff --git a/lib/rlm/replay/tape.ex b/lib/rlm/replay/tape.ex index 7e21333..53dba28 100644 --- a/lib/rlm/replay/tape.ex +++ b/lib/rlm/replay/tape.ex @@ -52,14 +52,26 @@ defmodule RLM.Replay.Tape do end end - # EventLog.get_events/1 raises an exit when no Agent exists for the run_id. - # Catch that and fall back to TraceStore. + # EventLog.get_events/1 raises an exit when no Agent exists for the run_id + # (e.g., swept by the GC). Catch that and fall back to TraceStore. 
defp get_events(run_id) do case RLM.EventLog.get_events(run_id) do [] -> RLM.EventLog.get_events_from_store(run_id) events -> events end catch - :exit, _ -> RLM.EventLog.get_events_from_store(run_id) + :exit, {:noproc, _} -> + # Agent was swept — expected, fall back to persisted store + RLM.EventLog.get_events_from_store(run_id) + + :exit, reason -> + require Logger + + Logger.warning( + "EventLog.get_events failed for run #{run_id}: #{inspect(reason)}, " <> + "falling back to TraceStore" + ) + + RLM.EventLog.get_events_from_store(run_id) end end diff --git a/lib/rlm/worker.ex b/lib/rlm/worker.ex index fb90b2f..7c16214 100644 --- a/lib/rlm/worker.ex +++ b/lib/rlm/worker.ex @@ -16,7 +16,7 @@ defmodule RLM.Worker do ## Structured Output LLM responses are JSON objects with `reasoning` and `code` fields, - constrained via Claude's `output_config` JSON schema. Feedback messages + constrained via a JSON schema (provider-specific structured output). Feedback messages after eval are also structured JSON. 
""" use GenServer, restart: :temporary diff --git a/test/rlm/config_test.exs b/test/rlm/config_test.exs new file mode 100644 index 0000000..196598d --- /dev/null +++ b/test/rlm/config_test.exs @@ -0,0 +1,125 @@ +defmodule RLM.ConfigTest do + use ExUnit.Case, async: true + + alias RLM.Config + + describe "load/1" do + test "returns a Config struct with expected defaults" do + config = Config.load() + + assert %Config{} = config + assert config.llm_module == RLM.LLM.ReqLLM + assert config.max_iterations == 25 + assert config.max_depth == 5 + end + + test "builds default models map from model_large/model_small" do + config = Config.load() + + assert config.models == %{ + large: "claude-sonnet-4-6", + small: "claude-haiku-4-5" + } + end + + test "overrides models map when provided" do + custom_models = %{large: "ollama:llama3", small: "ollama:llama3:8b"} + config = Config.load(models: custom_models) + + assert config.models == custom_models + end + + test "legacy model_large/model_small flow into default models map" do + config = Config.load(model_large: "custom-large", model_small: "custom-small") + + assert config.models == %{large: "custom-large", small: "custom-small"} + assert config.model_large == "custom-large" + assert config.model_small == "custom-small" + end + + test "explicit models override takes precedence over legacy fields" do + config = + Config.load( + model_large: "ignored-large", + models: %{large: "winner-large", small: "winner-small"} + ) + + assert config.models == %{large: "winner-large", small: "winner-small"} + end + + test "llm_module defaults to RLM.LLM.ReqLLM" do + config = Config.load() + assert config.llm_module == RLM.LLM.ReqLLM + end + + test "llm_module can be overridden" do + config = Config.load(llm_module: RLM.LLM.Anthropic) + assert config.llm_module == RLM.LLM.Anthropic + end + end + + describe "resolve_model/2" do + test "returns {:ok, spec} for a valid key" do + config = Config.load(models: %{large: 
"anthropic:claude-sonnet-4-6"}) + + assert {:ok, "anthropic:claude-sonnet-4-6"} = Config.resolve_model(config, :large) + end + + test "returns {:ok, spec} for bare model name" do + config = Config.load() + + assert {:ok, "claude-sonnet-4-6"} = Config.resolve_model(config, :large) + assert {:ok, "claude-haiku-4-5"} = Config.resolve_model(config, :small) + end + + test "returns {:error, _} for unknown key" do + config = Config.load() + + assert {:error, message} = Config.resolve_model(config, :unknown) + assert message =~ "Unknown model key: unknown" + end + + test "returns {:error, _} for non-string value in models map" do + config = Config.load(models: %{large: 42, small: "valid"}) + + assert {:error, message} = Config.resolve_model(config, :large) + assert message =~ "invalid spec" + assert message =~ "42" + end + + test "works with custom model keys" do + config = Config.load(models: %{large: "a", small: "b", medium: "ollama:qwen3:14b"}) + + assert {:ok, "ollama:qwen3:14b"} = Config.resolve_model(config, :medium) + end + end + + describe "context_window_for/2" do + test "returns context_window_tokens_large for :large" do + config = Config.load(context_window_tokens_large: 200_000) + + assert Config.context_window_for(config, :large) == 200_000 + end + + test "returns context_window_tokens_small for :small" do + config = Config.load(context_window_tokens_small: 100_000) + + assert Config.context_window_for(config, :small) == 100_000 + end + + test "returns default 128_000 for unknown model keys" do + config = Config.load() + + import ExUnit.CaptureLog + log = capture_log(fn -> assert Config.context_window_for(config, :medium) == 128_000 end) + assert log =~ "No context window configured for model key :medium" + end + end + + describe "resolve_api_key (via load)" do + test "api_key can be explicitly set" do + config = Config.load(api_key: "test-key-123") + assert config.api_key == "test-key-123" + end + end +end