FIX: Run AI evals with bundle exec ruby (#86)

romanrizzi · web-flow · commit 315e2c58a5d3 · 2025-12-04T15:30:06.000-03:00
diff --git a/docs/06-general-guides/04-ai-evals.md b/docs/06-general-guides/04-ai-evals.md
@@ -25,27 +25,27 @@ The Discourse AI plugin ships a Ruby CLI under `plugins/discourse-ai/evals` that
   - `OPENAI_API_KEY=...`
   - `ANTHROPIC_API_KEY=...`
   - `GEMINI_API_KEY=...`
-- From the repository root, change into `plugins/discourse-ai/evals` and run `./run --help` to confirm the CLI is wired up. If `evals/cases` is missing it will be cloned automatically from `discourse/discourse-ai-evals`.
+- From the repository root, change into `plugins/discourse-ai/evals` and run `bundle exec ruby ./run --help` to confirm the CLI is wired up. If `evals/cases` is missing it will be cloned automatically from `discourse/discourse-ai-evals`.
 
 ## Discover available inputs
 
-- `./run --list` lists all eval ids from `evals/cases/*/*.yml`.
-- `./run --list-features` prints feature keys grouped by module (format: `module:feature`).
-- `./run --list-models` shows LLM configs that can be hydrated from `eval-llms.yml`/`.local.yml`.
-- `./run --list-personas` lists persona keys defined under `evals/personas/*.yml` plus the built-in `default`.
+- `bundle exec ruby ./run --list` lists all eval ids from `evals/cases/*/*.yml`.
+- `bundle exec ruby ./run --list-features` prints feature keys grouped by module (format: `module:feature`).
+- `bundle exec ruby ./run --list-models` shows LLM configs that can be hydrated from `eval-llms.yml`/`.local.yml`.
+- `bundle exec ruby ./run --list-personas` lists persona keys defined under `evals/personas/*.yml` plus the built-in `default`.
 
 ## Run evals
 
 - Run a single eval against specific models:
 
   ```sh
-  OPENAI_API_KEY=... ./run --eval simple_summarization --models gpt-4o-mini
+  OPENAI_API_KEY=... bundle exec ruby ./run --eval simple_summarization --models gpt-4o-mini
   ```
 
 - Run every eval for a feature (or the whole suite) against multiple models:
 
   ```sh
-  ./run --feature summarization:topic_summaries --models gpt-4o-mini,claude-3-5-sonnet-latest
+  bundle exec ruby ./run --feature summarization:topic_summaries --models gpt-4o-mini,claude-3-5-sonnet-latest
   ```
 
   Omitting `--models` hydrates every configured LLM. Models that cannot hydrate (missing API keys, etc.) are skipped with a log message.
@@ -82,7 +82,7 @@ The Discourse AI plugin ships a Ruby CLI under `plugins/discourse-ai/evals` that
 - Example:
 
   ```sh
-  ./run --dataset evals/cases/spam/spam_eval_dataset.csv --feature spam:inspect_posts --models gpt-4o-mini
+  bundle exec ruby ./run --dataset evals/cases/spam/spam_eval_dataset.csv --feature spam:inspect_posts --models gpt-4o-mini
   ```
 
 ## Writing eval cases