Commit 315e2c5

FIX: Run AI evals with bundle exec ruby (#86)
1 parent fc26f56 commit 315e2c5

1 file changed: +8 -8 lines

docs/06-general-guides/04-ai-evals.md

Lines changed: 8 additions & 8 deletions
@@ -25,27 +25,27 @@ The Discourse AI plugin ships a Ruby CLI under `plugins/discourse-ai/evals` that
 - `OPENAI_API_KEY=...`
 - `ANTHROPIC_API_KEY=...`
 - `GEMINI_API_KEY=...`
-- From the repository root, change into `plugins/discourse-ai/evals` and run `./run --help` to confirm the CLI is wired up. If `evals/cases` is missing it will be cloned automatically from `discourse/discourse-ai-evals`.
+- From the repository root, change into `plugins/discourse-ai/evals` and run `bundle exec ruby ./run --help` to confirm the CLI is wired up. If `evals/cases` is missing it will be cloned automatically from `discourse/discourse-ai-evals`.

 ## Discover available inputs

-- `./run --list` lists all eval ids from `evals/cases/*/*.yml`.
-- `./run --list-features` prints feature keys grouped by module (format: `module:feature`).
-- `./run --list-models` shows LLM configs that can be hydrated from `eval-llms.yml`/`.local.yml`.
-- `./run --list-personas` lists persona keys defined under `evals/personas/*.yml` plus the built-in `default`.
+- `bundle exec ruby ./run --list` lists all eval ids from `evals/cases/*/*.yml`.
+- `bundle exec ruby ./run --list-features` prints feature keys grouped by module (format: `module:feature`).
+- `bundle exec ruby ./run --list-models` shows LLM configs that can be hydrated from `eval-llms.yml`/`.local.yml`.
+- `bundle exec ruby ./run --list-personas` lists persona keys defined under `evals/personas/*.yml` plus the built-in `default`.

 ## Run evals

 - Run a single eval against specific models:

 ```sh
-OPENAI_API_KEY=... ./run --eval simple_summarization --models gpt-4o-mini
+OPENAI_API_KEY=... bundle exec ruby ./run --eval simple_summarization --models gpt-4o-mini
 ```

 - Run every eval for a feature (or the whole suite) against multiple models:

 ```sh
-./run --feature summarization:topic_summaries --models gpt-4o-mini,claude-3-5-sonnet-latest
+bundle exec ruby ./run --feature summarization:topic_summaries --models gpt-4o-mini,claude-3-5-sonnet-latest
 ```

 Omitting `--models` hydrates every configured LLM. Models that cannot hydrate (missing API keys, etc.) are skipped with a log message.
@@ -82,7 +82,7 @@ The Discourse AI plugin ships a Ruby CLI under `plugins/discourse-ai/evals` that
 - Example:

 ```sh
-./run --dataset evals/cases/spam/spam_eval_dataset.csv --feature spam:inspect_posts --models gpt-4o-mini
+bundle exec ruby ./run --dataset evals/cases/spam/spam_eval_dataset.csv --feature spam:inspect_posts --models gpt-4o-mini
 ```

 ## Writing eval cases
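The diff replaces each bare `./run` with `bundle exec ruby ./run`. If you invoke the CLI often, a small shell function can apply that prefix once. This is a hypothetical convenience sketch, not part of the plugin: it assumes you call it from the Discourse repository root with Bundler installed, and the name `ai_evals` is illustrative only.

```sh
# Hypothetical helper (not part of the Discourse AI plugin): run the evals CLI
# through Bundler from the repository root, forwarding all arguments verbatim.
ai_evals() {
  # Run in a subshell so the caller's working directory is left unchanged.
  (
    cd plugins/discourse-ai/evals || exit 1
    bundle exec ruby ./run "$@"
  )
}
```

With it, `ai_evals --list-models` would run `bundle exec ruby ./run --list-models` inside `plugins/discourse-ai/evals` and leave your shell in the repository root. Exporting API keys beforehand (for example `export OPENAI_API_KEY=...`) avoids depending on how a particular shell scopes `VAR=value` prefixes on function calls.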
