Commit ab0958e

feat: interactive cli + vercel gateway
1 parent 4b6417c commit ab0958e

6 files changed, +245 -258 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -40,3 +40,5 @@ report.[0-9]_.[0-9]_.[0-9]_.[0-9]_.json
 results/*
 !results/.gitkeep
 !results/*.json
+.vercel
+.env*.local

README.md

Lines changed: 38 additions & 57 deletions
@@ -1,6 +1,6 @@
 # ai-sdk-bench
 
-AI SDK benchmarking tool that tests AI agents with MCP (Model Context Protocol) integration. Automatically discovers and runs all tests in the `tests/` directory, verifying LLM-generated Svelte components against test suites.
+AI SDK benchmarking tool that tests AI agents with MCP (Model Context Protocol) integration using the Vercel AI Gateway. Automatically discovers and runs all tests in the `tests/` directory, verifying LLM-generated Svelte components against test suites.
 
 ## Installation
 
@@ -12,84 +12,60 @@ bun install
 
 ## Setup
 
-To set up `.env`:
+Configure your API keys in `.env`:
 
 ```bash
 cp .env.example .env
 ```
 
-Then configure your API keys and model in `.env`:
+Then add the necessary API key use the `vercel env pull`
 
-```bash
-# Required: Choose your model
-MODEL=anthropic/claude-sonnet-4
-ANTHROPIC_API_KEY=your_key_here
-
-# Optional: Enable MCP integration (leave empty to disable)
-MCP_SERVER_URL=https://mcp.svelte.dev/mcp
-```
-
-### Environment Variables
-
-**Required:**
-
-- `MODEL`: The AI model to use (e.g., `anthropic/claude-sonnet-4`, `openai/gpt-5`, `openrouter/anthropic/claude-sonnet-4`, `lmstudio/model-name`)
-- Corresponding API key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `OPENROUTER_API_KEY`)
-- Note: No API key required for `lmstudio/*` models (runs locally)
-
-**Optional:**
-
-- `MCP_SERVER_URL`: MCP server URL (leave empty to disable MCP integration)
-
-### Supported Providers
-
-**Cloud Providers:**
+### Required API Keys
 
-- `anthropic/*` - Direct Anthropic API (requires `ANTHROPIC_API_KEY`)
-- `openai/*` - Direct OpenAI API (requires `OPENAI_API_KEY`)
-- `openrouter/*` - OpenRouter unified API (requires `OPENROUTER_API_KEY`)
+You'll need at least one API key for the providers you want to test:
 
-**Local Providers:**
+- `VERCEL_OIDC_TOKEN`: The OIDC token for vercel AI gateway
 
-- `lmstudio/*` - LM Studio local server (requires LM Studio running on `http://localhost:1234`)
+## Usage
 
-Example configurations:
+To run the benchmark:
 
 ```bash
-# Anthropic
-MODEL=anthropic/claude-sonnet-4
-ANTHROPIC_API_KEY=sk-ant-...
+bun run index.ts
+```
 
-# OpenAI
-MODEL=openai/gpt-5
-OPENAI_API_KEY=sk-...
+### Interactive CLI
 
-# OpenRouter
-MODEL=openrouter/anthropic/claude-sonnet-4
-OPENROUTER_API_KEY=sk-or-...
+The benchmark features an interactive CLI that will prompt you for configuration:
 
-# LM Studio (local)
-MODEL=lmstudio/llama-3-8b
-# No API key needed - make sure LM Studio is running!
-```
+1. **Model Selection**: Choose one or more models from the Vercel AI Gateway
+- Select from available models in your configured providers
+- Optionally add custom model IDs
+- Can test multiple models in a single run
 
-## Usage
+2. **MCP Integration**: Choose your MCP configuration
+- **No MCP Integration**: Run without external tools
+- **MCP over HTTP**: Use HTTP-based MCP server (default: `https://mcp.svelte.dev/mcp`)
+- **MCP over StdIO**: Use local MCP server via command (default: `npx -y @sveltejs/mcp`)
+- Option to provide custom MCP server URL or command
 
-To run the benchmark (automatically discovers and runs all tests):
+3. **TestComponent Tool**: Enable/disable the testing tool for models
+- Allows models to run tests during component development
+- Enabled by default
 
-```bash
-bun run index.ts
-```
+### Benchmark Workflow
 
-The benchmark will:
+After configuration, the benchmark will:
 
 1. Discover all tests in `tests/` directory
-2. For each test:
+2. For each selected model and test:
 - Run the AI agent with the test's prompt
 - Extract the generated Svelte component
 - Verify the component against the test suite
 3. Generate a combined report with all results
 
+### Results and Reports
+
 Results are saved to the `results/` directory with timestamped filenames:
 
 - `results/result-2024-12-07-14-30-45.json` - Full execution trace with all test results
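
Reviewer note: the "For each selected model and test" step and the timestamped result filename described in this hunk could be sketched roughly as below. This is a minimal illustration, not the code from this commit: `planRuns` and `resultFilename` are hypothetical names, and the use of UTC for the timestamp is an assumption (the README only shows the filename shape).

```typescript
// Hypothetical helper: one planned run per (model, test) pair.
interface PlannedRun {
  model: string;
  test: string;
}

// Expands the selected models and discovered tests into a flat run list,
// iterating models in the outer loop and tests in the inner loop.
function planRuns(models: string[], tests: string[]): PlannedRun[] {
  const runs: PlannedRun[] = [];
  for (const model of models) {
    for (const test of tests) {
      runs.push({ model, test });
    }
  }
  return runs;
}

// Hypothetical helper: builds a name shaped like
// results/result-YYYY-MM-DD-HH-MM-SS.json.
// Assumption: UTC fields are used; the real tool may use local time.
function resultFilename(now: Date): string {
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  const stamp = [
    String(now.getUTCFullYear()),
    pad(now.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad(now.getUTCDate()),
    pad(now.getUTCHours()),
    pad(now.getUTCMinutes()),
    pad(now.getUTCSeconds()),
  ].join("-");
  return "results/result-" + stamp + ".json";
}
```

With two models and three tests, `planRuns` yields six runs, which is why testing several models in one session multiplies the total run time.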
@@ -148,12 +124,17 @@ This copies each `Reference.svelte` to `Component.svelte` temporarily and runs t
 
 ## MCP Integration
 
-The tool supports optional integration with MCP (Model Context Protocol) servers:
+The tool supports optional integration with MCP (Model Context Protocol) servers through the interactive CLI. When running the benchmark, you'll be prompted to choose:
 
-- **Enabled**: Set `MCP_SERVER_URL` to a valid MCP server URL
-- **Disabled**: Leave `MCP_SERVER_URL` empty or unset
+- **No MCP Integration**: Run without external tools
+- **MCP over HTTP**: Connect to an HTTP-based MCP server
+- Default: `https://mcp.svelte.dev/mcp`
+- Option to provide a custom URL
+- **MCP over StdIO**: Connect to a local MCP server via command
+- Default: `npx -y @sveltejs/mcp`
+- Option to provide a custom command
 
-MCP status is documented in both the JSON metadata and displayed as a badge in the HTML report.
+MCP status, transport type, and server configuration are documented in both the JSON metadata and displayed as a badge in the HTML report.
 
 ## Exit Codes
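
Reviewer note: the three MCP choices in this hunk map naturally onto a discriminated union. The sketch below is one possible shape, assuming hypothetical names (`McpConfig`, `resolveMcpConfig`); only the two default values are taken from the README text. The command splitting is deliberately naive (whitespace only, no quoting support).

```typescript
// Hypothetical configuration type: one variant per option in the prompt.
type McpConfig =
  | { transport: "none" }
  | { transport: "http"; url: string }
  | { transport: "stdio"; command: string; args: string[] };

// Defaults as stated in the README diff.
const DEFAULT_MCP_HTTP_URL = "https://mcp.svelte.dev/mcp";
const DEFAULT_MCP_STDIO_COMMAND = "npx -y @sveltejs/mcp";

// Turns the chosen transport and an optional custom value into a config.
// An empty or whitespace-only custom value falls back to the default.
function resolveMcpConfig(
  transport: "none" | "http" | "stdio",
  custom?: string
): McpConfig {
  const value = (custom ?? "").trim();
  switch (transport) {
    case "none":
      return { transport: "none" };
    case "http":
      return { transport: "http", url: value || DEFAULT_MCP_HTTP_URL };
    case "stdio": {
      // Naive split on whitespace; quoted arguments are not handled.
      const parts = (value || DEFAULT_MCP_STDIO_COMMAND).split(/\s+/);
      const [command, ...args] = parts;
      return { transport: "stdio", command, args };
    }
  }
}
```

Keeping the transport as the discriminant would also make it straightforward to record "MCP status, transport type, and server configuration" in the JSON metadata, since the object can be serialized as-is.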

bun.lock

Lines changed: 7 additions & 0 deletions
@@ -9,6 +9,7 @@
 "@ai-sdk/mcp": "0.0.11",
 "@ai-sdk/openai": "^2.0.77",
 "@ai-sdk/openai-compatible": "^1.0.28",
+"@clack/prompts": "^0.11.0",
 "@openrouter/ai-sdk-provider": "^1.4.1",
 "@testing-library/svelte": "^5.2.9",
 "@testing-library/user-event": "^14.6.1",
@@ -54,6 +55,10 @@
 
 "@babel/runtime": ["@babel/runtime@7.28.4", "", {}, "sha512-Q/N6JNWvIvPnLDvjlE1OUBLPQHH6l3CltCEsHIujp45zQUSSh8K+gHnaEX45yAT1nyngnINhvWtzN+Nb9D8RAQ=="],
 
+"@clack/core": ["@clack/core@0.5.0", "", { "dependencies": { "picocolors": "^1.0.0", "sisteransi": "^1.0.5" } }, "sha512-p3y0FIOwaYRUPRcMO7+dlmLh8PSRcrjuTndsiA0WAFbWES0mLZlrjVoBRZ9DzkPFJZG6KGkJmoEAY0ZcVWTkow=="],
+
+"@clack/prompts": ["@clack/prompts@0.11.0", "", { "dependencies": { "@clack/core": "0.5.0", "picocolors": "^1.0.0", "sisteransi": "^1.0.5" } }, "sha512-pMN5FcrEw9hUkZA4f+zLlzivQSeQf5dRGJjSUbvVYDLvpKCdQx5OaknvKzgbtXOizhP+SJJJjqEbOe55uKKfAw=="],
+
 "@csstools/color-helpers": ["@csstools/color-helpers@5.1.0", "", {}, "sha512-S11EXWJyy0Mz5SYvRmY8nJYTFFd1LCNV+7cXyAgQtOOuzb4EsgfqDufL+9esx72/eLhsRdGZwaldu/h+E4t4BA=="],
 
 "@csstools/css-calc": ["@csstools/css-calc@2.1.4", "", { "peerDependencies": { "@csstools/css-parser-algorithms": "^3.0.5", "@csstools/css-tokenizer": "^3.0.4" } }, "sha512-3N8oaj+0juUw/1H3YwmDDJXCgTB1gKU6Hc/bB502u9zR0q2vd786XJH9QfrKIEgFlZmhZiq6epXl4rHqhzsIgQ=="],
@@ -446,6 +451,8 @@
 
 "siginfo": ["siginfo@2.0.0", "", {}, "sha512-ybx0WO1/8bSBLEWXZvEd7gMW3Sn3JFlW3TvX1nREbDLRNQNaeNN8WK0meBwPdAaOI7TtRRRJn/Es1zhrrCHu7g=="],
 
+"sisteransi": ["sisteransi@1.0.5", "", {}, "sha512-bLGGlR1QxBcynn2d5YmDX4MGjlZvy2MRBDRNHLJ8VI6l6+9FUiyTFNJ0IveOSP0bcXgVDPRcfGqA0pjaqUpfVg=="],
+
 "slash": ["slash@2.0.0", "", {}, "sha512-ZYKh3Wh2z1PpEXWr0MpSBZ0V6mZHAQfYevttO11c51CaWjGTaadiKZ+wVt1PbMlDV5qhMFslpZCemhwOK7C89A=="],
 
 "source-map-js": ["source-map-js@1.2.1", "", {}, "sha512-UXWMKhLOwVKb728IUtQPXxfYU+usdybtUrK/8uGE8CQMvrhOpwvzDBwj0QhSL7MQc7vIsISBG8VQ8+IDQxpfQA=="],
