README.md: 38 additions & 57 deletions
@@ -1,6 +1,6 @@
# ai-sdk-bench

-AI SDK benchmarking tool that tests AI agents with MCP (Model Context Protocol) integration. Automatically discovers and runs all tests in the `tests/` directory, verifying LLM-generated Svelte components against test suites.
+AI SDK benchmarking tool that tests AI agents with MCP (Model Context Protocol) integration using the Vercel AI Gateway. Automatically discovers and runs all tests in the `tests/` directory, verifying LLM-generated Svelte components against test suites.

## Installation
@@ -12,84 +12,60 @@ bun install
## Setup

-To set up `.env`:
+Configure your API keys in `.env`:

```bash
cp .env.example .env
```

-Then configure your API keys and model in `.env`:
+Then add the necessary API key using `vercel env pull`.
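For reference, a minimal sketch of pulling the gateway credentials with the Vercel CLI. It assumes the repository is already linked to a Vercel project (`vercel link`), and targeting `.env` rather than the CLI's default `.env.local` is an assumption made to match the `cp .env.example .env` step above:

```bash
# Assumes the repo is linked to a Vercel project (run `vercel link` first if not).
# Writing to .env instead of the default .env.local is an assumption, chosen to
# match the .env file created above.
vercel env pull .env
```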
-```bash
-# Required: Choose your model
-MODEL=anthropic/claude-sonnet-4
-ANTHROPIC_API_KEY=your_key_here
-
-# Optional: Enable MCP integration (leave empty to disable)
-MCP_SERVER_URL=https://mcp.svelte.dev/mcp
-```
-
-### Environment Variables
-
-**Required:**
-
-- `MODEL`: The AI model to use (e.g., `anthropic/claude-sonnet-4`, `openai/gpt-5`, `openrouter/anthropic/claude-sonnet-4`, `lmstudio/model-name`)
-  - Corresponding API key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `OPENROUTER_API_KEY`)
-  - Note: No API key required for `lmstudio/*` models (runs locally)
-
-**Optional:**
-
-- `MCP_SERVER_URL`: MCP server URL (leave empty to disable MCP integration)
-
-### Supported Providers
-
-**Cloud Providers:**
+### Required API Keys

-- `anthropic/*` - Direct Anthropic API (requires `ANTHROPIC_API_KEY`)
-- `openai/*` - Direct OpenAI API (requires `OPENAI_API_KEY`)
-- `openrouter/*` - OpenRouter unified API (requires `OPENROUTER_API_KEY`)
+You'll need at least one API key for the providers you want to test:

-**Local Providers:**
+- `VERCEL_OIDC_TOKEN`: The OIDC token for the Vercel AI Gateway

-- `lmstudio/*` - LM Studio local server (requires LM Studio running on `http://localhost:1234`)
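For illustration, the pulled `.env` would then contain something along these lines (placeholder value; the variable name is the one listed above):

```bash
# .env — placeholder shown; the real value comes from `vercel env pull`
VERCEL_OIDC_TOKEN=your_oidc_token_here
```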
+## Usage

-Example configurations:
+To run the benchmark:

```bash
-# Anthropic
-MODEL=anthropic/claude-sonnet-4
-ANTHROPIC_API_KEY=sk-ant-...
+bun run index.ts
+```

-# OpenAI
-MODEL=openai/gpt-5
-OPENAI_API_KEY=sk-...
+### Interactive CLI

-# OpenRouter
-MODEL=openrouter/anthropic/claude-sonnet-4
-OPENROUTER_API_KEY=sk-or-...
+The benchmark features an interactive CLI that will prompt you for configuration (a sample run is sketched after the list below):

-# LM Studio (local)
-MODEL=lmstudio/llama-3-8b
-# No API key needed - make sure LM Studio is running!
-```
+1. **Model Selection**: Choose one or more models from the Vercel AI Gateway
+   - Select from available models in your configured providers
+   - Optionally add custom model IDs
+   - Can test multiple models in a single run

-## Usage
+2. **MCP Integration**: Choose your MCP configuration
+   - **No MCP Integration**: Run without external tools
+   - **MCP over HTTP**: Use HTTP-based MCP server (default: `https://mcp.svelte.dev/mcp`)
+   - **MCP over StdIO**: Use local MCP server via command (default: `npx -y @sveltejs/mcp`)
+   - Option to provide custom MCP server URL or command

-To run the benchmark (automatically discovers and runs all tests):
+3. **TestComponent Tool**: Enable/disable the testing tool for models
+   - Allows models to run tests during component development
+   - Enabled by default
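As a rough sketch of a run, the block below strings the prompts together; the answers shown are examples only, drawn from the defaults and model IDs mentioned elsewhere in this README, not required choices:

```bash
# Illustrative run; all prompts are interactive and the answers below are examples only.
bun run index.ts
#   Model Selection:     anthropic/claude-sonnet-4      (example gateway model ID)
#   MCP Integration:     MCP over HTTP (https://mcp.svelte.dev/mcp)
#   TestComponent Tool:  enabled (the default)
```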
-```bash
-bun run index.ts
-```
+### Benchmark Workflow

-The benchmark will:
+After configuration, the benchmark will:

1. Discover all tests in `tests/` directory
-2. For each test:
+2. For each selected model and test:
   - Run the AI agent with the test's prompt
   - Extract the generated Svelte component
   - Verify the component against the test suite
3. Generate a combined report with all results

+### Results and Reports

Results are saved to the `results/` directory with timestamped filenames:

- `results/result-2024-12-07-14-30-45.json` - Full execution trace with all test results
@@ -148,12 +124,17 @@ This copies each `Reference.svelte` to `Component.svelte` temporarily and runs t
## MCP Integration

-The tool supports optional integration with MCP (Model Context Protocol) servers:
+The tool supports optional integration with MCP (Model Context Protocol) servers through the interactive CLI. When running the benchmark, you'll be prompted to choose:

-- **Enabled**: Set `MCP_SERVER_URL` to a valid MCP server URL
-- **Disabled**: Leave `MCP_SERVER_URL` empty or unset
+- **No MCP Integration**: Run without external tools
+- **MCP over HTTP**: Connect to an HTTP-based MCP server
+  - Default: `https://mcp.svelte.dev/mcp`
+  - Option to provide a custom URL
+- **MCP over StdIO**: Connect to a local MCP server via command
+  - Default: `npx -y @sveltejs/mcp` (see the example after this list)
+  - Option to provide a custom command
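If you plan to use the StdIO transport, you can optionally start the default server by hand first to confirm it launches. This is a quick check assuming `npx` is installed, not a step the benchmark itself requires:

```bash
# Optional sanity check: run the default StdIO MCP command listed above
# (press Ctrl+C to stop it once it starts)
npx -y @sveltejs/mcp
```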
-MCP status is documented in both the JSON metadata and displayed as a badge in the HTML report.
+MCP status, transport type, and server configuration are documented in the JSON metadata and displayed as a badge in the HTML report.