Models (LLMs).

You can use this tool to make evidence-based decisions relating to AI-generated code. For example:

- 🔄 Iterate on a system prompt to find the most effective instructions for your project.
- ⚖️ Compare the quality of code produced by different models.
- 📈 Monitor generated code quality over time as models and agents evolve.

Web Codegen Scorer is different from other code benchmarks in that it focuses specifically on _web_
code and relies primarily on well-established measures of code quality.
## Features

- ⚙️ Configure your evaluations with different models, frameworks, and tools.
- ✍️ Specify system instructions and add MCP servers.
- 📋 Use built-in checks for build success, runtime errors, accessibility, security, LLM rating, and
  coding best practices. (More built-in checks coming soon!)
- 🔧 Automatically attempt to repair issues detected during code generation.
- 📊 View and compare results with an intuitive report viewer UI.

## Setup
```
export OPENAI_API_KEY="YOUR_API_KEY_HERE"    # If you're using OpenAI models
export ANTHROPIC_API_KEY="YOUR_API_KEY_HERE" # If you're using Anthropic models
```
> [!NOTE]
> Web Codegen Scorer also supports local models via Ollama. To use them, you must have a running Ollama server with the respective model(s) installed. By default, the tool expects the server to listen on port `11434`, but you can change that port by setting the `OLLAMA_PORT` environment variable.
>
> Be aware that local models might sometimes lead to execution errors because their output does not conform to the expected format. This is a present-day limitation of these models, so treat this feature as experimental.
>
> Currently supported models: `gemma3:4b`, `gemma3:12b`, `codegemma:7b`
3. **Run an eval:**

You can run your first eval using our Angular example with the following command:
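A hypothetical invocation might look like the following; the actual example command and config path ship with the project, so substitute your own environment config:

```shell
# Hypothetical: the placeholder path must point at a real environment config.
web-codegen-scorer eval --env=<path-to-your-environment-config>
```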

You can customize the `web-codegen-scorer eval` script with the following flags:

`--env=<path>` (alias: `--environment`): (**Required**) Specifies the path from which to load the