A framework for testing and comparing code generation capabilities across various AI models.
This tool allows you to:
- Send the same coding prompt to multiple AI models simultaneously
- Generate HTML/JS/CSS implementations from each model
- Compare the results side-by-side
- Deploy the results to Netlify for easy sharing and viewing
- Node.js
- NPM
- One of the following:
  - Mods CLI for the default script (the `mods` command must be available)
  - API keys for various models if using the direct API method (see `.env.example`)
- Netlify account (optional, for deployment)
- Clone this repository
- Install dependencies with `npm install`
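A minimal end-to-end sketch, assuming an illustrative clone URL (substitute the real repository URL):

```bash
# Clone the repository (URL is illustrative) and install dependencies
git clone https://github.com/your-username/ai-model-comparison.git
cd ai-model-comparison
npm install
```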
Run the test script with default settings using either:

```bash
# Using mods CLI
./run_tests.sh

# Using direct API calls
./run_tests_curl.sh
```
Both scripts will:
- Query all configured AI models with the default tic-tac-toe prompt
- Generate HTML/JS/CSS implementations for each model
- Create a timestamped results directory with all outputs
- Generate an index.html file to compare results
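As a rough illustration, a single run might produce a layout like the one below; the timestamp and per-model file names are assumptions, not guaranteed output:

```
results/ai-test_20240101_120000/
├── index.html        # comparison page generated for this run
├── gpt-4o/           # one directory per model, containing its HTML/JS/CSS output
└── sonnet-3.7/
```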
Note for API version: If using the direct API version (`run_tests_curl.sh`), you need to set up the API keys:

- Copy `.env.example` to `.env`
- Add your API keys to the `.env` file
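For illustration, a populated `.env` usually holds one key per provider. The variable names below are placeholders; use the names given in `.env.example`:

```bash
# Placeholder variable names - copy the real ones from .env.example
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```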
You can customize the tests with various options:

```
Usage: ./run_tests.sh [OPTIONS]

Options:
  -h, --help                  Show this help message
  -p, --prompt TEXT           Custom prompt (default: tic-tac-toe game prompt)
  -m, --models MODEL1,MODEL2  Comma-separated list of models to test
  -n, --name NAME             Experiment name prefix (default: ai-test)
  -d, --deploy                Deploy results to Netlify after completion

Examples:
  ./run_tests.sh                                     # Run with default settings
  ./run_tests.sh -p "write a calculator in HTML/JS"  # Custom prompt
  ./run_tests.sh -m gpt-4o,sonnet-3.7                # Test specific models
  ./run_tests.sh -n calculator-test                  # Custom experiment name
  ./run_tests.sh -d                                  # Deploy results to Netlify
```
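The flags can also be combined; assuming standard option parsing, a single run can set the prompt, model list, experiment name, and deployment at once:

```bash
./run_tests.sh -p "write a calculator in HTML/JS" -m gpt-4o,sonnet-3.7 -n calculator-test -d
```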
After running tests, you'll find the results in:

```
results/[experiment_name]_[timestamp]/
```

Open the `index.html` file in this directory to compare model outputs. You can also browse all experiments by opening `results/index.html`.
To deploy your results to Netlify:

```bash
./run_tests.sh -d
```

Or deploy existing results:

```bash
npm run deploy
```
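Alternatively, a specific experiment directory can be published manually with the Netlify CLI; the path below is illustrative, and this bypasses the project's `deploy` script:

```bash
# Publish one experiment directory as a production deploy (path is illustrative)
netlify deploy --dir=results/ai-test_20240101_120000 --prod
```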
Edit `defaults.sh` to change the defaults for the following (a sketch of the file follows this list):
- Models to test
- Prompt to use
- Experiment name prefix
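The variable names below are assumptions for illustration; check `defaults.sh` itself for the names the scripts actually read:

```bash
# Hypothetical defaults - the real variable names live in defaults.sh
DEFAULT_MODELS="gpt-4o,sonnet-3.7"
DEFAULT_PROMPT="Create a tic-tac-toe game as a single HTML file with embedded JS and CSS."
DEFAULT_NAME="ai-test"
```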
- `run_tests.sh` - Main script to run tests using the mods CLI
- `run_tests_curl.sh` - Alternative script using direct API calls with curl
- `defaults.sh` - Default configuration settings
- `update_experiment_list.sh` - Updates the list of experiments for the dashboard
- `templates/` - HTML templates for result pages
- `results/` - Contains all test results (excluded from git)
- `.env.example` - Example environment variables for API keys
This project is set up to exclude test results from git. When you push to GitHub:
- All code and templates will be included
- All test results in the `results/` directory will be excluded (see the `.gitignore` sketch below)
- Others who clone your repository can run their own tests and generate their own results
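The exclusion is a plain `.gitignore` rule; a minimal sketch of the relevant entries (the actual file may contain more, and ignoring `.env` is an assumption here):

```
# Generated test results stay local
results/
# Local API keys (copied from .env.example) stay out of version control
.env
```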
MIT