
AI Coding Tests

A framework for testing and comparing code generation capabilities across various AI models.

Overview

This tool allows you to:

  • Send the same coding prompt to multiple AI models simultaneously
  • Generate HTML/JS/CSS implementations from each model
  • Compare the results side-by-side
  • Deploy the results to Netlify for easy sharing and viewing

Requirements

  • Node.js
  • NPM
  • One of the following:
    • Mods CLI for the default script (mods command must be available)
    • API keys for various models if using the direct API method (see .env.example)
  • Netlify account (optional, for deployment)
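
A quick way to check the prerequisites before installing (a minimal sketch):

# Verify that Node.js and npm are available
node --version
npm --version

# mods is only needed for the default script (run_tests.sh)
command -v mods || echo "mods not found; install it or use run_tests_curl.sh"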

Installation

  1. Clone this repository
  2. Install dependencies:
    npm install
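
All steps in one go (a sketch, assuming the standard GitHub URL for cutalion/ai-coding-tests):

git clone https://github.com/cutalion/ai-coding-tests.git
cd ai-coding-tests
npm install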
    

Usage

Basic Usage

Run the tests with default settings using either script:

# Using mods CLI
./run_tests.sh

# Using direct API calls
./run_tests_curl.sh

Both scripts will:

  1. Query all configured AI models with the default tic-tac-toe prompt
  2. Generate HTML/JS/CSS implementations for each model
  3. Create a timestamped results directory with all outputs
  4. Generate an index.html file to compare results
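
A run therefore produces a layout like this (illustrative; the timestamp format and per-model file names are assumptions):

results/
  index.html                 # dashboard listing all experiments
  ai-test_20240101_120000/   # one directory per run
    index.html               # side-by-side comparison page
    ...                      # one HTML/JS/CSS implementation per model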

Note: if you use the direct API script (run_tests_curl.sh), you must first set up your API keys:

  1. Copy .env.example to .env
  2. Add your API keys to the .env file
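
For example (the key names below are hypothetical; use the ones listed in .env.example):

cp .env.example .env

# then edit .env and fill in your keys, e.g.:
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=...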

Custom Usage

You can customize the tests with various options:

Usage: ./run_tests.sh [OPTIONS]

Options:
  -h, --help                 Show this help message
  -p, --prompt TEXT          Custom prompt (default: tic-tac-toe game prompt)
  -m, --models MODEL1,MODEL2 Comma-separated list of models to test
  -n, --name NAME            Experiment name prefix (default: ai-test)
  -d, --deploy               Deploy results to Netlify after completion

Examples:
  ./run_tests.sh                                     # Run with default settings
  ./run_tests.sh -p "write a calculator in HTML/JS"  # Custom prompt
  ./run_tests.sh -m gpt-4o,sonnet-3.7                # Test specific models
  ./run_tests.sh -n calculator-test                  # Custom experiment name
  ./run_tests.sh -d                                  # Deploy results to Netlify
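
Options can be combined, for example a named experiment with a custom prompt and specific models, deployed when done:

./run_tests.sh -n calculator-test -m gpt-4o,sonnet-3.7 -p "write a calculator in HTML/JS" -d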

Viewing Results

After running tests, you'll find the results in:

results/[experiment_name]_[timestamp]/

Open the index.html file in this directory to compare model outputs.

You can also browse all experiments by opening results/index.html.
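
For example, to open the comparison page of the most recent run (macOS open; use xdg-open on Linux):

open "$(ls -dt results/*/ | head -1)index.html"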

Deployment

To deploy your results to Netlify:

./run_tests.sh -d

Or deploy existing results:

npm run deploy
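
This is a sketch assuming deployment goes through the Netlify CLI; check the deploy script in package.json for the exact mechanism. If you haven't authenticated yet, a one-time login looks like:

npx netlify login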

Customizing Defaults

Edit defaults.sh to change the default settings (see the sketch after this list):

  • Models to test
  • Prompt to use
  • Experiment name prefix
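
A sketch of what to expect in that file (the variable names here are illustrative, not guaranteed to match the actual file):

# defaults.sh -- hypothetical variable names
DEFAULT_MODELS="gpt-4o,sonnet-3.7"
DEFAULT_PROMPT="Write a tic-tac-toe game in HTML/JS/CSS"
DEFAULT_NAME="ai-test"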

Structure

  • run_tests.sh - Main script to run tests using mods CLI
  • run_tests_curl.sh - Alternative script using direct API calls with curl
  • defaults.sh - Default configuration settings
  • update_experiment_list.sh - Updates list of experiments for the dashboard
  • templates/ - HTML templates for result pages
  • results/ - Contains all test results (excluded from git)
  • .env.example - Example environment variables for API keys

Sharing on GitHub

This project is set up to exclude test results from git. When you push to GitHub:

  1. All code and templates will be included
  2. All test results in the results/ directory will be excluded
  3. Others who clone your repository can run their own tests and generate their own results
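
In practice the exclusion is a single ignore rule (a sketch; check the repository's actual .gitignore):

# .gitignore
results/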

License

MIT
