An automated benchmark for evaluating how effectively foundational LLMs can integrate Temporal into existing codebases.
The benchmark generates two types of scores for each model:
- Language Score: For each programming language, individual test case scores (0-2) are summed and normalized to a 0-100 scale
- Aggregate Score: The total score across all languages, also normalized to 0-100
Important: Scores are a function of the specific test set used. As the test set expands, the benchmark will be versioned accordingly. Scores across different benchmark versions are not comparable since the underlying tests have changed. However, within any given benchmark version, scores provide a reliable basis for comparing performance across different models.
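As a concrete illustration of the scoring math described above, here is a minimal sketch; the function names and data shapes are illustrative assumptions, not the benchmark's actual implementation:

```python
# Minimal sketch of the scoring described above. Function names and data
# shapes are illustrative assumptions, not the benchmark's actual code.
def language_score(case_scores: list[int]) -> float:
    """Sum per-test-case scores (each 0-2) and normalize to a 0-100 scale."""
    max_possible = 2 * len(case_scores)
    return 100 * sum(case_scores) / max_possible

def aggregate_score(scores_by_language: dict[str, list[int]]) -> float:
    """Normalize the total across all languages' test cases to 0-100."""
    all_scores = [s for scores in scores_by_language.values() for s in scores]
    return 100 * sum(all_scores) / (2 * len(all_scores))

# Example: two languages with three test cases each
print(language_score([2, 1, 2]))                                # 83.33...
print(aggregate_score({"python": [2, 1, 2], "go": [0, 2, 1]}))  # 66.66...
```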
This project uses a Python virtual environment to manage dependencies. Follow these steps to set up your development environment:
```bash
# Create a virtual environment named 'temporal-bench'
python -m venv temporal-bench

# Activate the virtual environment
# On macOS/Linux:
source temporal-bench/bin/activate
```

When you're done working on the project, deactivate the virtual environment:

```bash
deactivate
```

To capture your project's dependencies in a requirements file:

```bash
# After installing packages with pip, generate requirements.txt
pip freeze > requirements.txt
```

To install all required packages from the requirements file:

```bash
# Make sure your virtual environment is activated first
pip install -r requirements.txt
```

When adding new Python packages:

```bash
# Install the package
pip install package-name

# Update requirements.txt
pip freeze > requirements.txt
```

For detailed information about the benchmark strategy, architecture, and implementation, see the Product Requirements Document.
- Clone the repository
- Create and activate the Python virtual environment (see above)
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Set up your `.env` file with API keys for the LLM services (see the sketch below)
- Run the benchmark:
  ```bash
  python main.py
  ```

Results will be generated in the `results/` directory.
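The `.env` step above assumes the benchmark reads provider credentials from environment variables. The sketch below shows one common way such keys are loaded; the variable names and the use of `python-dotenv` are assumptions for illustration, so check the project's configuration for the keys it actually expects:

```python
# Illustrative sketch only: the key names and the python-dotenv dependency
# are assumptions, not necessarily what main.py uses.
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # read KEY=value pairs from .env into the process environment

# Hypothetical key names for two LLM providers
openai_key = os.environ.get("OPENAI_API_KEY")
anthropic_key = os.environ.get("ANTHROPIC_API_KEY")

if not (openai_key and anthropic_key):
    raise SystemExit("Missing API keys; check your .env file.")
```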