Task template and training materials for TerminalBench Expert Coders (ECs), from Snorkel AI's official training portal for contributing real-world software engineering tasks used to evaluate AI coding agents.
This repo is built from the template-task skeleton provided by the Terminus EC Training portal. Use it as a starter when creating or working on TerminalBench tasks.
TerminalBench is a benchmark that tests AI coding agents on tasks that mirror real software engineering work: debugging race conditions, implementing API endpoints, refactoring legacy code, fixing security vulnerabilities, and more. Expert Coders create these tasks; the Terminus portal is the home base for contributing.
| Path | Purpose |
|---|---|
| `task.toml` | Task metadata (difficulty, category, timeouts, resources) |
| `instruction.md` | Human-readable task instructions |
| `environment/Dockerfile` | Runtime environment for the task |
| `tests/` | Verifier: `test.sh` (runs tests, writes reward) and `test_outputs.py` |
| `solution/` | Oracle/reference solution (e.g. `solve.sh`) |
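As a sketch of what the metadata file might contain (the field names below are assumptions for illustration; check the portal's template for the authoritative schema):

```toml
# Hypothetical task.toml sketch; field names are illustrative, not the
# official schema.
[task]
difficulty = "medium"
category = "debugging"

[timeouts]
agent_seconds = 1800
verifier_seconds = 300

[resources]
cpu = 2
memory_gb = 4
```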
- Verifier: Runs in Docker; uses `uv` + `pytest`; writes `/logs/verifier/reward.txt` (1 = pass, 0 = fail).
- Agent: Gets the same environment and must satisfy the same verifier.
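The reward-writing behavior above can be sketched as a minimal `tests/test.sh`. The `uv` + `pytest` invocation and the reward format follow the description here, but the real template's script is authoritative; this sketch also writes to a local `logs/` directory (an assumption for trying it outside the container, where the real path is `/logs/verifier`):

```shell
#!/usr/bin/env bash
# Sketch of tests/test.sh: run the pytest suite and record a binary reward.
# The real template writes to /logs/verifier; this sketch uses a local
# directory so it can be tried outside the container.
set -u

LOG_DIR="./logs/verifier"
mkdir -p "$LOG_DIR"

if uv run pytest tests/test_outputs.py; then
    echo 1 > "$LOG_DIR/reward.txt"   # 1 = all tests passed
else
    echo 0 > "$LOG_DIR/reward.txt"   # 0 = at least one test failed
fi
```

Whatever the test outcome, the script always leaves a single `0` or `1` in the reward file, which is what the benchmark harness reads.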
1. **Clone and customize.** Replace `instruction.md` with your task, implement `solution/`, and adjust `task.toml` and `environment/Dockerfile` as needed.
2. **Run tests locally.** Build and run the environment (see portal docs for the full workflow). The verifier runs `tests/test.sh`, which executes `tests/test_outputs.py` and produces the reward file.
3. **Submit via the portal.** Sign in at Terminus EC Training, claim or create a task, and follow the submit workflow.
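Assuming a local Docker daemon, the build-and-run step might look like the following sketch (image name and mount layout are illustrative, not the official workflow; see the portal docs for the real commands):

```shell
# Build the task environment image (tag name is illustrative)
docker build -t my-task environment/

# Run the verifier inside the container; mount tests/ so test.sh is
# available, and logs/ so the reward file survives the container
docker run --rm \
  -v "$(pwd)/tests:/tests" \
  -v "$(pwd)/logs:/logs" \
  my-task \
  bash /tests/test.sh

# Inspect the reward: 1 = pass, 0 = fail
cat logs/verifier/reward.txt
```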
- Portal (login required): Terminus EC Training, covering onboarding, task lifecycle, docs, and submission.
- This repo: `docs/` contains onboarding and task-creation notes captured from the public portal and Terminus materials.
The template structure aligns with Snorkel AI's Terminus EC Training program. See the portal and Snorkel AI for terms.