A research runtime for agentic reinforcement learning.
Ergon manages the full lifecycle: define benchmarks, run agent tasks in sandboxed environments, evaluate outputs with configurable judges, and feed results into RL training loops.
Ergon (ἔργον) — from Greek, meaning "characteristic function" or "purposeful work."
In Aristotle's Nicomachean Ethics, every agent has an ergon — the fullest expression of its capabilities. Reinforcement learning is the process by which agents discover and perfect their ergon: the actualization (energeia, ἐνέργεια — literally "the work within") of latent potential (dynamis) through purposeful action.
The word is the root of energy (en + ergon, "the work within"), synergy (syn + ergon, "working together"), and ergodic (exploring the full state space) — spanning single-agent training, multi-agent coordination, and exploration.
Ergon is the successor to Manager Agent Gym (MA Gym), redesigned as a modular, strongly-typed platform with distributed RL training, persistent state, and production orchestration.
UV workspace with four Python packages and a Next.js dashboard:
| Package | Description |
|---|---|
| ergon_core/ | Core library: FastAPI app, persistence, Inngest runtime, RL adapters, sandbox providers |
| ergon_builtins/ | Built-in benchmarks, workers, evaluators, criteria, and rubrics |
| ergon_cli/ | CLI (ergon command): benchmark, train, run, eval |
| ergon_infra/ | Infrastructure: TRL training runner, SkyPilot provisioning, deployment templates |
| ergon-dashboard/ | Next.js frontend dashboard |
Prerequisites: Python 3.13+, uv, Node.js 20+, pnpm, Docker
# Clone and install
git clone https://github.com/DeepFlow-research/ergon.git
cd ergon
uv sync --all-packages --group dev
# Copy env and fill in your keys
cp .env.example .env
# Start the stack (Postgres, API, Inngest, dashboard)
docker compose up
# Define and run an experiment
ergon experiment define smoke_test --worker training-stub --model stub:constant --limit 1
ergon experiment run <experiment-id>

Copy .env.example to .env and set your provider keys:

- OPENAI_API_KEY: Required for LLM-based evaluation
- E2B_API_KEY: Required for sandboxed code execution
- ERGON_DATABASE_URL: Postgres connection string (default provided in docker-compose)
See .env.example for the full list.
# Lint, format, type-check (backend + frontend)
pnpm run check:fast
# Backend only
pnpm run check:be
# Tests
pnpm run test:be:fast # Unit/state tests
pnpm run test:be:e2e # E2E (requires Docker stack)
# Ruff autofix
uv run ruff check --fix ergon_core ergon_builtins ergon_cli ergon_infra tests scripts
uv run ruff format ergon_core ergon_builtins ergon_cli ergon_infra tests scripts

Problem. Runtime DTOs and Inngest events carry both task_id: UUID | None and node_id: UUID | None, two fields that refer to the same thing in practice. Every graph node IS a task; every task execution IS a node. The two identifiers exist because task_id historically meant "the static ExperimentDefinitionTask FK" while node_id meant "the runtime RunGraphNode PK." Dynamically spawned subtasks (add_subtask) have no static FK, so task_id is None for them, which leaks through the type system as UUID | None on every downstream DTO and event.
This bit us on 2026-04-23: PreparedTaskExecution.task_id: UUID rejects the None that a dynamic subtask supplies, so every subtask stays wedged in RUNNING. No error_json gets written either, because the except handler has its own bug: it references prepared before that name is guaranteed to be bound. Full write-up: docs/bugs/open/2026-04-23-inngest-function-failures.md § A.
Fix. One runtime identity, one field, always present:
task_id: UUID # THE identity — = run_graph_nodes.id
definition_task_id: UUID | None # honestly nullable — FK to static declaration, absent for dynamic tasks
execution_id: UUID # attempt-level (retries get new executions; task_id stays)
Changes:
- DTOs + events (orchestration_dto.py, task_events.py): drop node_id, rename current task_id: UUID | None → definition_task_id: UUID | None, promote the runtime identity to task_id: UUID (non-null).
- Service layer (task_execution_service.py + callers): read task_id where currently reading node_id; read definition_task_id only where the static FK is genuinely needed. The "if node_id is None: return" early return in _emit_task_status disappears, because the field can't be null.
- Dashboard emitters: the task_id=node_id alias hack at task_execution_service.py:53 goes away.
- DB columns (optional second pass): run_task_executions.node_id → run_task_executions.task_id via Alembic. run_graph_nodes.id already serves as the task_id; definition_task_id already exists on both tables. Non-blocking: the rename is cosmetic/queryability-only.

Separate smaller fix (do first). The except handler in execute_task.py:251-262 references prepared before assignment. Hoist execution_id: UUID | None = None and node_id above the try block so finalize_failure runs even when the prepare step is the thing that raised. This alone doesn't fix the stuck-task bug but it surfaces the real error in error_json + TaskFailedEvent on every future regression.
Risk. Event-shape change — old-shape events in the Inngest queue at cutover will fail. Mitigations: drain the queue before deploy, or accept both field names via AliasChoices + log-warn-on-old-name during the rollover window.
Scope. ~20-30 files. Mostly mechanical after the DTO changes — ty walks the rest.
Apache 2.0 — see LICENSE.