Backend control plane for reproducible computational runs.
Evalynx is a backend-first platform for submitting, executing, tracking, and reproducing runs across external computational systems such as simulations and analytics pipelines. The project focuses on the service layer around computation rather than the computation engines themselves.
Many technical projects already have strong computational cores but weak execution management around them. In practice, that usually means:
- jobs are launched through ad-hoc scripts
- execution metadata is incomplete or inconsistent
- results are scattered across logs, files, and local folders
- comparing runs requires manual reconstruction
- failures are hard to diagnose and retries are poorly tracked
Evalynx addresses that gap with a backend control plane that can:
- accept structured run requests through an API
- persist run metadata in a relational model
- execute jobs asynchronously
- expose lifecycle states such as `queued`, `running`, `succeeded`, and `failed`
- retain summaries, metrics, and artifact references
- capture reproducibility metadata such as normalized config and execution provenance
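The lifecycle states above can be sketched as a small state machine. This is a minimal illustration only; the project's actual state model and transition rules may differ:

```python
from enum import Enum

class RunState(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"

# Illustrative transition table: a retry re-enqueues a failed run,
# while succeeded is terminal.
TRANSITIONS = {
    RunState.QUEUED: {RunState.RUNNING},
    RunState.RUNNING: {RunState.SUCCEEDED, RunState.FAILED},
    RunState.SUCCEEDED: set(),
    RunState.FAILED: {RunState.QUEUED},
}

def can_transition(src: RunState, dst: RunState) -> bool:
    """Check whether a state change is allowed under this sketch."""
    return dst in TRANSITIONS[src]
```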
Evalynx is designed as a backend portfolio project, with emphasis on:
- API design and request validation
- relational data modeling and migrations
- background job processing
- execution lifecycle management
- integration with external systems through explicit adapters
- reproducibility-focused backend engineering
The stack:

- FastAPI
- SQLAlchemy + Alembic
- PostgreSQL
- Redis + RQ
- Docker Compose
- GitHub Actions
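The Compose topology implied by the quickstart can be sketched as below. The service names (`postgres`, `redis`, `migrate`, `api`, `worker`) appear in the commands later in this README, but the images, credentials, and wiring here are illustrative assumptions, not the project's actual file:

```yaml
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: evalynx
      POSTGRES_PASSWORD: evalynx
      POSTGRES_DB: evalynx
    volumes: ["pgdata:/var/lib/postgresql/data"]
  redis:
    image: redis:7
  migrate:
    build: .
    command: alembic upgrade head
    depends_on: [postgres]
  api:
    build: .
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000
    ports: ["8000:8000"]
    depends_on: [postgres, redis]
  worker:
    build: .
    command: python -m app.workers.entrypoint
    depends_on: [postgres, redis]
volumes:
  pgdata:
```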
The strongest backend signals in this MVP are:
- explicit separation between a logical `Run` and concrete `RunAttempt` records
- asynchronous execution through Redis + RQ with a dedicated worker process
- subprocess-based runner adapters for bounded integration with external systems
- structured persistence of summaries, metrics, artifacts, warnings, errors, and reproducibility metadata
- retry semantics that preserve prior failure context instead of overwriting it
The initial MVP is intentionally narrow and aims to prove one strong vertical slice:
- FastAPI application
- PostgreSQL persistence
- Redis + RQ background execution
- `Project` and `Run` lifecycle management
- one real runner integration
- stored summaries, metrics, artifacts, and failure information
- retry support for failed runs
The first integrated real runner is `solo-wargame-ai`, chosen because it already has strong reproducibility surfaces and does not depend on private personal data.
The MVP is centered around one main workflow:
- Create a project.
- Submit a run with a runner type and config.
- Persist the run as `queued`.
- Execute the run asynchronously through a worker.
- Store terminal state, summary data, metrics, and artifact references per execution attempt.
- Inspect the latest run state and attempt history through the API.
- Retry failed runs without erasing prior failure context.
There are two intended local paths:
- Reviewer quickstart: Docker Compose plus the built-in `stub` runner. This is the fastest way to validate the asynchronous lifecycle with no external checkout.
- Host development with the real runner: a local Python environment plus the same Dockerized PostgreSQL and Redis services, with a local checkout of `solo-wargame-ai`.
If you want the fastest path to understanding the project, start with the reviewer quickstart below. It proves the queue, worker, persistence, retry, and lifecycle story without needing any external repository.
Requirements:
- Docker Desktop or Docker Engine with Compose support
Setup:
- Copy the example environment if you want a local `.env`: `cp .env.example .env`
- Start infrastructure: `docker compose up -d postgres redis`
- Apply migrations: `docker compose run --rm migrate`
- Start the API and worker: `docker compose up -d api worker`
- Confirm the API is healthy: `curl http://localhost:8000/health`
If you want a clean demo state, run `docker compose down -v` before repeating the example requests below. The sample IDs assume a fresh database; otherwise, use the IDs returned by your own API responses.
Create a project:
```bash
curl -X POST http://localhost:8000/projects \
  -H 'Content-Type: application/json' \
  -d '{"name":"Reviewer Demo"}'
```

Submit a successful async stub run:

```bash
curl -X POST http://localhost:8000/runs \
  -H 'Content-Type: application/json' \
  -d '{
    "project_id": 1,
    "runner_type": "stub",
    "config": {
      "scenario": "compose-demo-success"
    }
  }'
```

Submit a failing run that can be retried:

```bash
curl -X POST http://localhost:8000/runs \
  -H 'Content-Type: application/json' \
  -d '{
    "project_id": 1,
    "runner_type": "stub",
    "config": {
      "should_fail": true,
      "failure_message": "Reviewer demo failure"
    }
  }'
```

Inspect the lifecycle and retry the failed run:

```bash
curl http://localhost:8000/runs/1
curl http://localhost:8000/runs/2
curl -X POST http://localhost:8000/runs/2/retry
curl http://localhost:8000/runs/2

# follow worker activity while runs execute
docker compose logs -f worker
```

Expected reviewer outcomes:
- the health endpoint returns `status: "ok"`
- the first stub run reaches `succeeded` and persists a summary
- the second stub run reaches `failed` with a stored failure message
- retrying the failed run increments `attempt_count` and `current_attempt_number`
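Rather than re-running `curl` by hand, the wait for a terminal state can be scripted. This polling helper assumes the run detail response exposes a top-level `state` field with the values above, which is an assumption about the payload shape:

```python
import json
import time
import urllib.request

TERMINAL_STATES = {"succeeded", "failed"}

def fetch_run(run_id: int, base_url: str = "http://localhost:8000") -> dict:
    # Plain-stdlib GET of the run detail endpoint.
    with urllib.request.urlopen(f"{base_url}/runs/{run_id}") as resp:
        return json.load(resp)

def wait_for_terminal(run_id: int, fetch=fetch_run, timeout: float = 30.0,
                      interval: float = 0.5) -> str:
    # Poll until the run leaves queued/running or the timeout expires.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fetch(run_id).get("state")
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
    raise TimeoutError(f"run {run_id} still not terminal after {timeout}s")
```

Injecting `fetch` keeps the helper testable without a running stack.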
Artifacts written by the stack are stored under `./artifacts`.
Shut the stack down with:

```bash
docker compose down
```

If you also want to remove the PostgreSQL volume:

```bash
docker compose down -v
```

Host development with the real runner requires:

- Python 3.11+
- Docker Desktop or Docker Engine with Compose support
- a local checkout of `solo-wargame-ai` when you want the real runner path
Setup:
- Copy the environment template: `cp .env.example .env`
- Start PostgreSQL and Redis: `docker compose up -d postgres redis`
- Create a virtual environment: `python -m venv .venv`
- Activate it: `source .venv/bin/activate`
- Install dependencies: `pip install -e '.[dev]'`
- Export or edit the required env vars:

```bash
EVALYNX_DATABASE_URL=postgresql+psycopg://evalynx:evalynx@localhost:5432/evalynx
EVALYNX_REDIS_URL=redis://localhost:6379/0
EVALYNX_SOLO_WARGAME_REPO_PATH=/absolute/path/to/solo-wargame-ai
EVALYNX_SOLO_WARGAME_PYTHON_COMMAND=/absolute/path/to/solo-wargame-ai/.venv/bin/python
EVALYNX_ARTIFACT_ROOT=./artifacts
```

- Apply migrations: `alembic upgrade head`
- Start the API: `uvicorn app.main:app --reload`
- Start the worker in a second shell: `python -m app.workers.entrypoint`
- Run tests: `python -m pytest`
For the first real runner family, `POST /runs` accepts `runner_type: "solo_wargame"` with a logical config shaped like:

```json
{
  "project_id": 1,
  "runner_type": "solo_wargame",
  "config": {
    "mission_path": "configs/missions/mission_01_secure_the_woods_1.toml",
    "policy": {
      "kind": "builtin",
      "name": "heuristic"
    },
    "seed_spec": {
      "kind": "range",
      "start": 0,
      "stop": 4
    },
    "write_episode_rows": true
  }
}
```

Evalynx materializes the upstream request file, allocates the artifact directory, invokes the external CLI through a subprocess adapter, and persists summary, metrics, execution metadata, artifacts, warnings, and structured error details back onto the run record and its current attempt snapshot.
Failed runs can be retried through `POST /runs/{id}/retry`. Run detail responses keep the latest run snapshot easy to inspect while also exposing bounded attempt history, and `solo_wargame` artifacts are now written into per-attempt directories under each run.
Evalynx has reached a complete MVP milestone.
A reviewer can now:
- run the Docker Compose stack locally
- create a project and submit runs through the API
- observe asynchronous execution through Redis + RQ worker processing
- inspect persisted summaries, metrics, artifacts, and attempt history
- retry failed runs without overwriting prior failure context
- exercise the real `solo_wargame` integration from a host-based app and worker when the external repository is available
The public docs, Compose workflow, and CI checks are aligned around that reviewer-facing backend story.