feat: parallelize e2e tests with configurable concurrency (pytest-xdist)

## Problem Statement

The e2e test suite (`e2e/python/`) runs ~36 tests sequentially in a single process. Each test spins up its own sandbox (K8s pod), waits for it to become ready (up to 300s), runs assertions, then tears it down. Because tests run one at a time, total wall time scales linearly with test count. Running them in parallel with a configurable concurrency factor (e.g., 5 at a time) would cut wall time substantially, provided the cluster can host that many sandboxes at once.

## Technical Context

Tests are invoked via `uv run pytest -o python_files='test_*.py' e2e/python` (mise task `test:e2e:sandbox`). The `sandbox` fixture is function-scoped and creates an isolated sandbox per test, so individual tests are naturally independent. However, several session-scoped fixtures (gRPC client, mock inference routes) use hard-coded resource names that would race under parallel execution. The standard pytest parallelization tool is `pytest-xdist`, which distributes tests across worker processes. Each worker runs its own pytest session, so session-scoped fixtures are instantiated once per worker rather than once per run, and every worker would try to create the same hard-coded resources.

## Affected Components

| Component | Key Files | Role |
|-----------|-----------|------|
| E2E test config | `e2e/python/conftest.py` | Session-scoped fixtures with hard-coded route names that would race |
| pytest config | `pyproject.toml` (lines 87-92) | Test runner configuration, no xdist settings today |
| mise task | `build/test.toml` (line 24) | `test:e2e:sandbox` task — needs `-n` flag support |
| Python deps | `pyproject.toml` (lines 29-39) | Dev dependencies — needs `pytest-xdist` |

## Proposed Approach

Add `pytest-xdist` as a dev dependency and update the mise task to accept a concurrency factor. Fix the session-scoped fixtures in `conftest.py` that use hard-coded names, using one of two options: make the names unique per xdist worker, or use the file-lock pattern described in the pytest-xdist documentation so that shared resources are created once across all workers. The `sandbox` fixture itself is already safe because it is function-scoped and creates isolated pods.
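
The per-worker naming option could look roughly like the sketch below. It relies on the `PYTEST_XDIST_WORKER` environment variable that pytest-xdist sets in each worker (for example `gw0`), and on the convention that xdist's `worker_id` fixture reports `master` when xdist is not in use. The helper name `worker_scoped_name` and the base name `mock-route` are hypothetical, not taken from `conftest.py`.

```python
import os
from typing import Optional


def worker_scoped_name(base: str, worker_id: Optional[str] = None) -> str:
    """Return `base` with the xdist worker id appended (hypothetical helper).

    Without xdist there is only one process, so the original name is kept
    and sequential runs behave exactly as before.
    """
    if worker_id is None:
        # pytest-xdist exports this variable inside each worker process.
        worker_id = os.environ.get("PYTEST_XDIST_WORKER", "master")
    if worker_id == "master":
        return base
    return f"{base}-{worker_id}"


if __name__ == "__main__":
    # "mock-route" is a placeholder for whatever name conftest.py hard-codes.
    print(worker_scoped_name("mock-route", "gw1"))
```

Inside a session-scoped fixture the same value is available by requesting xdist's `worker_id` fixture and passing it in, which avoids reading the environment directly.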

## Scope Assessment

- **Complexity:** Low
- **Confidence:** High — clear path, well-understood tooling
- **Estimated files to change:** 3 (`conftest.py`, `pyproject.toml`, `build/test.toml`)
- **Issue type:** `feat`

## Risks & Open Questions

- **Cluster resource limits:** Running N sandboxes concurrently requires the k3s node to have enough CPU/memory. May need to document recommended resource sizing or set a sensible default for N.
- **Image pull thundering herd:** If the sandbox image isn't cached on the node, N concurrent tests will all trigger image pulls simultaneously, potentially causing timeouts.
- **Default concurrency value:** Should default to sequential (`-n0` / no flag) to avoid breaking existing workflows, or should a default like `-n auto` be set?
- **CI impact:** The e2e CI workflow (`.github/workflows/e2e.yml`) runs in a privileged container — need to verify it has enough resources for parallel sandboxes.
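
To make the default-concurrency question concrete, here is a minimal sketch of how a wrapper might translate a setting into pytest arguments, assuming the sequential default. The environment variable `E2E_CONCURRENCY` and the function `xdist_args` are hypothetical names chosen for illustration; only the `-n <count>` and `-n auto` flags come from pytest-xdist.

```python
import os
from typing import List, Mapping, Optional


def xdist_args(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the pytest-xdist arguments from a hypothetical E2E_CONCURRENCY setting."""
    env = os.environ if env is None else env
    raw = env.get("E2E_CONCURRENCY", "").strip()
    if raw in ("", "0"):
        # Unset or zero: run sequentially, matching today's behavior.
        return []
    if raw != "auto" and not raw.isdigit():
        raise ValueError(f"E2E_CONCURRENCY must be a number or 'auto', got {raw!r}")
    return ["-n", raw]


if __name__ == "__main__":
    print(xdist_args({"E2E_CONCURRENCY": "5"}))
```

Keeping the unset case sequential means existing workflows are unaffected until someone opts in.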

## Test Considerations

- This is a change to test infrastructure itself, so validation is running the e2e suite with `-n 5` (or similar) and confirming all tests pass without flakes or races
- Verify that session-scoped fixtures (mock inference routes) are correctly isolated per worker or shared safely
- Verify no test ordering dependencies exist (tests should already be independent, but parallelism will expose any hidden coupling)
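
For the "shared safely" option, the create-once idea behind the file-lock pattern can be sketched as follows. This is an illustration under assumptions, not the fixture code: the pytest-xdist documentation uses the third-party `filelock` package, whereas this version uses the POSIX-only `fcntl.flock` from the standard library to stay dependency-free. The helper name `create_once` and the payload shape are hypothetical.

```python
import fcntl
import json
from pathlib import Path
from typing import Callable


def create_once(shared_dir: Path, key: str, create: Callable[[], dict]) -> dict:
    """Run `create` in one process only; later callers read the cached result.

    `shared_dir` must be a directory that every worker can see.
    """
    result_file = shared_dir / f"{key}.json"
    lock_file = shared_dir / f"{key}.lock"
    with open(lock_file, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until no other process holds it
        try:
            if result_file.is_file():
                # Another worker already created the resource.
                return json.loads(result_file.read_text())
            data = create()
            result_file.write_text(json.dumps(data))
            return data
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

In a session-scoped fixture, the xdist documentation suggests `tmp_path_factory.getbasetemp().parent` as the directory shared by all workers. The sketch does not cover teardown: deciding which worker deletes the shared resource, and when, would still need to be designed.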

---
*Created by spike investigation. Use `build-from-issue` to plan and implement.*
