
feat: parallelize e2e tests with configurable concurrency (pytest-xdist) #101

@johntmyers

Description

Problem Statement

The e2e test suite (e2e/python/) runs ~36 tests sequentially in a single process. Each test spins up its own sandbox (K8s pod), waits for it to become ready (up to 300s), runs assertions, then tears it down. Total wall time therefore scales linearly with test count, which makes the full suite slow. Running tests in parallel with a configurable concurrency factor (e.g., 5 at a time) would significantly reduce wall time.

Technical Context

Tests are invoked via uv run pytest -o python_files='test_*.py' e2e/python (mise task test:e2e:sandbox). The sandbox fixture is function-scoped and creates an isolated sandbox per test, so individual tests are naturally independent. However, several session-scoped fixtures (gRPC client, mock inference routes) use hard-coded resource names that would race under parallel execution. The standard pytest parallelization tool is pytest-xdist, which distributes tests across worker processes. Each worker runs its own pytest session, so session-scoped fixtures are instantiated once per worker rather than once per run, and N workers would each try to create the same hard-coded resources.

Affected Components

| Component | Key Files | Role |
| --- | --- | --- |
| E2E test config | e2e/python/conftest.py | Session-scoped fixtures with hard-coded route names that would race |
| pytest config | pyproject.toml (lines 87-92) | Test runner configuration, no xdist settings today |
| mise task | build/test.toml (line 24) | test:e2e:sandbox task; needs -n flag support |
| Python deps | pyproject.toml (lines 29-39) | Dev dependencies; needs pytest-xdist |

Proposed Approach

Add pytest-xdist as a dev dependency and update the mise task to accept a concurrency factor. Fix the session-scoped fixtures in conftest.py that use hard-coded names: either make the names unique per xdist worker, or use the file-lock pattern described in the pytest-xdist documentation to create shared resources once across all workers. The sandbox fixture itself is already safe because it is function-scoped and creates isolated pods.
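One way to make the hard-coded names unique per worker is to suffix them with the xdist worker id. The sketch below is a minimal illustration, not the project's code: worker_scoped_name and the base name mock-route are hypothetical, and it relies on the PYTEST_XDIST_WORKER environment variable that pytest-xdist sets in each worker process (for example gw0).

```python
import os


def worker_scoped_name(base, environ=None):
    """Return `base` suffixed with the xdist worker id, if there is one.

    `base` is whatever hard-coded resource name the fixture uses today
    (hypothetical example: "mock-route").
    """
    env = os.environ if environ is None else environ
    # pytest-xdist exports PYTEST_XDIST_WORKER (e.g. "gw0") inside each
    # worker process; it is unset in a plain, non-distributed run.
    worker = env.get("PYTEST_XDIST_WORKER")
    return f"{base}-{worker}" if worker else base
```

A session-scoped fixture in conftest.py could call this helper when it registers a route, so a sequential run keeps the current name while each worker gets its own. pytest-xdist also provides a worker_id fixture that yields the same id (or master when the run is not distributed).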

Scope Assessment

  • Complexity: Low
  • Confidence: High — clear path, well-understood tooling
  • Estimated files to change: 3 (conftest.py, pyproject.toml, build/test.toml)
  • Issue type: feat

Risks & Open Questions

  • Cluster resource limits: Running N sandboxes concurrently requires the k3s node to have enough CPU/memory. May need to document recommended resource sizing or set a sensible default for N.
  • Image pull thundering herd: If the sandbox image isn't cached on the node, N concurrent tests will all trigger image pulls simultaneously, potentially causing timeouts.
  • Default concurrency value: Should default to sequential (-n0 / no flag) to avoid breaking existing workflows, or should a default like -n auto be set?
  • CI impact: The e2e CI workflow (.github/workflows/e2e.yml) runs in a privileged container — need to verify it has enough resources for parallel sandboxes.
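The default-concurrency question could be settled by keeping the sequential invocation unless a worker count is requested explicitly. The following is only a sketch of that option: the E2E_WORKERS variable and build_pytest_argv are hypothetical names, and the real change would probably live in build/test.toml rather than in Python.

```python
import os


def build_pytest_argv(environ=None):
    """Build the e2e pytest command line with an optional xdist -n value.

    E2E_WORKERS is a hypothetical environment variable: unset or "0"
    keeps today's sequential command, "auto" or an integer adds -n.
    """
    env = os.environ if environ is None else environ
    workers = env.get("E2E_WORKERS", "0").strip() or "0"
    if workers != "auto" and not workers.isdigit():
        raise ValueError(f"E2E_WORKERS must be an integer or 'auto', got {workers!r}")

    argv = ["uv", "run", "pytest", "-o", "python_files=test_*.py"]
    if workers != "0":
        # -n is the pytest-xdist option that selects the number of workers.
        argv += ["-n", workers]
    return argv + ["e2e/python"]
```

Omitting -n entirely in the default case means the command stays byte-for-byte the same as today, so existing workflows keep working even before pytest-xdist is installed everywhere.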

Test Considerations

  • This is a change to test infrastructure itself, so validation is running the e2e suite with -n 5 (or similar) and confirming all tests pass without flakes or races
  • Verify that session-scoped fixtures (mock inference routes) are correctly isolated per worker or shared safely
  • Verify no test ordering dependencies exist (tests should already be independent, but parallelism will expose any hidden coupling)
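For the "shared safely" option, the resource has to be created by exactly one worker while the others reuse its result. The pytest-xdist documentation shows this with the third-party filelock package; the sketch below swaps in a standard-library technique instead (an exclusive-create claim file plus an atomic rename), so it is an illustration under assumptions and not the documented recipe. get_or_create_shared, the file names, and the JSON payload are all hypothetical.

```python
import json
import os
import time
from pathlib import Path
from typing import Callable


def get_or_create_shared(shared_dir: Path, name: str,
                         create: Callable[[], dict],
                         timeout: float = 60.0) -> dict:
    """Run `create` in exactly one process; every caller gets its result.

    `shared_dir` must be a directory visible to all workers.
    """
    data_file = shared_dir / f"{name}.json"
    claim_file = shared_dir / f"{name}.claim"
    try:
        # O_EXCL makes creation atomic: only one process can succeed.
        fd = os.open(claim_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Another worker claimed creation; wait until it publishes the result.
        deadline = time.monotonic() + timeout
        while not data_file.exists():
            if time.monotonic() > deadline:
                raise TimeoutError(f"timed out waiting for {data_file}")
            time.sleep(0.1)
        return json.loads(data_file.read_text())
    os.close(fd)

    data = create()
    # Write to a temp file and rename so readers never see a partial file.
    tmp_file = data_file.with_suffix(".tmp")
    tmp_file.write_text(json.dumps(data))
    os.replace(tmp_file, data_file)
    return data
```

In a session-scoped fixture, shared_dir could be the parent of tmp_path_factory.getbasetemp(), which is the directory the xdist documentation uses for cross-worker state. Teardown of the shared resource is not covered here and would need its own coordination.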

Created by spike investigation. Use build-from-issue to plan and implement.

Labels: state:agent-ready (Approved for agent implementation), state:pr-opened (PR has been opened for this issue), test:e2e (Requires end-to-end coverage)
