Description
Problem Statement
The e2e test suite (e2e/python/) runs ~36 tests sequentially in a single process. Each test spins up its own sandbox (K8s pod), waits for it to become ready (up to 300s), runs assertions, then tears it down. This makes the full suite slow — total wall time scales linearly with test count. Parallelizing with a configurable n-factor (e.g., 5 at a time) would significantly reduce wall time.
Technical Context
Tests are invoked via `uv run pytest -o python_files='test_*.py' e2e/python` (mise task `test:e2e:sandbox`). The sandbox fixture is function-scoped and creates an isolated sandbox per test, so individual tests are naturally independent. However, several session-scoped fixtures (gRPC client, mock inference routes) use hard-coded resource names that would race under parallel execution. The standard pytest parallelization tool is pytest-xdist, which distributes tests across worker processes. Each worker runs its own session and therefore instantiates its own copy of every session-scoped fixture, which is why fixtures that create resources under a fixed name would collide.
Affected Components
| Component | Key Files | Role |
|---|---|---|
| E2E test config | `e2e/python/conftest.py` | Session-scoped fixtures with hard-coded route names that would race |
| pytest config | `pyproject.toml` (lines 87-92) | Test runner configuration, no xdist settings today |
| mise task | `build/test.toml` (line 24) | `test:e2e:sandbox` task, needs `-n` flag support |
| Python deps | `pyproject.toml` (lines 29-39) | Dev dependencies, needs pytest-xdist |
Proposed Approach
Add pytest-xdist as a dev dependency and update the mise task to accept a concurrency factor. Fix session-scoped fixtures in conftest.py that use hard-coded names — either make names unique per xdist worker, or use xdist's session-scope sharing mechanism (file lock) to create shared resources once across all workers. The sandbox fixture itself is already safe — it's function-scoped and creates isolated pods.
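The "unique per xdist worker" option could look roughly like the sketch below. It relies on the `PYTEST_XDIST_WORKER` environment variable, which pytest-xdist sets in each worker process (for example `gw0`). The helper name `worker_scoped_name` and the base name `mock-route` are hypothetical and are not taken from `conftest.py`.

```python
import os
from typing import Mapping, Optional


def worker_scoped_name(base: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return `base` suffixed with the xdist worker id, or `base` unchanged.

    pytest-xdist sets PYTEST_XDIST_WORKER (e.g. "gw0") inside worker
    processes. When the variable is absent the run is sequential, so the
    original hard-coded name is kept and existing behavior does not change.
    """
    env = os.environ if environ is None else environ
    worker = env.get("PYTEST_XDIST_WORKER")
    return f"{base}-{worker}" if worker else base


# Hypothetical usage inside a session-scoped fixture:
#   route_name = worker_scoped_name("mock-route")
print(worker_scoped_name("mock-route", {"PYTEST_XDIST_WORKER": "gw1"}))
```

With this approach each worker creates and tears down its own copy of the resource, which is simpler than sharing but multiplies the number of routes by the worker count.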
Scope Assessment
- Complexity: Low
- Confidence: High — clear path, well-understood tooling
- Estimated files to change: 3 (`conftest.py`, `pyproject.toml`, `build/test.toml`)
- Issue type: `feat`
Risks & Open Questions
- Cluster resource limits: Running N sandboxes concurrently requires the k3s node to have enough CPU/memory. May need to document recommended resource sizing or set a sensible default for N.
- Image pull thundering herd: If the sandbox image isn't cached on the node, N concurrent tests will all trigger image pulls simultaneously, potentially causing timeouts.
- Default concurrency value: Should default to sequential (`-n0` / no flag) to avoid breaking existing workflows, or should a default like `-n auto` be set?
- CI impact: The e2e CI workflow (`.github/workflows/e2e.yml`) runs in a privileged container, so we need to verify it has enough resources for parallel sandboxes.
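To make the default-concurrency question concrete, here is a minimal sketch of how a concurrency setting could be mapped to pytest-xdist arguments while keeping sequential runs as the default. The function name `xdist_args` and the environment variable `E2E_PARALLEL` are hypothetical. The `-n <count>` and `-n auto` forms are pytest-xdist's own options.

```python
import os
from typing import List, Optional


def xdist_args(value: Optional[str]) -> List[str]:
    """Translate a concurrency setting into pytest-xdist CLI arguments.

    Unset or empty -> no flag at all, so the suite runs exactly as today
    (and still works if pytest-xdist is not installed).
    "auto"         -> let xdist pick a worker count from the CPU count.
    "N" (N >= 0)   -> run with N workers; 0 keeps everything in-process.
    """
    if value is None or value.strip() == "":
        return []
    value = value.strip()
    if value == "auto":
        return ["-n", "auto"]
    try:
        count = int(value)
    except ValueError:
        raise ValueError(f"invalid concurrency value: {value!r}") from None
    if count < 0:
        raise ValueError(f"concurrency must be >= 0, got {count}")
    return ["-n", str(count)]


# Hypothetical wiring: E2E_PARALLEL is an assumed variable name.
base_cmd = ["uv", "run", "pytest", "-o", "python_files=test_*.py", "e2e/python"]
print(base_cmd + xdist_args(os.environ.get("E2E_PARALLEL")))
```

Leaving the flag out entirely when nothing is configured means existing local and CI invocations are unaffected until someone opts in.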
Test Considerations
- This is a change to test infrastructure itself, so validation is running the e2e suite with `-n 5` (or similar) and confirming all tests pass without flakes or races
- Verify that session-scoped fixtures (mock inference routes) are correctly isolated per worker or shared safely
- Verify no test ordering dependencies exist (tests should already be independent, but parallelism will expose any hidden coupling)
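For the "shared safely" case, the create-once-under-a-file-lock pattern can be sketched as follows. The pytest-xdist documentation shows this pattern with the third-party `filelock` package. The version below swaps in the standard library's `fcntl.flock` instead, so it assumes a POSIX host. The function name `get_or_create_shared` and the example payload are hypothetical.

```python
import fcntl
import json
from pathlib import Path
from typing import Callable


def get_or_create_shared(shared_dir: Path, key: str, create: Callable[[], dict]) -> dict:
    """Run `create` once across processes; later callers reuse its result.

    The first process to take the lock calls `create` and stores the result
    as JSON. Every other process blocks on the lock, then reads the file.
    """
    result_path = shared_dir / f"{key}.json"
    lock_path = shared_dir / f"{key}.lock"
    with open(lock_path, "w") as lock_file:
        # Exclusive advisory lock; blocks until no other process holds it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if result_path.is_file():
                return json.loads(result_path.read_text())
            data = create()
            result_path.write_text(json.dumps(data))
            return data
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

In a session-scoped fixture, `shared_dir` would need to be a directory common to all workers. The xdist documentation uses `tmp_path_factory.getbasetemp().parent` for this. Teardown needs separate care, since no single worker obviously owns the shared resource.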
Created by spike investigation. Use build-from-issue to plan and implement.