add DockerRunner with container digest pinning #7

@cchinchilla-dev

Description

SubprocessRunner (0.1.x #4) is fine for development but insufficient for reproducibility. The framework's replication package must pin the execution environment by container digest. DockerRunner fills that need: users supply a Dockerfile, AgentAnvil pins the built image by its sha256:... digest, and every run records the digest it actually used.

Three concrete requirements:

1. User-supplied Dockerfile. AgentAnvil does not generate Dockerfiles (an explicit non-goal in the planning notes). The user gives AgentAnvil either a path to a Dockerfile or a pre-built image reference.

2. Digest pinning. On first run, AgentAnvil resolves the image to its digest (docker inspect --format='{{.Id}}') and stores it in the RunnerResult. Subsequent runs verify that the digest matches; a mismatch is fatal.

3. Mount points for recordings and replay. A ./recordings/ directory inside the container is mountable from outside so that RecordingBackend can persist its recordings on the host.

Proposal

1. DockerRunner class:

# src/agentanvil/runner/docker.py
from pathlib import Path

from agentanvil.runner.base import Runner, RunnerResult


class DockerRunner(Runner):
    name = "docker"

    def __init__(
        self,
        image: str | None = None,
        dockerfile: Path | None = None,
        context: Path | None = None,
        mounts: dict[str, str] | None = None,
        network: str = "none",          # default deny network for security
        cpu_limit: str | None = None,
        memory_limit: str | None = None,
    ):
        if image is None and dockerfile is None:
            raise ValueError("one of image or dockerfile required")
        self.image_ref = image
        self.dockerfile = dockerfile
        self.context = context or (dockerfile.parent if dockerfile else Path.cwd())
        self.mounts = mounts or {}
        self.network = network
        self.cpu_limit = cpu_limit
        self.memory_limit = memory_limit
        self._digest: str | None = None

    async def run(self, *, agent_path, scenario_json, timeout_ms, env=None) -> RunnerResult:
        digest = await self._ensure_built()
        cmd = ["docker", "run", "--rm", "-i",
               "--network", self.network,
               *self._mount_args(),
               *self._env_args(env),
               digest,
               "python", "-u", str(agent_path)]
        # Run under timeout, capture stdout/stderr/exit_code.
        # Return RunnerResult with image_digest=digest.
        ...

    async def _ensure_built(self) -> str:
        if self.image_ref:
            return await self._resolve_digest(self.image_ref)
        # Else build from Dockerfile, then resolve.
        ...
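The body of `run` is elided above. One possible way to implement the "run under timeout, capture stdout/stderr/exit_code" step is sketched here with `asyncio`. The helper name `run_with_timeout`, its return shape, and the idea of passing input on stdin are assumptions for illustration, not part of the proposal.

```python
import asyncio
import sys


async def run_with_timeout(cmd: list[str], timeout_ms: int, stdin: bytes = b""):
    """Run cmd and return (exit_code, stdout, stderr, timed_out)."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(
            proc.communicate(stdin), timeout=timeout_ms / 1000
        )
        return proc.returncode, out, err, False
    except asyncio.TimeoutError:
        # Kill the child, then drain whatever output is still in the pipes.
        proc.kill()
        out, err = await proc.communicate()
        return proc.returncode, out, err, True


if __name__ == "__main__":
    # Stand-in command so the sketch runs without Docker installed.
    print(asyncio.run(run_with_timeout([sys.executable, "-c", "print('ok')"], 5_000)))
```

One caveat: killing the `docker run` client process does not necessarily stop the container itself. A real implementation would probably also need to name the container and stop it explicitly (for example with `docker kill`) when the timeout fires.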

2. Digest verification:

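async def verify_digest(self, expected: str) -> None:
    # Resolve via _ensure_built() so this also works when the runner was
    # constructed from a Dockerfile and image_ref is None.
    actual = await self._ensure_built()
    if actual != expected:
        raise RuntimeError(f"image digest drift: expected {expected}, got {actual}")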
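`_resolve_digest` is referenced but not shown. A minimal sketch follows, assuming it shells out to the `docker inspect --format '{{.Id}}'` command named in requirement 2. The standalone function names and the strict `sha256:<64 hex>` validation are assumptions for this example. Note that `.Id` is the local image ID; a registry manifest digest, where one exists, is reported under `.RepoDigests` instead.

```python
import asyncio
import re

# Image IDs reported by docker look like "sha256:" followed by 64 hex characters.
_SHA256_RE = re.compile(r"sha256:[0-9a-f]{64}")


def parse_image_id(inspect_output: str) -> str:
    """Validate and normalise the output of `docker inspect --format '{{.Id}}'`."""
    value = inspect_output.strip()
    if not _SHA256_RE.fullmatch(value):
        raise ValueError(f"unexpected image id: {value!r}")
    return value


async def resolve_digest(image_ref: str) -> str:
    """Ask the local docker CLI for the image ID of image_ref."""
    proc = await asyncio.create_subprocess_exec(
        "docker", "inspect", "--format", "{{.Id}}", image_ref,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, err = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(
            f"docker inspect failed for {image_ref}: {err.decode().strip()}"
        )
    return parse_image_id(out.decode())
```

Keeping the parsing separate from the subprocess call means the validation can be unit-tested without a Docker daemon, which may suit the "optional nightly CI" split for the integration tests.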

3. Integration with RunRecord:

RunnerResult.image_digest flows into RunRecord.metadata.container_digest. Replay uses that digest by default.

Scope

  • src/agentanvil/runner/docker.py — new.
  • src/agentanvil/runner/__init__.py — export DockerRunner.
  • tests/runner/test_docker.py — integration tests (optional nightly CI).
  • tests/fixtures/docker_agents/echo/Dockerfile — toy fixture.
  • docs/runner-docker.md — user guide.

Regression tests

  • test_docker_runner_builds_from_dockerfile_and_resolves_digest
  • test_docker_runner_uses_prebuilt_image_ref
  • test_docker_runner_verify_digest_fails_on_drift
  • test_docker_runner_records_digest_in_result
  • test_docker_runner_honours_timeout
  • test_docker_runner_mounts_recordings_directory
  • test_docker_runner_network_none_by_default

Notes

  • Docker is an optional dependency. agentanvil[docker] installs docker>=7.0.
  • Default network=none — reproducible runs should not depend on outbound network. Users override per run if needed.
  • K8sRunner (0.4.0 #070) reuses digest-resolution logic.
  • Depends on: add static consistency analyzer to core.contracts #4 (Runner ABC).
  • Used by: #021 (quickstart), #022 (determinism CI with pinned image).

Labels

  • docker: Docker / containerisation runner
  • enhancement: New feature or request
  • runner: Agent execution runners (subprocess, docker, k8s)
