BenchFlow pipe-closed on Daytona DinD compose tasks — agent install/exec fails silently #186

@xdotli

Summary

When running tasks that ship a docker-compose.yaml on Daytona, BenchFlow fails with Process closed stdout (rc=None), while Harbor brings up the same compose environment successfully. Under BenchFlow the agent is never installed and never starts.

Reproduction

Same task (gh-repo-analytics) and same model (claude-opus-4-7) in every run; the two DinD runs use the same Daytona environment:

| Method | Result | Tool Calls | Error |
| --- | --- | --- | --- |
| BenchFlow + Docker (host) | ✅ reward=0.0 | 8 | None |
| Harbor + Daytona DinD | ⚠️ reward=0.0 | N/A | NonZeroAgentExitCodeError (auth issue, but compose worked) |
| BenchFlow + Daytona DinD | ❌ | 0 | Process closed stdout (rc=None) |

Harbor proves the DinD compose infrastructure works on Daytona. BenchFlow's ACP path fails at the agent install/exec step.

Root cause analysis

BenchFlow's DaytonaProcess.from_harbor_env() detects DinD mode and constructs docker compose exec commands to run inside the DinD container. However:

  1. install_agent() calls env.exec(), which goes through Harbor's DaytonaEnvironment._sandbox_exec() → _strategy.exec() → _DaytonaDinD.exec() → _compose_exec(["exec", ...])
  2. The compose exec command runs inside the DinD VM via SSH
  3. The agent install produces empty stdout — the install command output is lost
  4. After install, connect_acp() creates a DaytonaProcess and tries to start the agent via docker compose exec ... claude-agent-acp
  5. The agent process immediately closes stdout → Process closed stdout (rc=None)
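
To make steps 2 to 4 concrete, here is a minimal sketch of how a compose exec command could carry its environment explicitly across the SSH hop. It is not BenchFlow's or Harbor's actual code: compose_exec_argv and over_ssh are hypothetical helpers, and the service name main is an assumption. Only the docker compose exec flags (-T, -e) and shlex.join are real.

```python
import shlex


def compose_exec_argv(service, command, env=None, project=None):
    """Build argv for `docker compose exec` with explicit env injection.

    Hypothetical helper for illustration; not part of BenchFlow or Harbor.
    """
    argv = ["docker", "compose"]
    if project:
        argv += ["-p", project]
    # -T disables pseudo-TTY allocation so stdout stays a plain pipe.
    argv += ["exec", "-T"]
    # Pass each variable with -e so it cannot be dropped by an intermediate shell.
    for key, value in sorted((env or {}).items()):
        argv += ["-e", f"{key}={value}"]
    argv.append(service)
    argv += list(command)
    return argv


def over_ssh(argv):
    """Quote argv into one string so it survives a single remote shell hop."""
    return shlex.join(argv)


# "main" is an assumed service name; the env var name comes from config.json above.
argv = compose_exec_argv(
    "main",
    ["claude-agent-acp"],
    env={"_BENCHFLOW_SUBSCRIPTION_AUTH": "1"},
)
print(over_ssh(argv))
```

Logging the exact string produced at this point on both the install and the connect_acp() path would show whether the env vars are present before the command enters the DinD VM.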

Likely causes:

  • The compose env vars (_compose_env_vars()) may not include the subscription auth credentials needed by the agent
  • The SSH → DinD → compose exec chain may lose the --env-file or env var injection
  • Node.js may not be installed inside the compose container (the DinD VM is docker:28.3.3-dind Alpine, but the compose main container is built from the task's Dockerfile)
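
One way to tell these three causes apart would be a small probe run inside the compose main container. The sketch below only generates a POSIX sh script and parses its output; build_probe_script and parse_probe_output are hypothetical names, and the variable list is a placeholder that would need the agent's real credential variables.

```python
def build_probe_script(required_env, binaries=("node", "claude-agent-acp")):
    """Return a POSIX sh script printing one `kind:name=status` line per check."""
    lines = []
    for name in binaries:
        lines.append(
            f'if command -v {name} >/dev/null 2>&1; then echo "bin:{name}=ok"; '
            f'else echo "bin:{name}=missing"; fi'
        )
    for name in required_env:
        # Report presence only, never the value, so secrets stay out of logs.
        lines.append(
            f'if [ -n "${{{name}:-}}" ]; then echo "env:{name}=ok"; '
            f'else echo "env:{name}=missing"; fi'
        )
    return "\n".join(lines)


def parse_probe_output(text):
    """Collect the names reported as missing, grouped by check kind."""
    missing = {"bin": [], "env": []}
    for line in text.splitlines():
        kind, sep, rest = line.partition(":")
        name, eq, status = rest.partition("=")
        if sep and eq and kind in missing and status.strip() == "missing":
            missing[kind].append(name)
    return missing


# Placeholder variable list; substitute the credentials the agent actually needs.
print(build_probe_script(["_BENCHFLOW_SUBSCRIPTION_AUTH"]))
```

The script could be passed as the argument to sh -c through the same compose exec path that install_agent() uses. A missing node entry would point at the task's Dockerfile, while a missing env entry would point at the env injection chain.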

Evidence

BenchFlow DinD result:

  • install-stdout.txt: empty
  • claude_agent_acp.txt: empty
  • timing.json: environment_setup: 42.7s (compose up succeeded)
  • config.json: _BENCHFLOW_SUBSCRIPTION_AUTH: 1 (OAuth detected)
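
For checking the other four affected tasks, the same evidence could be collected with a short script. This is a sketch under assumptions: it expects the four files above to sit directly in one trial directory, and because the nesting inside timing.json and config.json is not confirmed here, it searches for the keys recursively. summarize_trial, find_key and load_json are hypothetical names.

```python
import json
from pathlib import Path


def find_key(obj, key):
    """Depth-first search for `key` in nested dicts/lists; None if absent."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def load_json(path):
    """Parse a JSON file, returning None if it is missing or malformed."""
    try:
        return json.loads(Path(path).read_text())
    except (OSError, ValueError):
        return None


def summarize_trial(trial_dir):
    """Summarize the artifacts listed above for one trial directory."""
    root = Path(trial_dir)

    def is_empty(name):
        # A missing file is treated the same as an empty one.
        path = root / name
        return (not path.is_file()) or path.stat().st_size == 0

    return {
        "install_stdout_empty": is_empty("install-stdout.txt"),
        "agent_log_empty": is_empty("claude_agent_acp.txt"),
        "environment_setup_s": find_key(
            load_json(root / "timing.json"), "environment_setup"
        ),
        "subscription_auth": find_key(
            load_json(root / "config.json"), "_BENCHFLOW_SUBSCRIPTION_AUTH"
        ),
    }
```

Calling summarize_trial() on each affected task's trial directory would show whether all five fail with the same signature: compose up succeeds, then both install and agent output are empty.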

Environment

  • BenchFlow v0.3.1 (dev-0.3 branch)
  • Harbor (pip installed, used by BenchFlow)
  • Daytona cloud
  • Tasks affected: all 5 with docker-compose.yaml in SkillsBench (gh-repo-analytics, pedestrian-traffic-counting, pg-essay-to-audiobook, scheduling-email-assistant, react-performance-debugging)

Expected behavior

BenchFlow should install and run claude-agent-acp inside the DinD compose main container, just as Harbor installs and runs the claude CLI there.
