Release v0.3.2 — BaseUser, verifier hardening, DinD compose · benchflow-ai/benchflow

Highlights

BaseUser progressive-disclosure abstraction (#194): a Python callback drives multi-round agent runs. Built for the SWE-bench Pro use case (Josh @ GitHub) and as the parity answer to Harbor #1316 for the case with no second LLM. See docs/progressive-disclosure.md.
Per-task [verifier.hardening] opt-outs in task.toml (#194): tasks with legitimate conftest.py setups (qutebrowser-style) can opt out of specific cleanup steps. With these opt-outs, the oracle scores 5/5 on SWE-bench Pro under the hardened verifier.
DinD compose ACP via Daytona PTY WebSocket (#193, #196): live agent pipes for SkillsBench and DinD compose tasks.
--rootdir=/app in PYTEST_ADDOPTS (#194): anchors pytest test node IDs to the repo root; the openlibrary oracle goes from 0/18 to 18/18.
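The BaseUser item describes a Python callback that decides, round by round, what the agent sees next. The sketch below models only that control flow; `SimpleUser`, `next_message`, `run_rounds`, and `stub_agent` are hypothetical names, not benchflow's actual BaseUser API (docs/progressive-disclosure.md describes the real interface).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SimpleUser:
    """Hypothetical progressive-disclosure user: reveals one hint per round."""

    hints: list[str]
    revealed: int = 0

    def next_message(self, round_idx: int, last_reply: Optional[str]) -> Optional[str]:
        # round_idx is available for round-aware policies; this sketch ignores it.
        # Stop once the agent reports success or no hints remain.
        if last_reply is not None and "DONE" in last_reply:
            return None
        if self.revealed >= len(self.hints):
            return None
        msg = self.hints[self.revealed]
        self.revealed += 1
        return msg


def run_rounds(user: SimpleUser, agent: Callable[[str], str], max_rounds: int = 5) -> list[str]:
    """Drive a multi-round run: the user callback, not a second LLM, picks each prompt."""
    transcript: list[str] = []
    last_reply: Optional[str] = None
    for round_idx in range(max_rounds):
        prompt = user.next_message(round_idx, last_reply)
        if prompt is None:
            break
        last_reply = agent(prompt)
        transcript.append(last_reply)
    return transcript


def stub_agent(prompt: str) -> str:
    # Hypothetical stand-in agent: "solves" the task once it sees the second hint.
    return "DONE" if "hint 2" in prompt else f"tried: {prompt}"


if __name__ == "__main__":
    user = SimpleUser(hints=["hint 1: read the issue", "hint 2: check tests/", "hint 3: unused"])
    print(run_rounds(user, stub_agent))
```

Because the stopping rule lives in plain Python, the third hint is never disclosed once the agent succeeds.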
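pytest builds each test's node ID from the test file's path relative to its rootdir, so an unexpected rootdir changes every ID a verifier compares against. The sketch below is a simplified model of that rule (it ignores classes and parametrization); the /app/tests paths and test names are illustrative. PYTEST_ADDOPTS is the real environment variable pytest reads for extra command-line options.

```python
import os
from pathlib import PurePosixPath


def node_id(rootdir: str, test_file: str, test_name: str) -> str:
    """Simplified pytest node ID: file path relative to rootdir, '::', test name."""
    rel = PurePosixPath(test_file).relative_to(PurePosixPath(rootdir))
    return f"{rel}::{test_name}"


test_file = "/app/tests/test_search.py"

# e.g. if rootdir were inferred as /app/tests, IDs would lose the tests/ prefix...
print(node_id("/app/tests", test_file, "test_query"))  # test_search.py::test_query
# ...whereas anchoring rootdir at the repo root yields repo-relative IDs.
print(node_id("/app", test_file, "test_query"))  # tests/test_search.py::test_query

# Environment for a pytest subprocess; this overwrites any existing PYTEST_ADDOPTS.
env = dict(os.environ, PYTEST_ADDOPTS="--rootdir=/app")
```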

Fixes

cfg.agent_env reaches connect_as() (#191, closes #190): provider credentials supplied via YAML config now reach the agent.
DinD env-file path mismatch (#198): shlex.join() single-quoted $$, so the shell never expanded it and the written and read paths diverged; the paths are now made unique with uuid.uuid4().
OpenHands sandbox launch + ACP CLI path (#182).
Stop copying root tool installs into sandbox home (#181, closes #178).
sandbox_setup_timeout wired through configs (#180).
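The #198 fix hinges on shell quoting: shlex.join() applies shlex.quote() to each argument, and any argument containing `$` gets wrapped in single quotes, so a shell never expands `$$` to its PID. The sketch below reproduces that with the standard library; the /tmp/env.* paths are illustrative, not benchflow's actual file names.

```python
import shlex
import uuid

# '$' is outside shlex.quote's safe character set, so the whole argument is
# single-quoted and a shell treats "$$" as two literal dollar signs.
quoted = shlex.join(["cat", "/tmp/env.$$"])
print(quoted)  # cat '/tmp/env.$$'

# A uuid4 hex string contains only [0-9a-f], so the path passes through quoting
# unchanged and the same string works for both writing and reading the file.
env_path = f"/tmp/env.{uuid.uuid4().hex}"
assert shlex.quote(env_path) == env_path
read_cmd = shlex.join(["cat", env_path])
print(read_cmd)
```

Generating the unique name in Python, rather than asking the shell for one, removes the dependence on whether a given command string gets expanded at all.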

Validation

Oracle: 5/5 on Daytona (ansible, flipt, openlibrary, navidrome, qutebrowser).
Single-round Gemini 3.1 Pro baseline: 2/4.

pip install benchflow==0.3.2