Description
The Local Process Sandbox crashes with exit code 137 (OOM killed) approximately 2 minutes into a factory mode build when generating a REST API project. The sandbox process is being terminated by the OS out-of-memory killer before the build pipeline completes, preventing the DevOps Agent from ever reaching the deployment stage.
Reproduced consistently using the GPT-4o model via the Telegram bot interface with factory mode enabled.
Steps to Reproduce
- Connect to ZeroBuild via the Telegram bot
- Trigger factory mode by submitting a REST API idea (e.g., "Build a REST API for a task manager with CRUD endpoints")
- Observe the Orchestrator spawning the full agent pipeline (BA → UI/UX → Developer → Tester → DevOps)
- Wait approximately 2 minutes into the build
- Observe the sandbox process terminate with exit code 137
Expected Behavior
The Local Process Sandbox should sustain the full factory mode pipeline through all agent stages (code generation, test execution, build verification) without exhausting system memory. If memory pressure is detected, the sandbox should either stream results incrementally, limit concurrent sub-processes, or surface a recoverable error rather than crashing.
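As an illustration of two of the options above (limiting concurrent sub-processes and surfacing a recoverable error), here is a minimal Python sketch. It is not ZeroBuild code: `run_stages`, `StageFailed`, and the `max_concurrent` parameter are hypothetical names, and the project's actual implementation language and spawn API may differ.

```python
import asyncio


class StageFailed(Exception):
    """Hypothetical recoverable error that records which stage failed."""

    def __init__(self, stage, cause):
        super().__init__(f"stage {stage!r} failed: {cause}")
        self.stage = stage
        self.cause = cause


async def run_stages(stages, max_concurrent=2):
    """Run stages with at most `max_concurrent` in flight at once.

    `stages` maps a stage name to a zero-argument coroutine function.
    Returns (results, errors) so the caller can report failures
    instead of the whole pipeline dying.
    """
    gate = asyncio.Semaphore(max_concurrent)
    results, errors = {}, {}

    async def guarded(name, fn):
        async with gate:  # blocks while max_concurrent stages are running
            try:
                results[name] = await fn()
            except Exception as exc:  # turn the failure into a structured error
                errors[name] = StageFailed(name, exc)

    await asyncio.gather(*(guarded(n, f) for n, f in stages.items()))
    return results, errors
```

Note that an in-process guard like this cannot catch a SIGKILL delivered to the sandbox itself; it only bounds how much work is in flight and gives the caller something structured to relay to the Telegram bot.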
Current Behavior
The sandbox process is OOM-killed (SIGKILL / exit code 137) by the OS kernel approximately 2 minutes into execution. The build pipeline is aborted with no recovery, and the Telegram bot receives no structured error — the session silently fails or hangs.
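A structured error for this case could start from decoding the exit status. The sketch below is a hypothetical helper (`describe_exit` is not an existing ZeroBuild function) written in Python for illustration. It assumes a POSIX host and treats SIGKILL only as a hint of an OOM kill, since other senders of SIGKILL produce the same status.

```python
import signal


def describe_exit(code):
    """Map a process exit status to a small, reportable dict.

    Shells report 128 + N when a process dies from signal N, while
    Python's subprocess module reports -N. A program can also exit
    with a value above 128 on its own, so this is a heuristic.
    """
    if code < 0:
        signum = -code
    elif code > 128:
        signum = code - 128
    else:
        signum = None

    if signum is None:
        return {"exit_code": code, "signal": None, "likely_oom": False}

    try:
        name = signal.Signals(signum).name
    except ValueError:  # not a signal number known on this platform
        name = f"signal {signum}"

    # SIGKILL is what the Linux OOM killer sends, but it is not the only source.
    return {"exit_code": code, "signal": name, "likely_oom": name == "SIGKILL"}
```

For the status reported in this issue, `describe_exit(137)` yields the signal name `SIGKILL` with `likely_oom` set to `True`, which the bot could relay instead of hanging.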
Technical Notes
- Exit code 137: This is 128 + SIGKILL (9), meaning the process was terminated by SIGKILL. On Linux this most often points to the OOM killer, although any sender of SIGKILL produces the same exit code. Check dmesg or /var/log/syslog for "Out of memory: Kill process" entries ("Killed process" on newer kernels) to confirm.
- Suspected root cause: Factory mode spawns up to 6 concurrent specialized agents (Orchestrator, BA, UI/UX, Developer, Tester, DevOps), each likely holding large LLM context windows in memory simultaneously. With GPT-4o's large context, the aggregate in-memory token buffers across agents on the InMemoryMessageBus / SharedContextEntry Blackboard may be driving unbounded heap growth.
- Blackboard pressure: SharedContextEntry accumulates cross-agent state throughout the pipeline. If intermediate artifacts (PRD, design specs, generated source files, test output) are all retained in memory without eviction, the Blackboard alone could account for significant RSS growth.
- Sandbox isolation: The Local Process Sandbox runs with cleared environment variables. Verify whether memory limits (cgroups/ulimit) are also being cleared or are absent from the sandbox configuration, leaving the child process unconstrained.
- Potential mitigations to investigate:
- Apply cgroup v2 memory limits to the sandbox child process
- Implement lazy/streaming artifact passing between agents instead of full in-memory retention
- Evict completed agent context from SharedContextEntry once downstream agents have consumed it
- Add a memory budget check in the Orchestrator before spawning each sub-agent
- Affected components: LocalProcessSandbox, InMemoryMessageBus, SharedContextEntry, Orchestrator agent spawn logic
- Observed channel: Telegram bot
- LLM provider: OpenRouter / GPT-4o
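To make the eviction mitigation listed above concrete, here is a toy Python sketch of a shared store that releases an artifact once every declared downstream consumer has read it. The `Blackboard` class and its `publish`/`consume` methods are hypothetical and do not reflect the real SharedContextEntry API; the sketch also assumes each artifact's consumers are known when it is published.

```python
class Blackboard:
    """Toy shared-context store that drops entries after their last reader."""

    def __init__(self):
        self._entries = {}  # key -> artifact
        self._pending = {}  # key -> consumers that have not read it yet

    def publish(self, key, artifact, consumers):
        """Store an artifact along with the agents expected to read it."""
        self._entries[key] = artifact
        self._pending[key] = set(consumers)

    def consume(self, key, consumer):
        """Return the artifact; evict it once no declared consumer remains."""
        artifact = self._entries[key]
        remaining = self._pending[key]
        remaining.discard(consumer)
        if not remaining:
            # Last declared consumer has read it: release the store's reference.
            del self._entries[key]
            del self._pending[key]
        return artifact

    def __len__(self):
        return len(self._entries)


# Usage: the PRD is needed by two downstream agents, then dropped.
bb = Blackboard()
bb.publish("prd", "product requirements text", {"uiux", "developer"})
bb.consume("prd", "uiux")
bb.consume("prd", "developer")
```

After the second `consume` call the store no longer holds the PRD, so its memory can be reclaimed once the consuming agent is done with it. Whether this fits the real pipeline depends on whether later stages (for example the Tester) need to re-read earlier artifacts.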
Labels: bug