Zombie process accumulation when openab runs as PID 1 #436

@agent-worker-use

Description

When openab runs as PID 1 in a container, it does not reap orphaned child processes, leading to zombie process accumulation over time.

  • openab runs as PID 1 (container entrypoint: openab run /etc/openab/config.toml)
  • openab forks kiro-cli (ACP mode), which in turn forks worker processes (e.g., python3, pkill)
  • When a worker's direct parent exits before the worker does, the worker is orphaned and reparented to PID 1 (openab)
  • openab never calls wait() or waitpid(), so once these orphaned workers terminate they remain as zombies (Z state)
  • After 2 days of container runtime, 9 zombie processes accumulated, all with PPID=1
  • Zombies consume no CPU and almost no memory, but each one holds a PID and a process table entry; long-running containers risk PID exhaustion

Steps to Reproduce

  1. Start openab as container PID 1 (openab run config.toml)
  2. Trigger kiro-cli via ACP to execute shell subprocesses (e.g., subprocess.run() launching python3)
  3. Allow worker processes to complete normally
  4. Observe with ps -eo pid,ppid,stat,comm (plain ps aux does not show PPID): zombie processes appear in the Z (defunct) state with PPID=1
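
For step 4, the same check can be scripted instead of read off ps output. The following is a minimal sketch in Python, used here only for illustration. It assumes a Linux /proc filesystem, and the helper name list_zombies is hypothetical, not part of openab or kiro-cli.

```python
import os


def list_zombies(ppid=None):
    """Return (pid, ppid, comm) for every process in state Z, read from /proc."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat", errors="replace") as f:
                stat = f.read()
        except OSError:
            continue  # process vanished between listdir() and open()
        # Format: "pid (comm) state ppid ..."; comm may contain spaces or ')',
        # so split around the last closing parenthesis.
        comm = stat[stat.index("(") + 1 : stat.rindex(")")]
        fields = stat[stat.rindex(")") + 2 :].split()
        state, parent = fields[0], int(fields[1])
        if state == "Z" and (ppid is None or parent == ppid):
            zombies.append((int(entry), parent, comm))
    return zombies


if __name__ == "__main__":
    # Inside the container, zombies left behind by openab have PPID=1.
    for pid, parent, comm in list_zombies(ppid=1):
        print(pid, parent, comm)
```

Running this periodically inside the container would show whether the count grows over time, which is the symptom described above.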

Expected Behavior

Zombie processes should be reaped and removed from the process table. Suggested solutions:

  1. SIGCHLD Handler: Register a SIGCHLD signal handler that calls waitpid(-1, WNOHANG) in a loop until no exited children remain (several child exits can be delivered as a single SIGCHLD)
  2. Child Subreaper: Use prctl(PR_SET_CHILD_SUBREAPER) together with the same reaping logic, so that orphaned descendants are reparented to openab even when it is not PID 1
  3. Init Wrapper: Document the recommendation to use a dedicated init process (e.g., tini, dumb-init) as the container entrypoint
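
The reaping loop behind option 1 can be sketched as follows. This is an illustration in Python under stated assumptions, not openab's code: the names reap_children, install_reaper, and the reaped list are hypothetical, and a real implementation would use the equivalent signal and wait APIs of openab's own language.

```python
import os
import signal

reaped = []  # (pid, raw wait status) pairs, kept only for illustration


def reap_children(signum=None, frame=None):
    """Collect every child that has already exited, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # this process has no children at all
        if pid == 0:
            return  # children exist, but none has exited yet
        reaped.append((pid, status))


def install_reaper():
    """Reap on every SIGCHLD, plus once up front for earlier exits."""
    signal.signal(signal.SIGCHLD, reap_children)
    reap_children()
```

One design caveat: waitpid(-1, ...) also collects children that other code may be waiting on by PID (for example the directly spawned kiro-cli process), so a real reaper would likely need to hand those exit statuses back to their owners rather than discard them.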

Environment

  • openab version: 0.7.3
  • Container runtime: Kubernetes pod (k3s)
  • OS: Linux

Screenshots / Logs

$ ps -eo pid,ppid,stat,comm | grep Z
     65       1 Z    pkill
     67       1 Z    python3
     81       1 Z    python3
     87       1 Z    python3
    158       1 Z    python3
    436       1 Z    python3
    462       1 Z    python3
    533       1 Z    python3
    735       1 Z    pkill

This is a known issue with PID 1 processes in containers. The Go runtime does not automatically handle SIGCHLD, so explicit zombie reaping is required when running as PID 1.
