feat: SQLite Worker Coordination + Output Persistence (Phase 2) #89

@dean0x

Epic: #87 — Architectural Simplification v0.6.0

Goal

Enable multiple CLI/MCP processes to coordinate worker spawning through a shared database table, and persist task output for cross-process visibility.

Design Constraints

  • Dual tracking: Keep Map<WorkerId, WorkerState> in-memory for single-process fast path (O(1) lookups). Mirror to workers table for cross-process coordination. Same-process operations stay fast; cross-process queries use DB.
  • Orphan row recovery: If a process crashes mid-kill, the worker is dead but the DB row remains. Recovery on startup handles this: probe each owner_pid with process.kill(owner_pid, 0) (signal 0 only tests whether the process exists) and DELETE the rows whose owner is dead.
  • In-memory/DB boundary: ChildProcess handles, timeout timers, and output streams are ephemeral per-process state — they stay in-memory. The workers table stores only serializable coordination data (id, task_id, pid, owner_pid, agent, started_at).
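The in-memory/DB boundary described above can be sketched as two types plus a projection between them. This is an illustration only: the column names come from the workers table in this issue, while WorkerRow, the field layout of WorkerState, and toWorkerRow are hypothetical names, not the project's actual definitions.

```typescript
// Serializable coordination data: one field per column of the workers table.
interface WorkerRow {
  id: string;
  taskId: string;
  pid: number;
  ownerPid: number;
  agent: string;
  startedAt: number;
}

// Assumed shape of the in-memory state: the row data plus ephemeral,
// process-local handles that must never be written to the database.
interface WorkerState extends WorkerRow {
  child: unknown;        // stands in for a ChildProcess handle
  timeoutTimer: unknown; // stands in for a timeout timer
}

// Project the in-memory state down to what the workers table stores.
function toWorkerRow(state: WorkerState): WorkerRow {
  const { id, taskId, pid, ownerPid, agent, startedAt } = state;
  return { id, taskId, pid, ownerPid, agent, startedAt };
}
```

Keeping the projection explicit makes it harder for a handle or timer to leak into a row by accident.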

Sub-tasks

2a. Add workers Table

Schema (migration v9):

CREATE TABLE workers (
  id TEXT PRIMARY KEY,
  task_id TEXT NOT NULL UNIQUE,
  pid INTEGER NOT NULL,
  owner_pid INTEGER NOT NULL,  -- PID of the process that spawned this worker
  agent TEXT NOT NULL DEFAULT 'claude',
  started_at INTEGER NOT NULL,
  FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
);
CREATE INDEX idx_workers_owner ON workers(owner_pid);

Design principle: This is a coordination registry, not full worker state. The table answers one question: "how many workers exist across all processes?"

File: src/implementations/database.ts — Add migration v9.
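A minimal sketch of how migration v9 might be registered, assuming migrations are objects with a version number and an up() step. The Migration and SqlExec interfaces here are hypothetical; the real migration mechanism in database.ts may look different. The SQL is the schema from this issue, unchanged.

```typescript
// Hypothetical minimal database surface: anything that can run raw SQL.
interface SqlExec {
  exec(sql: string): void;
}

// Hypothetical migration shape; the project's real one may differ.
interface Migration {
  version: number;
  description: string;
  up(db: SqlExec): void;
}

const migrationV9: Migration = {
  version: 9,
  description: "Add workers coordination table",
  up(db: SqlExec): void {
    db.exec(`
      CREATE TABLE workers (
        id TEXT PRIMARY KEY,
        task_id TEXT NOT NULL UNIQUE,
        pid INTEGER NOT NULL,
        owner_pid INTEGER NOT NULL,
        agent TEXT NOT NULL DEFAULT 'claude',
        started_at INTEGER NOT NULL,
        FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
      );
      CREATE INDEX idx_workers_owner ON workers(owner_pid);
    `);
  },
};
```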

2b. Worker Lifecycle Writes

On spawn: INSERT into workers table.
On completion/kill: DELETE from workers table.
On startup (recovery): DELETE stale rows where owner_pid is no longer running.

Files:

  • src/implementations/event-driven-worker-pool.ts — Add Database injection. INSERT after successful spawn, DELETE on completion/kill. Keep in-memory Map<WorkerId, WorkerState> for single-process fast path.
  • src/services/recovery-manager.ts — On startup, query workers table, check each owner_pid with process.kill(pid, 0). DELETE rows for dead processes. Mark their tasks as FAILED.
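The startup recovery step could look roughly like the sketch below. In Node.js, process.kill(pid, 0) sends no signal and throws if the process cannot be signalled: ESRCH means no such process, EPERM means it exists but belongs to another user. The WorkerRegistry interface, recoverStaleWorkers, and the markTaskFailed callback are hypothetical names standing in for the real repository and task APIs.

```typescript
type KillProbe = (pid: number) => void;

// Returns true if the process exists. The probe is injectable so the logic
// can be exercised without real PIDs; by default it uses process.kill(pid, 0).
function isProcessAlive(
  pid: number,
  probe: KillProbe = (p) => { process.kill(p, 0); },
): boolean {
  try {
    probe(pid);
    return true;
  } catch (err) {
    // EPERM: the process exists but we may not signal it, so it is alive.
    // Anything else (typically ESRCH) is treated as dead.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Hypothetical view of the workers table used during recovery.
interface WorkerRegistryRow {
  id: string;
  taskId: string;
  ownerPid: number;
}

interface WorkerRegistry {
  listAll(): WorkerRegistryRow[];
  deleteById(id: string): void;
}

// Delete rows whose owning process is gone and mark their tasks as failed.
// Returns the IDs of the tasks that were marked failed.
function recoverStaleWorkers(
  registry: WorkerRegistry,
  markTaskFailed: (taskId: string) => void,
  alive: (pid: number) => boolean = isProcessAlive,
): string[] {
  const failed: string[] = [];
  for (const row of registry.listAll()) {
    if (alive(row.ownerPid)) continue;
    registry.deleteById(row.id);
    markTaskFailed(row.taskId);
    failed.push(row.taskId);
  }
  return failed;
}
```

One caveat with this approach: a PID can be reused by an unrelated process after a crash, in which case the probe reports the owner as alive and the row is kept.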

2c. Cross-Process Resource Checks

Change: ResourceMonitor.canSpawnWorker() queries the workers table for global worker count instead of relying on in-memory workerCount.

Files:

  • src/implementations/resource-monitor.ts — Add Database injection. canSpawnWorker() runs SELECT COUNT(*) FROM workers for global count. Keep settlingWorkers array in-memory (it's per-process and still relevant).
  • src/core/interfaces.ts — Update ResourceMonitor interface if method signatures change.
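A minimal sketch of the global check, assuming the count query is hidden behind a small interface. WorkerCounter and the maxWorkers limit are hypothetical; the real canSpawnWorker() presumably also weighs CPU, memory, and the in-memory settlingWorkers array, which are left out here because their semantics are not specified in this issue.

```typescript
// Hypothetical wrapper around: SELECT COUNT(*) FROM workers
interface WorkerCounter {
  countWorkers(): number;
}

// Allow a spawn only while the worker count across ALL processes is below
// the limit, rather than the count known to this process alone.
function canSpawnWorker(counter: WorkerCounter, maxWorkers: number): boolean {
  return counter.countWorkers() < maxWorkers;
}
```

With a limit of 2 and one worker owned by each of two processes, the count query returns 2, so both processes see canSpawnWorker() return false even though each tracks only one worker in memory.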

2d. Spawn Serialization Across Processes

Change: Use SQLite's built-in locking for cross-process spawn serialization. The existing in-process mutex (spawnLock in WorkerHandler) handles within-process serialization. For cross-process, use a BEGIN IMMEDIATE transaction around the spawn-decision + INSERT.

File: src/services/handlers/worker-handler.ts — Wrap the spawn decision (resource check + dequeue + spawn + INSERT) in a database.runInTransaction().
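The shape of such a transaction wrapper might be as follows. BEGIN IMMEDIATE takes SQLite's write lock up front, so a second process attempting the same sequence waits or gets SQLITE_BUSY instead of reading a stale count. The SqlExec interface, withImmediateTransaction, and trySpawn are hypothetical; the project's database.runInTransaction() may already behave this way. The sketch is synchronous and assumes the callback finishes the decision and the INSERT before returning.

```typescript
// Hypothetical minimal database surface: anything that can run raw SQL.
interface SqlExec {
  exec(sql: string): void;
}

// Run fn inside BEGIN IMMEDIATE ... COMMIT, rolling back if fn throws.
function withImmediateTransaction<T>(db: SqlExec, fn: () => T): T {
  db.exec("BEGIN IMMEDIATE");
  try {
    const result = fn();
    db.exec("COMMIT");
    return result;
  } catch (err) {
    db.exec("ROLLBACK");
    throw err;
  }
}

// Make the resource check and the registration one atomic step.
// Returns the new worker ID, or null if the check refused the spawn.
function trySpawn(
  db: SqlExec,
  canSpawn: () => boolean,
  spawnAndRegister: () => string,
): string | null {
  return withImmediateTransaction(db, () => (canSpawn() ? spawnAndRegister() : null));
}
```

Because the write lock is held for the whole callback, anything slow inside it (such as starting the child process) delays every other process that wants to spawn.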

2e. Output Persistence for Cross-Process Visibility

Problem: OutputCapture stores task output in a process-local Map<TaskId, OutputBuffer>, so running beat task logs from a different process returns nothing. An SQLiteOutputRepository exists (src/implementations/output-repository.ts) with a task_output table and a file-based fallback for large outputs, but it is never wired into the live capture path.

Change: Wire OutputRepository into ProcessConnector so output is persisted to SQLite during capture. Same-process callers can still use in-memory OutputCapture for speed. Cross-process callers (including the lightweight CLI) read from OutputRepository.

Files:

  • src/services/process-connector.ts — Inject OutputRepository. After capturing to in-memory buffer, also call outputRepository.append(taskId, stream, data). Batch writes to reduce DB contention (e.g., flush every 100ms or on buffer threshold).
  • src/bootstrap.ts — Create SQLiteOutputRepository, pass to ProcessConnector.
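The batching suggested above (flush every 100ms or on a buffer threshold) could be sketched like this. BatchedOutputWriter, OutputSink, and the default limits are hypothetical; only the append(taskId, stream, data) call shape is taken from this issue, and the real OutputRepository signature may differ.

```typescript
type StreamName = "stdout" | "stderr";

// Hypothetical stand-in for the OutputRepository write path.
interface OutputSink {
  append(taskId: string, stream: StreamName, data: string): void;
}

interface PendingChunk {
  taskId: string;
  stream: StreamName;
  data: string;
}

class BatchedOutputWriter {
  private sink: OutputSink;
  private flushMs: number;
  private maxChars: number;
  private pending: PendingChunk[] = [];
  private pendingChars = 0;
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(sink: OutputSink, flushMs = 100, maxChars = 64 * 1024) {
    this.sink = sink;
    this.flushMs = flushMs;
    this.maxChars = maxChars;
  }

  // Buffer a chunk; consecutive chunks for the same task and stream are merged.
  write(taskId: string, stream: StreamName, data: string): void {
    const last = this.pending[this.pending.length - 1];
    if (last && last.taskId === taskId && last.stream === stream) {
      last.data += data;
    } else {
      this.pending.push({ taskId, stream, data });
    }
    this.pendingChars += data.length;

    if (this.pendingChars >= this.maxChars) {
      this.flush(); // size threshold reached: write immediately
    } else if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.flushMs);
    }
  }

  // Write everything buffered so far, in arrival order, and cancel the timer.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    const batch = this.pending;
    this.pending = [];
    this.pendingChars = 0;
    for (const chunk of batch) {
      this.sink.append(chunk.taskId, chunk.stream, chunk.data);
    }
  }
}
```

A caller would also need to call flush() when the worker exits, otherwise up to one interval of output stays in memory and is lost to other processes.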

Files Changed

Modified

  • src/implementations/database.ts — migration v9
  • src/implementations/event-driven-worker-pool.ts — INSERT/DELETE on spawn/completion
  • src/services/recovery-manager.ts — stale worker cleanup on startup
  • src/implementations/resource-monitor.ts — global worker count from DB
  • src/core/interfaces.ts — ResourceMonitor interface update
  • src/services/handlers/worker-handler.ts — cross-process spawn serialization
  • src/services/process-connector.ts — wire OutputRepository for persistence
  • src/bootstrap.ts — create SQLiteOutputRepository, pass to ProcessConnector

Risk

Medium — new table, cross-process logic. No existing behavior should break since the workers table is additive. Output persistence wires an existing implementation.

Sub-task | Risk   | Notes
2a-2d    | Medium | New table, cross-process coordination logic
2e       | Low    | Wiring existing OutputRepository implementation

Verification

  • npm run build — clean compilation
  • npx biome check src/ tests/ — no lint issues
  • All test groups pass
  • Two concurrent beat run processes: the workers table shows a row for each worker, and resource checks in either process account for both
  • Kill process mid-task → restart → stale workers cleaned, tasks marked failed
  • beat task logs from a different process — returns output persisted via OutputRepository

Labels: architecture, enhancement