AlessioZazzarini/AgentSquad


AgentSquad

Autonomous AI development toolkit for Claude Code. AgentSquad gives Claude Code a safe, completion-enforcing runtime for autonomous task execution: structured task repositories, worker spawning, compliance enforcement, multi-agent teams, and cross-model collaboration. It encodes 24 lessons learned from production use.

Quick Start (One Command)

cd your-project
npx agentsquad start

This auto-detects your project type, installs AgentSquad, syncs GitHub issues, and starts the Conductor. Works with Node.js, Python, Go, and Rust projects.

Manual Install

cd your-project
npx agentsquad init

What You Get

  • Task repository (.tasks/) — structured file-based tracking with status.json, acceptance criteria, execution logs
  • Worker spawning — tmux-based autonomous Claude Code workers with iteration budgets
  • Status updates — safe jq-based JSON mutations with path traversal validation
  • Worker monitoring — JSON output of all active workers and their health
  • Notifications — webhook-based alerts on status transitions (Slack, Discord, generic)
  • Compliance enforcement — 7-point anti-premature-stopping checklist every iteration
  • Cross-model collaboration — delegate to a secondary model (Codex, etc.) for think/build/debug

How It Works

AgentSquad uses a loop methodology where Claude Code runs autonomously inside a structured framework:

  1. Task files define what to do (acceptance criteria, environment, interfaces)
  2. Workers are spawned in tmux windows, each with a dynamically built prompt
  3. Status tracking happens via scripts (never direct JSON edits) for safety
  4. Compliance hooks prevent premature stopping, context rot, and scope creep
  5. Notifications keep you informed via webhooks
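
As an illustration of step 3, here is a minimal sketch of a script-mediated status update. It is not AgentSquad's actual script: the function names `validate_task_id` and `set_status_field` are hypothetical, and the allowed character set for task ids is an assumption. It only relies on jq's `--arg` option and standard shell tools.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; function names and id rules are assumptions.
TASKS_DIR="${AGENTSQUAD_TASKS_DIR:-.tasks}"

# Reject task ids that could escape the tasks directory.
validate_task_id() {
  case "$1" in
    ""|.|..|*..*) return 1 ;;        # empty id or dot-dot segments
    *[!A-Za-z0-9._-]*) return 1 ;;   # any other character, including "/"
  esac
  return 0
}

# Set one string field in status.json via jq, replacing the file atomically.
set_status_field() {
  id="$1"; key="$2"; value="$3"
  validate_task_id "$id" || { echo "invalid task id: $id" >&2; return 1; }
  file="$TASKS_DIR/$id/status.json"
  [ -f "$file" ] || { echo "no status.json for $id" >&2; return 1; }
  tmp="$(mktemp "$file.XXXXXX")" || return 1
  if jq --arg k "$key" --arg v "$value" '.[$k] = $v' "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Writing to a temporary file and renaming it means a failed jq run never leaves a half-written status.json behind, which is one reason to route updates through a script rather than editing the JSON directly.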

Task Types

| Type      | Purpose                                                         |
|-----------|-----------------------------------------------------------------|
| Plan      | Architecture, design, research; produces a plan document        |
| Implement | Build features, fix bugs; produces code + tests + PR            |
| Debug     | Investigate and fix issues; hypothesis-driven with execution log |

The 3-File System

Every task in .tasks/<task-id>/ has:

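| File                   | Purpose                                                            |
|------------------------|--------------------------------------------------------------------|
| status.json            | Machine-readable state (status, complexity, attempts, timestamps)  |
| acceptance-criteria.md | What "done" looks like: the worker's contract                      |
| execution-log.md       | Real-time progress log; you monitor this                           |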

Optional files: environment.md (task-specific env setup), screenshots/, .worker-prompt.md (auto-generated).
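
To make the layout concrete, the following sketch scaffolds the three files for a new task. The function name `scaffold_task`, the JSON key names, and the initial `pending` value are assumptions for illustration; the real schema may differ.

```shell
#!/usr/bin/env bash
# Hypothetical scaffold; key names and initial values are illustrative guesses.
scaffold_task() {
  task_dir="${AGENTSQUAD_TASKS_DIR:-.tasks}/$1"
  mkdir -p "$task_dir" || return 1

  # Machine-readable state: status, complexity, attempts, timestamps.
  cat > "$task_dir/status.json" <<EOF
{
  "status": "pending",
  "complexity": "medium",
  "attempts": 0,
  "created_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
EOF

  # The worker's contract and the progress log start as near-empty stubs.
  printf '# Acceptance criteria for %s\n\n- [ ] (first criterion)\n' "$1" \
    > "$task_dir/acceptance-criteria.md"
  printf '# Execution log for %s\n' "$1" > "$task_dir/execution-log.md"
}
```

Calling `scaffold_task demo-1` would create `.tasks/demo-1/` with all three files, assuming the default tasks directory.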

Compliance Enforcement

The loop framework injects a 7-point anti-premature-stopping checklist every iteration:

  1. Are all acceptance criteria met?
  2. Do build and tests pass?
  3. Is the execution log up to date?
  4. Has status been updated via the script?
  5. Are there no unresolved blockers?
  6. Has the completion promise been fulfilled?
  7. Is there any remaining work you haven't attempted?

Workers cannot stop until all 7 points are satisfied. Push and PR creation happen after worker completion (handled by the orchestrator).
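
The all-or-nothing rule can be sketched as a small gate function. This is only an illustration: `may_stop` is a hypothetical name, and representing each checklist answer as the string `pass` is an assumption, not how the framework records results.

```shell
#!/usr/bin/env bash
# Hypothetical gate: takes seven results, one per checklist item, in order.
# Returns 0 only when every item is "pass".
may_stop() {
  if [ "$#" -ne 7 ]; then
    echo "expected 7 checklist results, got $#" >&2
    return 2
  fi
  n=0
  for result in "$@"; do
    n=$((n + 1))
    if [ "$result" != "pass" ]; then
      echo "checklist item $n not satisfied; keep working" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `may_stop pass pass fail pass pass pass pass` returns a non-zero status and reports item 3, so the worker would continue iterating.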

Orchestration (The Conductor)

The Conductor is AgentSquad's single orchestration engine. One idempotent tick handles the full lifecycle:

# Single tick (default)
bash scripts/agentsquad/conductor.sh --once

# Continuous mode (tick every 3 minutes)
bash scripts/agentsquad/conductor.sh --loop 3m

# Dry-run (preview, no changes)
bash scripts/agentsquad/conductor.sh --dry-run

# From Claude Code
/conductor                  # single tick
/loop 5m /conductor         # continuous

Each tick finalizes completed workers (push + PR), checks review artifacts, applies the approval policy, merges approved tasks, runs health checks (warning about or killing stuck workers), and spawns new workers until capacity (MAX_WORKERS) is reached. See docs/conductor.md.
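
The phase order and the capacity rule can be sketched as follows. The phase names, the `DRY_RUN` variable, and both function names are hypothetical; the actual conductor.sh may be structured quite differently. The capacity default of 3 comes from the AGENTSQUAD_MAX_WORKERS entry in the environment variable table.

```shell
#!/usr/bin/env bash
# Hypothetical phase names, in the order the tick description lists them.
PHASES="finalize_completed check_review_artifacts apply_approval_policy merge_approved health_check spawn_workers"

# Workers one tick may still spawn: capacity minus active, never below zero.
free_slots() {
  max="${AGENTSQUAD_MAX_WORKERS:-3}"
  slots=$(( max - $1 ))
  if [ "$slots" -lt 0 ]; then slots=0; fi
  echo "$slots"
}

# Run every phase in order; with DRY_RUN=1, only print what would run.
tick() {
  for phase in $PHASES; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would run: $phase"
    else
      "$phase" || echo "phase failed: $phase" >&2
    fi
  done
}
```

With one active worker and the default capacity, `free_slots 1` prints `2`, so the spawn phase would start at most two more workers in that tick.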

To triage GitHub issues into the task queue, use /orchestrate — it fetches issues, parses dependencies, creates task directories, then hands off to the Conductor.

Approval Modes

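| Mode             | Who approves                                                      | When to use                               |
|------------------|-------------------------------------------------------------------|-------------------------------------------|
| manual (default) | Human reviews PR, then runs update-status.sh <id> status approved | Production repos, security-sensitive work |
| auto             | Conductor auto-approves after CI green + policy check             | Trusted repos, low-risk tasks             |
| paused           | Nobody; all merges halted                                         | Emergencies, code freezes                 |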

Configure in .claude/agentsquad.json:

{ "approval": { "default": "manual" } }

Override the mode per task with the approval_mode field in status.json. Sensitive paths (configured in approval.auto_merge.sensitive_paths) force manual review regardless of mode.
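
The interaction between modes and sensitive paths can be sketched as a decision function. Everything here is an assumption for illustration: the name `decide_approval`, the output labels, the use of a single shell glob for sensitive paths, and treating the sensitive-path test as the whole "policy check".

```shell
#!/usr/bin/env bash
# Hypothetical usage: decide_approval MODE CI_STATUS SENSITIVE_GLOB CHANGED_PATH...
decide_approval() {
  mode="$1"; ci="$2"; sensitive_glob="$3"
  shift 3

  # paused halts everything, whatever the task touches.
  if [ "$mode" = "paused" ]; then echo "hold"; return 0; fi

  # A change under a sensitive path forces human review in any other mode.
  for changed in "$@"; do
    case "$changed" in
      # Intentionally unquoted so the variable is matched as a glob pattern.
      $sensitive_glob) echo "manual-review"; return 0 ;;
    esac
  done

  case "$mode" in
    auto)
      if [ "$ci" = "green" ]; then echo "auto-approve"; else echo "wait-for-ci"; fi ;;
    *) echo "manual-review" ;;
  esac
}
```

For instance, `decide_approval auto green 'infra/*' src/app.ts infra/main.tf` prints `manual-review`, because one changed file matches the sensitive glob even though the mode is auto and CI is green.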

Multi-Agent Teams

For complex tasks spanning multiple domains, spawn specialist agents:

| Files touched       | Agent suggestion          |
|---------------------|---------------------------|
| src/lib/ai/**       | voice/AI specialist       |
| src/components/**   | UI/frontend specialist    |
| src/lib/api/**      | systems/backend specialist|
| **/*.test.*         | QA/testing specialist     |

Define agents in .claude/agents/ with paired skills in .claude/skills/.
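
The path-to-agent table above could be applied with a simple pattern match, as in this sketch. The function name `suggest_agent`, the `generalist` fallback, and checking test files first are assumptions, not documented AgentSquad behavior.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from a touched file path to a suggested specialist.
suggest_agent() {
  case "$1" in
    # Checked first so a test file inside src/components/ goes to QA (an assumption).
    *.test.*)          echo "QA/testing specialist" ;;
    src/lib/ai/*)      echo "voice/AI specialist" ;;
    src/components/*)  echo "UI/frontend specialist" ;;
    src/lib/api/*)     echo "systems/backend specialist" ;;
    *)                 echo "generalist" ;;
  esac
}
```

In a shell `case` pattern, `*` also matches `/`, so `src/lib/ai/*` covers nested directories in the same way the `**` globs in the table suggest.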

Optional Packs

Install additional capabilities:

agentsquad add collab          # Cross-model collaboration (think/build/debug)
agentsquad add github          # GitHub issue orchestration with label management
agentsquad add vercel          # Vercel preview deployment + E2E testing
agentsquad add notifications   # Slack and Telegram notification scripts
agentsquad add supabase        # Supabase branch database management

Complexity Budgets

Workers get iteration budgets based on task complexity:

| Complexity | Max Iterations |
|------------|----------------|
| simple     | 15             |
| medium     | 20             |
| high       | 30             |

Override per-task: spawn-worker.sh <task-id> 40
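
The budget lookup and the per-task override can be sketched like this. `iteration_budget` is a hypothetical helper name; only the three budget values and the idea of a numeric override come from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: iteration_budget COMPLEXITY [OVERRIDE]
iteration_budget() {
  complexity="$1"; override="${2:-}"

  # An explicit override wins over the complexity-based default.
  if [ -n "$override" ]; then echo "$override"; return 0; fi

  case "$complexity" in
    simple) echo 15 ;;
    medium) echo 20 ;;
    high)   echo 30 ;;
    *) echo "unknown complexity: $complexity" >&2; return 1 ;;
  esac
}
```

Here `iteration_budget high` prints `30`, while `iteration_budget high 40` prints `40`, mirroring the override argument shown for spawn-worker.sh.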

Environment Variables

| Variable                   | Default         | Purpose                       |
|----------------------------|-----------------|-------------------------------|
| AGENTSQUAD_TASKS_DIR       | .tasks          | Task repository directory     |
| AGENTSQUAD_TMUX_SESSION    | project dirname | tmux session name             |
| AGENTSQUAD_NOTIFY_WEBHOOK  | (none)          | Webhook URL for notifications |
| AGENTSQUAD_MAX_WORKERS     | 3               | Max concurrent workers        |
| AGENTSQUAD_SECONDARY_MODEL | gpt-5.4         | Model for collab pack         |
| AGENTSQUAD_SECONDARY_CLI   | codex           | CLI command for collab pack   |
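
One way a script might apply these defaults is with shell parameter expansion, as sketched below. The function name `load_agentsquad_defaults` is hypothetical, and interpreting "project dirname" as the base name of the current directory is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill in documented defaults without clobbering
# values the caller has already set.
load_agentsquad_defaults() {
  : "${AGENTSQUAD_TASKS_DIR:=.tasks}"
  : "${AGENTSQUAD_TMUX_SESSION:=$(basename "$PWD")}"   # assumed meaning of "project dirname"
  : "${AGENTSQUAD_NOTIFY_WEBHOOK:=}"                   # empty means notifications are off
  : "${AGENTSQUAD_MAX_WORKERS:=3}"
  : "${AGENTSQUAD_SECONDARY_MODEL:=gpt-5.4}"
  : "${AGENTSQUAD_SECONDARY_CLI:=codex}"
  export AGENTSQUAD_TASKS_DIR AGENTSQUAD_TMUX_SESSION AGENTSQUAD_NOTIFY_WEBHOOK \
         AGENTSQUAD_MAX_WORKERS AGENTSQUAD_SECONDARY_MODEL AGENTSQUAD_SECONDARY_CLI
}
```

Setting `AGENTSQUAD_MAX_WORKERS=5` before calling the function keeps the value at 5, since `:=` only assigns when a variable is unset or empty.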

Battle-Tested

AgentSquad encodes 24 hard-won lessons from production autonomous development. See docs/learnings.md for the full list, including:

  • Three-layer session isolation (env var, state file, session ID)
  • export $() is the most dangerous shell pattern
  • Must use Opus model for workers (Sonnet runs out of context)
  • Sleep 8 seconds after tmux window creation
  • Fresh Claude sessions per issue (prevents context rot)
  • Topological sort with cycle detection for dependencies
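
As an illustration of the last lesson, dependency ordering with cycle detection can be sketched with the standard `tsort` utility. This is a stand-in technique, not AgentSquad's own implementation; the function name `order_tasks` and the input format (one "prerequisite dependent" pair per line) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "prerequisite dependent" pairs on stdin and
# prints task ids in an order that respects every dependency.
order_tasks() {
  err="$(mktemp)" || return 1
  # Treat a non-zero exit or any stderr output from tsort as a cycle,
  # since implementations differ in how they report loops.
  if order="$(tsort 2>"$err")" && [ ! -s "$err" ]; then
    rm -f "$err"
    printf '%s\n' "$order"
  else
    echo "dependency cycle detected:" >&2
    cat "$err" >&2
    rm -f "$err"
    return 1
  fi
}
```

Feeding it `printf 'task-1 task-2\ntask-2 task-3\n' | order_tasks` lists task-1 before task-2 before task-3, while a pair of tasks that depend on each other makes the function fail instead of producing an order.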

Documentation

License

MIT -- Alessio Zazzarini
