diff --git a/README.md b/README.md
index 4918a3a..c7995a4 100644
--- a/README.md
+++ b/README.md
@@ -2,14 +2,14 @@

 # ShellForge

-**Governed AI agent runtime — one Go binary, local or cloud.**
+**Governed AI coding CLI and agent runtime — one Go binary, local or cloud.**

 [![Go](https://img.shields.io/badge/Go-1.18+-00ADD8?style=for-the-badge&logo=go&logoColor=white)](https://go.dev)
 [![GitHub Pages](https://img.shields.io/badge/Live_Site-agentguardhq.github.io/shellforge-ff6b2b?style=for-the-badge)](https://agentguardhq.github.io/shellforge)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue?style=for-the-badge)](LICENSE)
 [![AgentGuard](https://img.shields.io/badge/Governed_by-AgentGuard-green?style=for-the-badge)](https://github.com/AgentGuardHQ/agentguard)

-*Run autonomous AI agents with policy enforcement on every tool call. Local via Ollama or cloud via Anthropic API — your choice.*
+*Interactive pair-programming with local models + autonomous multi-task execution — with governance on every tool call.*

 [Website](https://agentguardhq.github.io/shellforge) · [Docs](docs/architecture.md) · [Roadmap](docs/roadmap.md) · [AgentGuard](https://github.com/AgentGuardHQ/agentguard)

@@ -54,12 +54,17 @@ shellforge setup # creates agentguard.yaml + output dirs

 This creates `agentguard.yaml` (governance policy) in your project root. Edit it to customize which actions are allowed/denied.

-### 5. Run an agent
+### 5. Start a chat session
+
+```bash
+shellforge chat # interactive REPL — pair-program with a local model
+```
+
+Or run a one-shot agent:

 ```bash
 shellforge agent "describe what this project does"
 shellforge agent "find test gaps and suggest improvements"
-shellforge agent "create a hello world program"
 ```

 Every tool call (file reads, writes, shell commands) passes through governance before execution.
@@ -70,17 +75,55 @@ Every tool call (file reads, writes, shell commands) passes through governance b

 ## What Is ShellForge?

-ShellForge is a **governed agent runtime** — not an agent framework, not an orchestration layer, not a prompt wrapper.
+ShellForge is a **governed AI coding CLI and agent runtime** — like Claude Code or Cursor, but with local models and policy enforcement built in.

-It sits between any agent driver and the real world. The agent decides what it wants to do. ShellForge decides whether it's allowed.
+Two modes:
+
+1. **Interactive REPL** (`shellforge chat`) — pair-program with a local or cloud model. Persistent conversation history, shell escapes, color output.
+2. **Autonomous agents** (`shellforge agent`, `shellforge ralph`) — one-shot tasks or multi-task loops with automatic validation and commit.
+
+Both modes share the same governance layer. Every tool call passes through [AgentGuard](https://github.com/AgentGuardHQ/agentguard) policy enforcement before execution.

 ```
-Agent Driver (Goose, Claude Code, Copilot CLI)
-  → ShellForge Governance (allow / deny / correct)
-    → Your Environment (files, shell, git)
+You (chat) or Octi Pulpo (dispatch)
+  → ShellForge Agent Loop (tool calling, drift detection)
+    → AgentGuard Governance (allow / deny / correct)
+      → Your Environment (files, shell, git)
 ```

-**The core insight:** ShellForge's value is governance, not the agent loop. [Goose](https://block.github.io/goose) handles local agent execution. [Dagu](https://github.com/dagu-org/dagu) handles workflow orchestration. ShellForge wraps them all with [AgentGuard](https://github.com/AgentGuardHQ/agentguard) policy enforcement on every tool call.
+---
+
+## Interactive REPL (`shellforge chat`)
+
+Pair-programming mode. Persistent conversation history across prompts — the model remembers what you discussed.
+
+```bash
+shellforge chat                       # local model via Ollama (default)
+shellforge chat --provider anthropic  # Anthropic API (Haiku/Sonnet/Opus)
+shellforge chat --model qwen3:14b     # pick a specific model
+```
+
+Features:
+- **Color output** — green prompt, red errors, yellow governance denials
+- **Shell escapes** — `!git status` runs a command without leaving the session
+- **Ctrl+C** — interrupts the current agent run without killing the session
+- **Governance** — every tool call checked against `agentguard.yaml`, same as autonomous mode
+
+---
+
+## Ralph Loop (`shellforge ralph`)
+
+Stateless-iterative multi-task execution. Each task gets a fresh context window — no accumulated confusion across tasks.
+
+```bash
+shellforge ralph tasks.json                   # run tasks from a JSON file
+shellforge ralph --validate "go test ./..."   # validate after each task
+shellforge ralph --dry-run                    # preview without executing
+```
+
+The loop: **PICK** a task → **IMPLEMENT** it → **VALIDATE** (run tests) → **COMMIT** on success → **RESET** context → next task.
+
+Tasks come from a JSON file or Octi Pulpo MCP dispatch. Failed validations skip the commit and move on — no broken code lands.

 ---

@@ -112,8 +155,14 @@ shellforge status

 | Command | Description |
 |---------|-------------|
-| `shellforge agent "prompt"` | Run a governed agent (Ollama, default) |
-| `shellforge agent --provider anthropic "prompt"` | Run via Anthropic API (Haiku/Sonnet/Opus, prompt caching) |
+| `shellforge chat` | Interactive REPL — pair-program with a local or cloud model |
+| `shellforge chat --provider anthropic` | REPL via Anthropic API (Haiku/Sonnet/Opus) |
+| `shellforge chat --model qwen3:14b` | REPL with a specific Ollama model |
+| `shellforge ralph tasks.json` | Multi-task loop — stateless-iterative execution |
+| `shellforge ralph --validate "go test ./..."` | Ralph Loop with post-task validation |
+| `shellforge ralph --dry-run` | Preview tasks without executing |
+| `shellforge agent "prompt"` | One-shot governed agent (Ollama, default) |
+| `shellforge agent --provider anthropic "prompt"` | One-shot via Anthropic API (prompt caching) |
 | `shellforge agent --thinking-budget 8000 "prompt"` | Enable extended thinking (Sonnet/Opus) |
 | `shellforge run "prompt"` | Run a governed CLI driver (goose, claude, copilot, codex, gemini) |
 | `shellforge setup` | Install Ollama, create governance config, verify stack |
@@ -125,6 +174,23 @@ shellforge status

 ---

+## Built-in Tools
+
+The agent loop (used by `chat`, `agent`, and `ralph`) has 8 built-in tools, all governed:
+
+| Tool | What It Does |
+|------|-------------|
+| `read_file` | Read file contents |
+| `write_file` | Write a complete file |
+| `edit_file` | Targeted find-and-replace (like Claude Code's Edit tool) |
+| `glob` | Pattern-based file discovery with recursive `**` support |
+| `grep` | Regex content search with `file:line` output |
+| `run_shell` | Execute shell commands (via RTK for token compression) |
+| `list_directory` | List directory contents |
+| `search_files` | Search files by name pattern |
+
+---
+
 ## Multi-Driver Governance

 ShellForge governs any CLI agent driver via AgentGuard hooks. Each driver keeps its own model and agent loop — ShellForge ensures governance is active and spawns the driver as a subprocess.
@@ -151,6 +217,12 @@ See `dags/multi-driver-swarm.yaml` and `dags/workspace-swarm.yaml` for examples.

 ```
 ┌───────────────────────────────────────────────────┐
+│                   Entry Points                    │
+│  chat (REPL) · agent (one-shot) · ralph (multi)   │
+│               run · serve (daemon)                │
+└────────────────────┬──────────────────────────────┘
+                     │ prompt / task
+┌────────────────────▼──────────────────────────────┐
 │             Octi Pulpo (Coordination)             │
 │ Budget-aware dispatch · Memory · Model cascading  │
 └────────────────────┬──────────────────────────────┘
@@ -158,6 +230,7 @@ See `dags/multi-driver-swarm.yaml` and `dags/workspace-swarm.yaml` for examples.
 ┌────────────────────▼──────────────────────────────┐
 │               ShellForge Agent Loop               │
 │   LLM provider · Tool calling · Drift detection   │
+│     Sub-agent orchestrator (spawn sync/async)     │
 │              Anthropic API or Ollama              │
 └────────────────────┬──────────────────────────────┘
                      │ tool call
@@ -171,6 +244,7 @@ See `dags/multi-driver-swarm.yaml` and `dags/workspace-swarm.yaml` for examples.
 ┌────────────────────▼──────────────────────────────┐
 │                 Your Environment                  │
 │        Files · Shell (RTK) · Git · Network        │
+│ 8 tools: read/write/edit/glob/grep/shell/ls/find  │
 │              Sandboxed by OpenShell               │
 └───────────────────────────────────────────────────┘
 ```
diff --git a/docs/architecture.md b/docs/architecture.md
index 751f3c7..0d41e80 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -4,6 +4,41 @@

 ShellForge is a single Go binary (~7.5MB) that provides governed AI agent execution. Its core value is **governance** — every agent driver, whether a CLI tool, browser session, or local model, runs through AgentGuard policy enforcement on every action.

+## Entry Points
+
+ShellForge provides multiple entry points, all sharing the same agent loop and governance layer:
+
+| Entry Point | Mode | Context |
+|-------------|------|---------|
+| `shellforge chat` | Interactive REPL | Persistent — conversation history across prompts |
+| `shellforge agent "prompt"` | One-shot | Single task, single context window |
+| `shellforge ralph tasks.json` | Multi-task loop | Stateless-iterative — fresh context per task |
+| `shellforge run ` | CLI driver | Governed subprocess (Goose, Claude Code, etc.) |
+| `shellforge serve agents.yaml` | Daemon | 24/7 swarm with memory-aware scheduling |
+
+### Interactive REPL (`chat`)
+
+Pair-programming mode. The user and model share a persistent conversation — the model remembers previous prompts and results within the session. Color output (green prompt, red errors, yellow governance denials). Shell escapes via `!command`. Ctrl+C interrupts the current agent run without killing the session.
+
+### Ralph Loop (`ralph`)
+
+Stateless-iterative execution for multi-task workloads. Each task gets a fresh context window to prevent accumulated confusion:
+
+```
+PICK task from queue → IMPLEMENT → VALIDATE (run tests) → COMMIT on success → RESET context → next
+```
+
+Tasks come from a JSON file or Octi Pulpo MCP dispatch. `--validate` runs a command (e.g., `go test ./...`) after each task. `--dry-run` previews without executing.
+
+### Sub-Agent Orchestrator
+
+The agent loop can spawn sub-agents for parallel work:
+
+- **SpawnSync** — block and wait for a sub-agent to complete
+- **SpawnAsync** — fire multiple sub-agents, collect results
+- Concurrency controlled via semaphore
+- Sub-agent results compressed to ~750 tokens before returning to parent
+
 ## Execution Model

 ShellForge supports three classes of agent driver, all governed uniformly:
@@ -110,7 +145,6 @@ Octi Pulpo routes tasks to the cheapest capable driver:
 | **Optimize** | [RTK](https://github.com/rtk-ai/rtk) | Token compression — 70-90% reduction on shell output |
 | **Execute** | [Goose](https://block.github.io/goose) / [OpenClaw](https://github.com/openclaw/openclaw) | Agent execution + browser automation |
 | **Coordinate** | [Octi Pulpo](https://github.com/AgentGuardHQ/octi-pulpo) | Budget-aware dispatch, episodic memory, model cascading |
-| **Coordinate** | [Octi Pulpo](https://github.com/AgentGuardHQ/octi-pulpo) | Swarm coordination via MCP |
 | **Govern** | [AgentGuard](https://github.com/AgentGuardHQ/agentguard) | Policy enforcement on every action |
 | **Sandbox** | [OpenShell](https://github.com/NVIDIA/OpenShell) | Kernel-level isolation (Docker on macOS) |
 | **Scan** | [DefenseClaw](https://github.com/cisco-ai-defense/defenseclaw) | Supply chain scanner — AI Bill of Materials |
@@ -120,6 +154,8 @@ Octi Pulpo routes tasks to the cheapest capable driver:
 ```
 cmd/shellforge/
 ├── main.go         # CLI entry point (cobra-style subcommands)
+├── chat.go         # Interactive REPL (`shellforge chat`)
+├── ralph.go        # Multi-task loop (`shellforge ralph`)
 └── status.go       # Ecosystem health check

 internal/
@@ -128,10 +164,13 @@ internal/
 │   └── anthropic.go# Anthropic API adapter (stdlib HTTP, prompt caching, tool_use)
 ├── agent/          # Agentic loop
 │   ├── loop.go     # runProviderLoop (Anthropic) + runOllamaLoop, drift detection wiring
-│   └── drift.go    # Drift detector — self-score every 5 calls, steer/kill on low scores
+│   ├── drift.go    # Drift detector — self-score every 5 calls, steer/kill on low scores
+│   └── repl.go     # Interactive REPL — persistent history, color output, shell escapes
+├── ralph/          # Ralph Loop — stateless-iterative multi-task execution
+│   └── loop.go     # PICK → IMPLEMENT → VALIDATE → COMMIT → RESET cycle
 ├── governance/     # agentguard.yaml parser + policy engine
 ├── ollama/         # Ollama HTTP client (chat, generate)
-├── tools/          # 5 tool implementations + RTK wrapper
+├── tools/          # 8 tool implementations (read/write/edit/glob/grep/shell/ls/find) + RTK wrapper
 ├── engine/         # Pluggable engine interface (Goose, OpenClaw, OpenCode)
 ├── logger/         # Structured JSON logging
 ├── scheduler/      # Memory-aware scheduling + cron
@@ -146,17 +185,19 @@ internal/

 ShellForge uses a pluggable engine system:

-1. **Goose** (preferred local driver) — subprocess, native Ollama support, SHELL wrapped via `govern-shell.sh`
-2. **OpenClaw** (browser + integrations) — browser automation, web app access, 100+ skills
-3. **NemoClaw** (enterprise) — OpenClaw + NVIDIA OpenShell sandbox + Nemotron local models
-4. **CLI Drivers** (cloud coding) — Claude Code, Codex, Copilot CLI, Gemini CLI
-5. **Native** (fallback) — built-in multi-turn loop with Ollama + tool calling
+1. **Native REPL** (`shellforge chat`) — interactive pair-programming, persistent history, 8 built-in tools
+2. **Native Agent** (`shellforge agent`) — one-shot autonomous execution with the same tool set
+3. **Ralph Loop** (`shellforge ralph`) — stateless-iterative multi-task with validation and auto-commit
+4. **Goose** (local driver) — subprocess, native Ollama support, SHELL wrapped via `govern-shell.sh`
+5. **OpenClaw** (browser + integrations) — browser automation, web app access, 100+ skills
+6. **NemoClaw** (enterprise) — OpenClaw + NVIDIA OpenShell sandbox + Nemotron local models
+7. **CLI Drivers** (cloud coding) — Claude Code, Codex, Copilot CLI, Gemini CLI

 ## Governance Flow

 ```
-User Request → Engine (Goose/OpenClaw/CLI/Native)
-  → Tool Call → Governance Check (agentguard.yaml)
+User Request → Entry Point (chat/agent/ralph/run/serve)
+  → Agent Loop → Tool Call → Governance Check (agentguard.yaml)
     → ALLOW → Execute Tool → Return Result
     → DENY → Log Violation → Correction Feedback → Retry
 ```
diff --git a/docs/roadmap.md b/docs/roadmap.md
index 1bfe4d6..517046f 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -40,7 +40,7 @@
 - [x] Fixed catch-all deny bug (bounded-execution policy was denying everything)
 - [x] Dagu DAG templates (sdlc-swarm, studio-swarm, workspace-swarm, multi-driver)

-### v0.7.0 — Anthropic API Provider ← CURRENT
+### v0.7.0 — Anthropic API Provider
 - [x] LLM provider interface (`llm.Provider`) — pluggable Ollama vs Anthropic backends
 - [x] Anthropic API adapter — stdlib HTTP, structured `tool_use` blocks, multi-turn history
 - [x] Prompt caching — `cache_control: ephemeral` on system + tools, ~90% savings on cached tokens
@@ -49,6 +49,21 @@
 - [x] Drift detection — self-score every 5 tool calls, steer below 7, kill below 5 twice
 - [x] RTK token compression wired into `runShellWithRTK()` (70-90% savings on shell output)

+### v0.8.0 — UMAAL (Interactive REPL + Ralph Loop + Enhanced Tools)
+- [x] Interactive REPL (`shellforge chat`) — pair-programming with persistent conversation history
+- [x] Color output (green prompt, red errors, yellow governance denials)
+- [x] Shell escapes (`!command`) and Ctrl+C interrupt without session kill
+- [x] Ollama (local) and Anthropic API provider support in REPL
+- [x] Ralph Loop (`shellforge ralph`) — stateless-iterative multi-task execution
+- [x] PICK → IMPLEMENT → VALIDATE → COMMIT → RESET cycle
+- [x] Task input from JSON file or Octi Pulpo MCP dispatch
+- [x] `--validate` flag for post-task test commands, `--dry-run` for preview
+- [x] Sub-agent orchestrator — SpawnSync (block), SpawnAsync (fire and collect)
+- [x] Concurrency control via semaphore, context compression (~750 tokens)
+- [x] `edit_file` tool — targeted find-and-replace
+- [x] `glob` tool — pattern-based file discovery with recursive `**` support
+- [x] `grep` tool — regex content search with `file:line` output
+
 ---

 ## In Progress
@@ -142,17 +157,21 @@ Bugs identified during v0.6.x development. Fix before v1.0.

 ---

-## Stack (as of v0.6.1)
+## Stack (as of v0.8.0)

 | Component | Role | Status |
 |---|---|---|
+| `shellforge chat` | Interactive REPL | Working |
+| `shellforge ralph` | Multi-task loop | Working |
+| `shellforge agent` | One-shot agent | Working |
 | Goose (Block) | Local model driver | Working |
 | Claude Code | API driver (Linux) | Working (via hooks) |
 | Copilot CLI | API driver (Linux) | Working (via hooks) |
 | Codex CLI | API driver (Linux) | Coming soon |
 | Gemini CLI | API driver (Linux) | Coming soon |
 | Ollama | Local inference | Working |
+| Anthropic API | Cloud inference | Working (prompt caching) |
 | AgentGuard | Governance kernel | Working (YAML eval + Go kernel) |
-| Dagu | Orchestration | Working (DAGs + web UI) |
+| Octi Pulpo | Swarm coordination | Working (MCP) |
 | RTK | Token compression | Optional |
 | Docker | Sandbox | Optional |
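
The Ralph Loop cycle documented in this patch (PICK → IMPLEMENT → VALIDATE → COMMIT → RESET, with failed validations skipping the commit) can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the contents of `internal/ralph/loop.go`: `Task`, `Hooks`, `Result`, and `RunRalph` are hypothetical names, the hooks stand in for the governed agent run, the `--validate` command, and the git commit, and the task fields are guesses because the `tasks.json` schema is not shown.

```go
package main

import "fmt"

// Task is a hypothetical queue entry. The real tasks.json schema is not shown
// in the patch, so both fields are assumptions.
type Task struct {
	ID     string
	Prompt string
}

// Hooks are hypothetical stand-ins for the agent run, the --validate command,
// and the git commit; none of these names come from ShellForge itself.
type Hooks struct {
	Implement func(history []string, t Task) error // governed agent run on a fresh history
	Validate  func(t Task) bool                    // e.g. whether `go test ./...` exited 0
	Commit    func(t Task) error                   // e.g. git add + git commit
}

// Result records what happened to one task.
type Result struct {
	TaskID    string
	Committed bool
	Err       error
}

// RunRalph makes one pass over the queue:
// PICK → IMPLEMENT → VALIDATE → COMMIT → RESET.
func RunRalph(queue []Task, h Hooks) []Result {
	results := make([]Result, 0, len(queue))
	for _, t := range queue { // PICK the next task
		// RESET happens at the top of every iteration: the history holds only
		// this task's prompt, so nothing from earlier tasks leaks in.
		history := []string{t.Prompt}

		if err := h.Implement(history, t); err != nil { // IMPLEMENT
			results = append(results, Result{TaskID: t.ID, Err: err})
			continue
		}
		if !h.Validate(t) { // VALIDATE: a failure skips the commit
			results = append(results, Result{TaskID: t.ID})
			continue
		}
		err := h.Commit(t) // COMMIT only validated work
		results = append(results, Result{TaskID: t.ID, Committed: err == nil, Err: err})
	}
	return results
}

func main() {
	queue := []Task{
		{ID: "t1", Prompt: "add table-driven tests"},
		{ID: "t2", Prompt: "rename the config helper"},
	}
	hooks := Hooks{
		Implement: func(history []string, t Task) error { return nil },
		Validate:  func(t Task) bool { return t.ID == "t1" }, // pretend t2's tests fail
		Commit:    func(t Task) error { return nil },
	}
	for _, r := range RunRalph(queue, hooks) {
		fmt.Printf("%s committed=%v\n", r.TaskID, r.Committed)
	}
}
```

Running it prints `t1 committed=true` and then `t2 committed=false`. Resetting at the top of each iteration is equivalent to resetting after COMMIT, and it also covers tasks that fail before reaching COMMIT.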
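
The sub-agent orchestrator described in `docs/architecture.md` (SpawnSync blocks, SpawnAsync fires several sub-agents and collects results, a semaphore bounds concurrency, results are compressed to roughly 750 tokens) maps naturally onto a buffered-channel semaphore plus `sync.WaitGroup`. The sketch below is an assumption-laden illustration: `SubAgent`, `capWords`, and these `SpawnSync`/`SpawnAsync` signatures are hypothetical, and `capWords` is a crude word cap standing in for whatever token-based compression ShellForge actually performs.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// SubAgent is a hypothetical signature: run one sub-agent to completion and
// return its final answer.
type SubAgent func(task string) string

// capWords is a crude stand-in for the ~750-token compression step. It keeps
// the first maxWords whitespace-separated words; real tokenizers (and any
// summarization ShellForge performs) will behave differently.
func capWords(s string, maxWords int) string {
	words := strings.Fields(s)
	if len(words) <= maxWords {
		return s
	}
	return strings.Join(words[:maxWords], " ") + " …"
}

// SpawnSync runs one sub-agent and blocks until it returns.
func SpawnSync(task string, run SubAgent, maxWords int) string {
	return capWords(run(task), maxWords)
}

// SpawnAsync starts every sub-agent at once but lets at most limit of them
// run concurrently, using a buffered channel as a counting semaphore. It
// waits for all of them and returns results in input order.
func SpawnAsync(tasks []string, run SubAgent, limit, maxWords int) []string {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit)
	results := make([]string, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot; blocks while limit agents are running
			defer func() { <-sem }() // release the slot when this agent finishes
			results[i] = SpawnSync(task, run, maxWords)
		}(i, task)
	}
	wg.Wait() // collect: every goroutine has written its own results[i]
	return results
}

func main() {
	run := func(task string) string { return "summary of " + task }
	fmt.Println(SpawnSync("audit go.mod", run, 750))
	for _, r := range SpawnAsync([]string{"lint", "test", "docs"}, run, 2, 750) {
		fmt.Println(r)
	}
}
```

Writing into a preallocated slice by index keeps the output order stable without extra locking, because each goroutine touches only its own element.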
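
The `edit_file` tool added in this patch is described only as a targeted find-and-replace "like Claude Code's Edit tool". Assuming the same rule that tool applies (the search text must match exactly once, otherwise the edit is rejected), a minimal Go version might look like the following. `ApplyEdit` and `EditFile` are hypothetical names, and the AgentGuard governance check that would precede the write is deliberately omitted.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// ApplyEdit replaces exactly one occurrence of oldText with newText. It
// assumes Claude Code-style semantics: the edit is rejected when oldText is
// empty, missing, or matches more than once.
func ApplyEdit(content, oldText, newText string) (string, error) {
	if oldText == "" {
		return "", fmt.Errorf("edit_file: empty search string")
	}
	switch n := strings.Count(content, oldText); n {
	case 0:
		return "", fmt.Errorf("edit_file: %q not found", oldText)
	case 1:
		return strings.Replace(content, oldText, newText, 1), nil
	default:
		return "", fmt.Errorf("edit_file: %q matches %d times; include more surrounding text", oldText, n)
	}
}

// EditFile applies ApplyEdit to a file on disk and keeps its permissions.
// A governed tool would consult agentguard.yaml before this write.
func EditFile(path, oldText, newText string) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	updated, err := ApplyEdit(string(data), oldText, newText)
	if err != nil {
		return fmt.Errorf("%s: %w", path, err)
	}
	return os.WriteFile(path, []byte(updated), info.Mode().Perm())
}

func main() {
	f, err := os.CreateTemp("", "edit-*.go")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	f.WriteString("msg := \"hello\"\n")
	f.Close()

	if err := EditFile(f.Name(), `"hello"`, `"hi"`); err != nil {
		panic(err)
	}
	data, _ := os.ReadFile(f.Name())
	fmt.Print(string(data))
}
```

Rejecting ambiguous matches forces the model to include enough surrounding text to pin down a single location, which keeps a targeted edit from silently rewriting the wrong occurrence.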