diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
new file mode 100644
index 0000000..d4d765b
--- /dev/null
+++ b/.github/workflows/ci.yml
@@ -0,0 +1,31 @@
+name: CI
+
+on:
+  push:
+    branches: ["master"]
+  pull_request:
+    branches: ["master"]
+
+permissions:
+  contents: read
+
+jobs:
+  eval:
+    name: Eval Suite
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "20"
+          cache: "npm"
+          cache-dependency-path: evals/package-lock.json
+
+      - name: Install eval dependencies
+        run: cd evals && npm ci
+
+      - name: Run evals
+        run: cd evals && npm test
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..8f5b66b
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,68 @@
+# Changelog
+
+All notable changes to ControlFlow are documented here.
+
+The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
+
+---
+
+## [1.0.0] — 2026-04-15
+
+### Added
+
+**Agent system (13 agents)**
+
+- `Orchestrator` — conductor, gate controller, wave-based parallel dispatch, failure routing
+- `Planner` — structured planning with idea interview, phased plans, Mermaid diagrams, semantic risk discovery across 7 non-functional risk categories
+- `PlanAuditor` — adversarial plan audit, architecture and risk review
+- `AssumptionVerifier` — assumption-fact confusion detection, mirage elimination
+- `ExecutabilityVerifier` — cold-start plan executability simulation
+- `CoreImplementer` — backend implementation with TDD enforcement
+- `UIImplementer` — frontend implementation
+- `PlatformEngineer` — CI/CD, containers, infrastructure, rollback contracts
+- `CodeReviewer` — code review, safety gates, verdict contracts
+- `Researcher` — evidence-first research with confidence scores and citations
+- `CodeMapper` — read-only codebase discovery
+- `TechnicalWriter` — documentation, diagrams, code-doc parity enforcement
+- `BrowserTester` — E2E browser testing with health-first verification and accessibility audits
+
+**Architecture**
+
+- P.A.R.T contract architecture (Prompt → Archive → Resources → Tools) enforced across all agents
+- Structured text outputs replacing raw JSON to conserve context tokens in delegation chains
+- Wave-based parallel execution — Orchestrator dispatches independent phases in parallel
+- Adversarial review pipeline — up to three independent reviewers before implementation (depth scales with complexity tier: TRIVIAL / SMALL / MEDIUM / LARGE)
+- Failure taxonomy (`transient` / `fixable` / `needs_replan` / `escalate`) with deterministic retry and escalation routing
+- Least-privilege tool grants — each agent's `tools:` frontmatter trimmed to minimum required by role
+- Semantic risk discovery — 7 non-functional risk categories evaluated before research delegation
+- Batch approval per execution wave, per-phase approval for destructive operations
+- `NEEDS_INPUT` clarification routing from subagents through Orchestrator to user via `askQuestions`
+
+**Governance and contracts**
+
+- JSON Schema contracts for all agent outputs in `schemas/`
+- Governance policies in `docs/agent-engineering/`: PART-SPEC, RELIABILITY-GATES, CLARIFICATION-POLICY, TOOL-ROUTING, SCORING-SPEC, MIGRATION-CORE-FIRST, PROMPT-BEHAVIOR-CONTRACT
+- Canonical tool grants in `governance/agent-grants.json`
+- Agent roster and complexity tier definitions in `plans/project-context.md`
+
+**Skill library**
+
+- 7 domain-specific skill patterns: Testing, Error Handling, Security, Performance, Completeness, Integration, Idea-to-Prompt
+- Skill index at `skills/index.md`
+
+**Eval suite (302 checks)**
+
+- Pass 1: Schema validity (Ajv strict mode, JSON Schema 2020-12)
+- Pass 2–3: Scenario integrity and cross-scenario structural regression (179 structural checks)
+- Pass 4: P.A.R.T section order enforcement
+- Pass 4b: Clarification trigger and tool routing section validation
+- Pass 5: Skill library registration integrity
+- Pass 6: Synthetic rename negative-path checks
+- Pass 7: Prompt behavior contract behavioral regression (74 checks across 9 agents)
+- Pass 8: Orchestration handoff contract regression (49 checks)
+- F7/F8: Complexity tier and reference integrity enforcement
+- Warm cache for fast repeated structural runs
+
+**CI**
+
+- GitHub Actions workflow running the full eval suite on every push and pull request to `master`
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..b1b47f7
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,105 @@
+# Contributing to ControlFlow
+
+Thank you for your interest in contributing! This guide covers the key contribution paths.
+
+## Table of Contents
+
+- [Running the eval suite](#running-the-eval-suite)
+- [Adding a new agent](#adding-a-new-agent)
+- [Editing an existing agent](#editing-an-existing-agent)
+- [Adding skills](#adding-skills)
+- [Proposing changes](#proposing-changes)
+- [Code of conduct](#code-of-conduct)
+
+---
+
+## Running the eval suite
+
+The eval suite validates schema compliance, P.A.R.T contract structure, tool grant consistency, behavioral invariants, and orchestration handoff discipline across all 13 agents — without invoking live agents.
+
+```bash
+cd evals
+npm install
+npm test
+```
+
+All 302 checks must pass before any PR can be merged. The suite runs fully offline.
+
+For a faster structural-only pass:
+
+```bash
+npm run test:structural
+```
+
+For behavioral and orchestration regressions only:
+
+```bash
+npm run test:behavior
+```
+
+---
+
+## Adding a new agent
+
+1. **Create the agent file** at repo root: `.agent.md` or `-subagent.agent.md`.
+
+2. **Follow P.A.R.T structure** — every agent file must have exactly these top-level sections in order:
+   - `## Prompt` — mission, scope, deterministic output contracts, Non-Negotiable Rules
+   - `## Archive` — memory policies, context compaction rules
+   - `## Resources` — file references loaded on-demand
+   - `## Tools` — allowed/disallowed tools with routing rules
+
+   See `docs/agent-engineering/PART-SPEC.md` for the full specification.
+
+3. **Create a JSON Schema contract** in `schemas/-output.schema.json`. Schema files serve as documentation contracts and eval references.
+
+4. **Add eval scenarios** in `evals/scenarios/` that cover:
+   - At least one happy-path execution
+   - `ABSTAIN` / `NEEDS_INPUT` / failure classification behavior
+   - Tool routing compliance if the agent uses external tools
+
+5. **Register the agent in governance files**:
+   - Add it to `governance/agent-grants.json` with its canonical tool grants.
+   - Add it to `plans/project-context.md` (agent roster table).
+
+6. **Update `README.md`**:
+   - Add a row to the appropriate agent table (Primary Agents or Specialized Subagents).
+   - Update the agent count badge if you bump past 13.
+
+7. **Run the full eval suite** and fix any failures before opening a PR.
+
+---
+
+## Editing an existing agent
+
+1. Read the current agent file carefully. Understand the Non-Negotiable Rules, clarification contract, and tool routing section before making changes.
+2. Run `cd evals && npm test` **before and after** your edit to confirm no regressions.
+3. If you change output contracts (status values, required fields), update the corresponding schema in `schemas/` and any eval scenarios that assert those fields.
+4. If you change tool grants in frontmatter, update `governance/agent-grants.json` to match — the eval suite enforces consistency between the two.
+
+---
+
+## Adding skills
+
+Skills are reusable domain pattern snippets that Planner selects per phase and implementation agents load at execution time. They live in `skills/patterns/*.md`.
+
+1. Create `skills/patterns/.md` following the style of existing patterns.
+2. Register the new file in `skills/index.md`.
+3. Run `npm test` — Pass 5 validates that every `skills/patterns/` file is registered in the index and every index entry resolves to a real file.
+
+---
+
+## Proposing changes
+
+- **Bug reports and feature requests:** Open a GitHub Issue describing the problem or proposal clearly.
+- **Pull requests:** Fork the repository, create a feature branch, and open a PR against `master`.
+  - Every PR must pass `cd evals && npm test`.
+  - Describe what you changed and why in the PR description.
+  - Reference any related Issues.
+- **Breaking changes:** Changes to shared governance files (`governance/`, `schemas/`, `.github/copilot-instructions.md`) affect all agents — test thoroughly and call this out explicitly in the PR description.
+
+---
+
+## Code of conduct
+
+Be respectful and constructive. This project follows the [Contributor Covenant](https://www.contributor-covenant.org/) v2.1.
diff --git a/README.md b/README.md
index a26c10f..72ee313 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,29 @@
 # ControlFlow
 
+[![CI](https://github.com/Smithbox-ai/ControlFlow/actions/workflows/ci.yml/badge.svg)](https://github.com/Smithbox-ai/ControlFlow/actions/workflows/ci.yml)
+![Agents](https://img.shields.io/badge/agents-13-blue)
+![Eval Checks](https://img.shields.io/badge/eval%20checks-302-brightgreen)
+![License](https://img.shields.io/badge/license-MIT-green)
+
 A multi-agent orchestration system for VS Code Copilot. ControlFlow replaces single-agent workflows with a coordinated team of 13 specialized agents governed by deterministic **P.A.R.T contracts** (Prompt → Archive → Resources → Tools), structured text outputs, and reliability gates.
 
+## How It Works
+
+**Turn any vague idea into working code in three steps:**
+
+```
+1. @Planner "Add OAuth login with Google"
+   → Idea interview → phased plan → Mermaid architecture diagram
+
+2. Approve the plan
+
+3. @Orchestrator (runs automatically)
+   → PlanAuditor reviews → CoreImplementer + TechnicalWriter execute in parallel
+   → CodeReviewer gates each phase → done
+```
+
+Each agent operates within strict P.A.R.T contracts — deterministic status outputs, least-privilege tool grants, and explicit failure classification — so you get predictable, auditable results instead of unpredictable single-agent sprawl.
+
 ## Key Features
 
 - **Context-Efficient Output** — agents return structured text summaries instead of raw JSON, conserving context tokens across delegation chains.