Merged
**.github/workflows/ci.yml** (31 additions, 0 deletions)
```yaml
name: CI

on:
  push:
    branches: ["master"]
  pull_request:
    branches: ["master"]
```
**Comment on lines +5 to +7 (Copilot AI, Apr 15, 2026)**

The workflow is only triggered for the master branch. The repo/diff context suggests the default branch may be main, in which case CI won't run for pushes/PRs to the default branch and the README CI badge may show no results. Consider triggering on both main and master, or updating the branch filter to match the repo's actual default branch.

Suggested change:

```diff
-    branches: ["master"]
-  pull_request:
-    branches: ["master"]
+    branches: ["main", "master"]
+  pull_request:
+    branches: ["main", "master"]
```

```yaml
permissions:
  contents: read

jobs:
  eval:
    name: Eval Suite
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"
          cache-dependency-path: evals/package-lock.json

      - name: Install eval dependencies
        run: cd evals && npm ci

      - name: Run evals
        run: cd evals && npm test
```
**CHANGELOG.md** (68 additions, 0 deletions)
# Changelog

All notable changes to ControlFlow are documented here.

The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

---

## [1.0.0] — 2026-04-15

### Added

**Agent system (13 agents)**

- `Orchestrator` — conductor, gate controller, wave-based parallel dispatch, failure routing
- `Planner` — structured planning with idea interview, phased plans, Mermaid diagrams, semantic risk discovery across 7 non-functional risk categories
- `PlanAuditor` — adversarial plan audit, architecture and risk review
- `AssumptionVerifier` — assumption-fact confusion detection, mirage elimination
- `ExecutabilityVerifier` — cold-start plan executability simulation
- `CoreImplementer` — backend implementation with TDD enforcement
- `UIImplementer` — frontend implementation
- `PlatformEngineer` — CI/CD, containers, infrastructure, rollback contracts
- `CodeReviewer` — code review, safety gates, verdict contracts
- `Researcher` — evidence-first research with confidence scores and citations
- `CodeMapper` — read-only codebase discovery
- `TechnicalWriter` — documentation, diagrams, code-doc parity enforcement
- `BrowserTester` — E2E browser testing with health-first verification and accessibility audits

**Architecture**

- P.A.R.T contract architecture (Prompt → Archive → Resources → Tools) enforced across all agents
- Structured text outputs replacing raw JSON to conserve context tokens in delegation chains
- Wave-based parallel execution — Orchestrator dispatches independent phases in parallel
- Adversarial review pipeline — up to three independent reviewers before implementation (depth scales with complexity tier: TRIVIAL / SMALL / MEDIUM / LARGE)
- Failure taxonomy (`transient` / `fixable` / `needs_replan` / `escalate`) with deterministic retry and escalation routing
- Least-privilege tool grants — each agent's `tools:` frontmatter trimmed to minimum required by role
- Semantic risk discovery — 7 non-functional risk categories evaluated before research delegation
- Batch approval per execution wave, per-phase approval for destructive operations
- `NEEDS_INPUT` clarification routing from subagents through Orchestrator to user via `askQuestions`

**Governance and contracts**

- JSON Schema contracts for all agent outputs in `schemas/`
- Governance policies in `docs/agent-engineering/`: PART-SPEC, RELIABILITY-GATES, CLARIFICATION-POLICY, TOOL-ROUTING, SCORING-SPEC, MIGRATION-CORE-FIRST, PROMPT-BEHAVIOR-CONTRACT
- Canonical tool grants in `governance/agent-grants.json`
- Agent roster and complexity tier definitions in `plans/project-context.md`

**Skill library**

- 7 domain-specific skill patterns: Testing, Error Handling, Security, Performance, Completeness, Integration, Idea-to-Prompt
- Skill index at `skills/index.md`

**Eval suite (302 checks)**

- Pass 1: Schema validity (Ajv strict mode, JSON Schema 2020-12)
- Pass 2–3: Scenario integrity and cross-scenario structural regression (179 structural checks)
- Pass 4: P.A.R.T section order enforcement
- Pass 4b: Clarification trigger and tool routing section validation
- Pass 5: Skill library registration integrity
- Pass 6: Synthetic rename negative-path checks
- Pass 7: Prompt behavior contract behavioral regression (74 checks across 9 agents)
- Pass 8: Orchestration handoff contract regression (49 checks)
**Comment on lines +53 to +62 (Copilot AI, Apr 15, 2026)**

The changelog claims the eval suite has "302 checks" with a specific breakdown, but evals/README.md currently documents a different total. Consider either aligning these numbers with the authoritative source or avoiding fixed counts in the changelog to prevent staleness.

Suggested change:

```diff
-**Eval suite (302 checks)**
-
-- Pass 1: Schema validity (Ajv strict mode, JSON Schema 2020-12)
-- Pass 2–3: Scenario integrity and cross-scenario structural regression (179 structural checks)
-- Pass 4: P.A.R.T section order enforcement
-- Pass 4b: Clarification trigger and tool routing section validation
-- Pass 5: Skill library registration integrity
-- Pass 6: Synthetic rename negative-path checks
-- Pass 7: Prompt behavior contract behavioral regression (74 checks across 9 agents)
-- Pass 8: Orchestration handoff contract regression (49 checks)
+**Eval suite**
+
+- Pass 1: Schema validity (Ajv strict mode, JSON Schema 2020-12)
+- Pass 2–3: Scenario integrity and cross-scenario structural regression
+- Pass 4: P.A.R.T section order enforcement
+- Pass 4b: Clarification trigger and tool routing section validation
+- Pass 5: Skill library registration integrity
+- Pass 6: Synthetic rename negative-path checks
+- Pass 7: Prompt behavior contract behavioral regression across agent prompts
+- Pass 8: Orchestration handoff contract regression
```
- F7/F8: Complexity tier and reference integrity enforcement
- Warm cache for fast repeated structural runs

**CI**

- GitHub Actions workflow running the full eval suite on every push and pull request to `master`
**CONTRIBUTING.md** (105 additions, 0 deletions)
# Contributing to ControlFlow

Thank you for your interest in contributing! This guide covers the key contribution paths.

## Table of Contents

- [Running the eval suite](#running-the-eval-suite)
- [Adding a new agent](#adding-a-new-agent)
- [Editing an existing agent](#editing-an-existing-agent)
- [Adding skills](#adding-skills)
- [Proposing changes](#proposing-changes)
- [Code of conduct](#code-of-conduct)

---

## Running the eval suite

The eval suite validates schema compliance, P.A.R.T contract structure, tool grant consistency, behavioral invariants, and orchestration handoff discipline across all 13 agents — without invoking live agents.

```bash
cd evals
npm install
npm test
```

**Copilot AI, Apr 15, 2026**

The contributor instructions use npm install, while CI uses npm ci. Using npm ci locally (with the committed lockfile) better matches CI's deterministic dependency resolution and reduces "works locally but not in CI" issues.

Suggested change:

```diff
-npm install
+npm ci
```

All 302 checks must pass before any PR can be merged. The suite runs fully offline.
**Copilot AI, Apr 15, 2026**

This section states "All 302 checks must pass…", but evals/README.md currently documents a different total (283). To avoid documentation drift, consider removing the hardcoded number (e.g., "All eval checks must pass") or updating both files to match the authoritative count.

Suggested change:

```diff
-All 302 checks must pass before any PR can be merged. The suite runs fully offline.
+All eval checks must pass before any PR can be merged. The suite runs fully offline.
```

For a faster structural-only pass:

```bash
npm run test:structural
```

For behavioral and orchestration regressions only:

```bash
npm run test:behavior
```

---

## Adding a new agent

1. **Create the agent file** at repo root: `<Name>.agent.md` or `<Name>-subagent.agent.md`.

2. **Follow P.A.R.T structure** — every agent file must have exactly these top-level sections in order:
- `## Prompt` — mission, scope, deterministic output contracts, Non-Negotiable Rules
- `## Archive` — memory policies, context compaction rules
- `## Resources` — file references loaded on-demand
- `## Tools` — allowed/disallowed tools with routing rules

See `docs/agent-engineering/PART-SPEC.md` for the full specification.

3. **Create a JSON Schema contract** in `schemas/<name>-output.schema.json`. Schema files serve as documentation contracts and eval references.

4. **Add eval scenarios** in `evals/scenarios/` that cover:
- At least one happy-path execution
- `ABSTAIN` / `NEEDS_INPUT` / failure classification behavior
- Tool routing compliance if the agent uses external tools

5. **Register the agent in governance files**:
- Add it to `governance/agent-grants.json` with its canonical tool grants.
- Add it to `plans/project-context.md` (agent roster table).

6. **Update `README.md`**:
- Add a row to the appropriate agent table (Primary Agents or Specialized Subagents).
- Update the agent count badge if you bump past 13.

7. **Run the full eval suite** and fix any failures before opening a PR.
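The section-order rule in step 2 (the one eval Pass 4 enforces) can be pictured as a small standalone check. A minimal sketch: the four section names come from this guide, while the helper functions and the skeleton string below are illustrative assumptions, not the eval suite's actual code.

```javascript
// Minimal sketch of the P.A.R.T section-order rule that Pass 4 enforces.
// The four section names come from the contributing guide; the helpers
// and the skeleton string are illustrative, not the real eval code.
const PART_ORDER = ["Prompt", "Archive", "Resources", "Tools"];

function topLevelSections(markdown) {
  // Collect "## Heading" lines in document order.
  return markdown
    .split("\n")
    .filter((line) => line.startsWith("## "))
    .map((line) => line.slice(3).trim());
}

function isValidPart(markdown) {
  const sections = topLevelSections(markdown);
  return (
    sections.length === PART_ORDER.length &&
    sections.every((name, i) => name === PART_ORDER[i])
  );
}

// A hypothetical skeleton agent file in the expected shape.
const skeleton = [
  "## Prompt",
  "Mission, output contracts, Non-Negotiable Rules.",
  "## Archive",
  "## Resources",
  "## Tools",
].join("\n");

console.log(isValidPart(skeleton)); // true
console.log(isValidPart("## Tools\n## Prompt")); // false
```

Checking exact ordered equality (rather than mere presence) is what makes the rule deterministic: a file with the right sections in the wrong order still fails.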

---

## Editing an existing agent

1. Read the current agent file carefully. Understand the Non-Negotiable Rules, clarification contract, and tool routing section before making changes.
2. Run `cd evals && npm test` **before and after** your edit to confirm no regressions.
3. If you change output contracts (status values, required fields), update the corresponding schema in `schemas/` and any eval scenarios that assert those fields.
4. If you change tool grants in frontmatter, update `governance/agent-grants.json` to match — the eval suite enforces consistency between the two.
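The frontmatter/grants consistency in point 4 boils down to a two-way set comparison. A minimal sketch, assuming both the `tools:` frontmatter and the `agent-grants.json` entry reduce to plain string arrays (the real file formats may differ):

```javascript
// Illustrative two-way comparison between an agent's `tools:` frontmatter
// and its entry in governance/agent-grants.json. Both inputs are assumed
// to be plain string arrays; this is not the eval suite's real parser.
function grantMismatches(frontmatterTools, canonicalGrants) {
  const declared = new Set(frontmatterTools);
  const granted = new Set(canonicalGrants);
  return {
    // Declared in frontmatter but missing from agent-grants.json.
    notInGrantsFile: [...declared].filter((t) => !granted.has(t)),
    // Granted canonically but absent from the frontmatter.
    notInFrontmatter: [...granted].filter((t) => !declared.has(t)),
  };
}

// Hypothetical tool names: a stray "edit" grant in the frontmatter of a
// read-only agent should surface as a mismatch to reconcile.
const result = grantMismatches(["read", "search", "edit"], ["read", "search"]);
console.log(result.notInGrantsFile); // [ 'edit' ]
console.log(result.notInFrontmatter); // []
```

Either non-empty list means the two sources have drifted, which is exactly the condition the eval suite flags.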

---

## Adding skills

Skills are reusable domain pattern snippets that Planner selects per phase and implementation agents load at execution time. They live in `skills/patterns/*.md`.

1. Create `skills/patterns/<topic>.md` following the style of existing patterns.
2. Register the new file in `skills/index.md`.
3. Run `npm test` — Pass 5 validates that every `skills/patterns/` file is registered in the index and every index entry resolves to a real file.
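The Pass 5 invariant is a two-way containment check between the pattern files on disk and the index entries. A sketch under the assumption that `skills/index.md` registers patterns as markdown links; the actual index format may differ, and the real suite reads the repository directly rather than in-memory strings.

```javascript
// Sketch of the Pass 5 invariant: every skills/patterns/ file is
// registered in skills/index.md, and every index entry resolves to a
// real file. The link-based index format assumed here is illustrative.
function extractIndexedPaths(indexMarkdown) {
  // Pull link targets like (skills/patterns/testing.md) out of the index.
  const matches = indexMarkdown.match(/\(skills\/patterns\/[^)]+\.md\)/g) || [];
  return matches.map((m) => m.slice(1, -1));
}

function indexIntegrity(patternFiles, indexMarkdown) {
  const indexed = new Set(extractIndexedPaths(indexMarkdown));
  const files = new Set(patternFiles);
  return {
    // On disk but never registered in the index.
    unregistered: patternFiles.filter((f) => !indexed.has(f)),
    // Registered in the index but pointing at no real file.
    dangling: [...indexed].filter((e) => !files.has(e)),
  };
}

// Hypothetical example: security.md exists on disk but is not indexed.
const indexMd = "- [Testing](skills/patterns/testing.md)";
const onDisk = ["skills/patterns/testing.md", "skills/patterns/security.md"];
console.log(indexIntegrity(onDisk, indexMd).unregistered); // [ 'skills/patterns/security.md' ]
console.log(indexIntegrity(onDisk, indexMd).dangling); // []
```

Both directions matter: an unregistered pattern is invisible to Planner, and a dangling entry breaks load-at-execution-time resolution.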

---

## Proposing changes

- **Bug reports and feature requests:** Open a GitHub Issue describing the problem or proposal clearly.
- **Pull requests:** Fork the repository, create a feature branch, and open a PR against `master`.
- Every PR must pass `cd evals && npm test`.
- Describe what you changed and why in the PR description.
- Reference any related Issues.
- **Breaking changes:** Changes to shared governance files (`governance/`, `schemas/`, `.github/copilot-instructions.md`) affect all agents — test thoroughly and call this out explicitly in the PR description.

---

## Code of conduct

Be respectful and constructive. This project follows the [Contributor Covenant](https://www.contributor-covenant.org/) v2.1.
**README.md** (22 additions, 0 deletions)
# ControlFlow

[![CI](https://github.com/Smithbox-ai/ControlFlow/actions/workflows/ci.yml/badge.svg)](https://github.com/Smithbox-ai/ControlFlow/actions/workflows/ci.yml)
![Agents](https://img.shields.io/badge/agents-13-blue)
![Eval Checks](https://img.shields.io/badge/eval%20checks-302-brightgreen)
**Copilot AI, Apr 15, 2026**

The "Eval Checks" badge hardcodes 302, but the repository's evals/README.md currently states a different total (283). Hardcoding the number risks the badge and docs drifting out of sync; consider removing the fixed count or deriving it from a single authoritative source and updating all references together.

Suggested change:

```diff
-![Eval Checks](https://img.shields.io/badge/eval%20checks-302-brightgreen)
+![Eval Checks](https://img.shields.io/badge/eval%20checks-passing-brightgreen)
```
![License](https://img.shields.io/badge/license-MIT-green)

A multi-agent orchestration system for VS Code Copilot. ControlFlow replaces single-agent workflows with a coordinated team of 13 specialized agents governed by deterministic **P.A.R.T contracts** (Prompt → Archive → Resources → Tools), structured text outputs, and reliability gates.

## How It Works

**Turn any vague idea into working code in three steps:**

```
1. @Planner "Add OAuth login with Google"
→ Idea interview → phased plan → Mermaid architecture diagram

2. Approve the plan

3. @Orchestrator (runs automatically)
→ PlanAuditor reviews → CoreImplementer + TechnicalWriter execute in parallel
→ CodeReviewer gates each phase → done
```

Each agent operates within strict P.A.R.T contracts — deterministic status outputs, least-privilege tool grants, and explicit failure classification — so you get predictable, auditable results instead of unpredictable single-agent sprawl.

## Key Features

- **Context-Efficient Output** — agents return structured text summaries instead of raw JSON, conserving context tokens across delegation chains.