feat: add CI workflow, CONTRIBUTING.md, CHANGELOG.md, and README badges/demo #1
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| name: CI | ||
|
|
||
| on: | ||
| push: | ||
| branches: ["master"] | ||
| pull_request: | ||
| branches: ["master"] | ||
|
|
||
| permissions: | ||
| contents: read | ||
|
|
||
| jobs: | ||
| eval: | ||
| name: Eval Suite | ||
| runs-on: ubuntu-latest | ||
|
|
||
| steps: | ||
| - uses: actions/checkout@v4 | ||
|
|
||
| - name: Set up Node.js | ||
| uses: actions/setup-node@v4 | ||
| with: | ||
| node-version: "20" | ||
| cache: "npm" | ||
| cache-dependency-path: evals/package-lock.json | ||
|
|
||
| - name: Install eval dependencies | ||
| run: cd evals && npm ci | ||
|
|
||
| - name: Run evals | ||
| run: cd evals && npm test | ||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,68 @@ | ||||||||||||||||||||||||||||||||||||||||||
| # Changelog | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| All notable changes to ControlFlow are documented here. | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| --- | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| ## [1.0.0] — 2026-04-15 | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| ### Added | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| **Agent system (13 agents)** | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| - `Orchestrator` — conductor, gate controller, wave-based parallel dispatch, failure routing | ||||||||||||||||||||||||||||||||||||||||||
| - `Planner` — structured planning with idea interview, phased plans, Mermaid diagrams, semantic risk discovery across 7 non-functional risk categories | ||||||||||||||||||||||||||||||||||||||||||
| - `PlanAuditor` — adversarial plan audit, architecture and risk review | ||||||||||||||||||||||||||||||||||||||||||
| - `AssumptionVerifier` — assumption-fact confusion detection, mirage elimination | ||||||||||||||||||||||||||||||||||||||||||
| - `ExecutabilityVerifier` — cold-start plan executability simulation | ||||||||||||||||||||||||||||||||||||||||||
| - `CoreImplementer` — backend implementation with TDD enforcement | ||||||||||||||||||||||||||||||||||||||||||
| - `UIImplementer` — frontend implementation | ||||||||||||||||||||||||||||||||||||||||||
| - `PlatformEngineer` — CI/CD, containers, infrastructure, rollback contracts | ||||||||||||||||||||||||||||||||||||||||||
| - `CodeReviewer` — code review, safety gates, verdict contracts | ||||||||||||||||||||||||||||||||||||||||||
| - `Researcher` — evidence-first research with confidence scores and citations | ||||||||||||||||||||||||||||||||||||||||||
| - `CodeMapper` — read-only codebase discovery | ||||||||||||||||||||||||||||||||||||||||||
| - `TechnicalWriter` — documentation, diagrams, code-doc parity enforcement | ||||||||||||||||||||||||||||||||||||||||||
| - `BrowserTester` — E2E browser testing with health-first verification and accessibility audits | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| **Architecture** | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| - P.A.R.T contract architecture (Prompt → Archive → Resources → Tools) enforced across all agents | ||||||||||||||||||||||||||||||||||||||||||
| - Structured text outputs replacing raw JSON to conserve context tokens in delegation chains | ||||||||||||||||||||||||||||||||||||||||||
| - Wave-based parallel execution — Orchestrator dispatches independent phases in parallel | ||||||||||||||||||||||||||||||||||||||||||
| - Adversarial review pipeline — up to three independent reviewers before implementation (depth scales with complexity tier: TRIVIAL / SMALL / MEDIUM / LARGE) | ||||||||||||||||||||||||||||||||||||||||||
| - Failure taxonomy (`transient` / `fixable` / `needs_replan` / `escalate`) with deterministic retry and escalation routing | ||||||||||||||||||||||||||||||||||||||||||
| - Least-privilege tool grants — each agent's `tools:` frontmatter trimmed to minimum required by role | ||||||||||||||||||||||||||||||||||||||||||
| - Semantic risk discovery — 7 non-functional risk categories evaluated before research delegation | ||||||||||||||||||||||||||||||||||||||||||
| - Batch approval per execution wave, per-phase approval for destructive operations | ||||||||||||||||||||||||||||||||||||||||||
| - `NEEDS_INPUT` clarification routing from subagents through Orchestrator to user via `askQuestions` | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| **Governance and contracts** | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| - JSON Schema contracts for all agent outputs in `schemas/` | ||||||||||||||||||||||||||||||||||||||||||
| - Governance policies in `docs/agent-engineering/`: PART-SPEC, RELIABILITY-GATES, CLARIFICATION-POLICY, TOOL-ROUTING, SCORING-SPEC, MIGRATION-CORE-FIRST, PROMPT-BEHAVIOR-CONTRACT | ||||||||||||||||||||||||||||||||||||||||||
| - Canonical tool grants in `governance/agent-grants.json` | ||||||||||||||||||||||||||||||||||||||||||
| - Agent roster and complexity tier definitions in `plans/project-context.md` | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| **Skill library** | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| - 7 domain-specific skill patterns: Testing, Error Handling, Security, Performance, Completeness, Integration, Idea-to-Prompt | ||||||||||||||||||||||||||||||||||||||||||
| - Skill index at `skills/index.md` | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| **Eval suite (302 checks)** | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| - Pass 1: Schema validity (Ajv strict mode, JSON Schema 2020-12) | ||||||||||||||||||||||||||||||||||||||||||
| - Pass 2–3: Scenario integrity and cross-scenario structural regression (179 structural checks) | ||||||||||||||||||||||||||||||||||||||||||
| - Pass 4: P.A.R.T section order enforcement | ||||||||||||||||||||||||||||||||||||||||||
| - Pass 4b: Clarification trigger and tool routing section validation | ||||||||||||||||||||||||||||||||||||||||||
| - Pass 5: Skill library registration integrity | ||||||||||||||||||||||||||||||||||||||||||
| - Pass 6: Synthetic rename negative-path checks | ||||||||||||||||||||||||||||||||||||||||||
| - Pass 7: Prompt behavior contract behavioral regression (74 checks across 9 agents) | ||||||||||||||||||||||||||||||||||||||||||
| - Pass 8: Orchestration handoff contract regression (49 checks) | ||||||||||||||||||||||||||||||||||||||||||
Comment on lines +53 to +62
| **Eval suite (302 checks)** | |
| - Pass 1: Schema validity (Ajv strict mode, JSON Schema 2020-12) | |
| - Pass 2–3: Scenario integrity and cross-scenario structural regression (179 structural checks) | |
| - Pass 4: P.A.R.T section order enforcement | |
| - Pass 4b: Clarification trigger and tool routing section validation | |
| - Pass 5: Skill library registration integrity | |
| - Pass 6: Synthetic rename negative-path checks | |
| - Pass 7: Prompt behavior contract behavioral regression (74 checks across 9 agents) | |
| - Pass 8: Orchestration handoff contract regression (49 checks) | |
| **Eval suite** | |
| - Pass 1: Schema validity (Ajv strict mode, JSON Schema 2020-12) | |
| - Pass 2–3: Scenario integrity and cross-scenario structural regression | |
| - Pass 4: P.A.R.T section order enforcement | |
| - Pass 4b: Clarification trigger and tool routing section validation | |
| - Pass 5: Skill library registration integrity | |
| - Pass 6: Synthetic rename negative-path checks | |
| - Pass 7: Prompt behavior contract behavioral regression across agent prompts | |
| - Pass 8: Orchestration handoff contract regression |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,105 @@ | ||||||
| # Contributing to ControlFlow | ||||||
|
|
||||||
| Thank you for your interest in contributing! This guide covers the key contribution paths. | ||||||
|
|
||||||
| ## Table of Contents | ||||||
|
|
||||||
| - [Running the eval suite](#running-the-eval-suite) | ||||||
| - [Adding a new agent](#adding-a-new-agent) | ||||||
| - [Editing an existing agent](#editing-an-existing-agent) | ||||||
| - [Adding skills](#adding-skills) | ||||||
| - [Proposing changes](#proposing-changes) | ||||||
| - [Code of conduct](#code-of-conduct) | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Running the eval suite | ||||||
|
|
||||||
| The eval suite validates schema compliance, P.A.R.T contract structure, tool grant consistency, behavioral invariants, and orchestration handoff discipline across all 13 agents — without invoking live agents. | ||||||
|
|
||||||
| ```bash | ||||||
| cd evals | ||||||
| npm install | ||||||
|
||||||
| npm install | |
| npm ci |
Copilot AI, Apr 15, 2026
This section states “All 302 checks must pass…”, but evals/README.md currently documents a different total (283). To avoid documentation drift, consider removing the hardcoded number (e.g., “All eval checks must pass”) or updating both files to match the authoritative count.
| All 302 checks must pass before any PR can be merged. The suite runs fully offline. | |
| All eval checks must pass before any PR can be merged. The suite runs fully offline. |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -1,7 +1,29 @@ | ||||||
| # ControlFlow | ||||||
|
|
||||||
| [CI](https://github.com/Smithbox-ai/ControlFlow/actions/workflows/ci.yml) | ||||||
|  | ||||||
|  | ||||||
|
||||||
|  | |
|  |
The workflow is only triggered for the `master` branch. The repo/diff context suggests the default branch may be `main`, in which case CI won’t run for pushes/PRs to the default branch and the README CI badge may show no results. Consider triggering on both `main` and `master`, or updating the branch filter to match the repo’s actual default branch.
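One way to address the comment above is to list both branch names in the workflow triggers. The fragment below is a minimal sketch, assuming the default branch is either `main` or `master`; the repository's actual default branch is not confirmed in this diff, so the list should be trimmed once it is known.

```yaml
# Sketch only: assumes the default branch is either "main" or "master".
# Replaces the existing `on:` block in the CI workflow; everything else stays as-is.
on:
  push:
    branches: ["main", "master"]
  pull_request:
    branches: ["main", "master"]
```

Listing both names keeps CI running regardless of which one is the default, at the cost of a filter entry that never matches. If only one branch exists, a single-entry list is the tidier choice.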