Write contracts before writing code
claude plugin marketplace add heurema/emporium
claude plugin install signum@emporium

AI can generate a function in seconds; telling you whether it is correct takes longer, because "correct" isn't defined until someone writes it down. Signum is a contract-first development pipeline for Claude Code that defines correctness before a line is written, then verifies against it deterministically: not by asking another model if the code looks right, but by running acceptance criteria, some of which the implementing agent never saw. Unlike generic code review, Signum produces a tamper-evident proofpack.json artifact that CI can gate on.
| Phase | What happens |
|---|---|
| CONTRACT | Spec graded A–F. Codex + Gemini validate for gaps. |
| EXECUTE | Engineer builds against a redacted contract. |
| AUDIT | Deterministic checks + 3-model parallel review + iterative fix loop. |
| PACK | Self-contained proofpack.json for CI gating. |
claude plugin marketplace add heurema/emporium
claude plugin install signum@emporium

Manual install from source
git clone https://github.com/heurema/signum.git
cd signum
claude plugin install .

# Run: describe what you want to build
/signum "your task description"

Signum grades your spec, shows the contract for approval, implements with an automatic repair loop, audits from multiple angles, and produces proofpack.json.
| Command | Description |
|---|---|
| `/signum <task>` | Run the full CONTRACT → EXECUTE → AUDIT → PACK pipeline |
Spec quality gate — Before implementation starts, your spec is scored across seven dimensions: Testability, Negative coverage, Clarity, Scope boundedness, Completeness, Boundary cases, NL Consistency. Grade D (below 60) is a hard stop with specific feedback on what's missing. The gate runs on the specification, not the code.
Holdout scenarios — The Contractor generates hidden acceptance criteria the Engineer never sees. When implementation is complete, holdouts run against the result — blind testing for cases the agent couldn't optimize for. Verification uses a typed DSL with http, exec (whitelisted binaries only), and expect primitives — no shell execution, no eval. Minimum counts enforced by risk level: 0 for low, 2 for medium, 5 for high.
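The minimum-count rule above amounts to a lookup by risk level. A minimal sketch, assuming the contract is available as a dict with `riskLevel` and `holdoutScenarios` fields; both field names are hypothetical and not taken from Signum's schema:

```python
# Illustrative only: field names are assumptions, not Signum's actual schema.
MIN_HOLDOUTS = {"low": 0, "medium": 2, "high": 5}  # counts stated in the text


def holdout_gate(contract: dict) -> tuple[bool, str]:
    """Return (passed, message) for the holdout count gate."""
    risk = contract.get("riskLevel", "low")
    if risk not in MIN_HOLDOUTS:
        return False, f"unknown risk level: {risk!r}"
    required = MIN_HOLDOUTS[risk]
    actual = len(contract.get("holdoutScenarios", []))
    if actual < required:
        return False, f"{risk} risk needs {required} holdouts, found {actual}"
    return True, "ok"
```

A high-risk contract with three holdouts would fail this gate, while a low-risk contract with none would pass.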
Project intent alignment — If the target project has a project.intent.md at its root, the contractor reads it before generating contracts. Non-goals and glossary terms flow into contract scope and terminology. For medium/high-risk tasks, missing project intent triggers a blocking question. An LLM-based alignment check warns when the contract diverges from project goals.
Glossary enforcement: A project.glossary.json at the project root defines canonical terms and forbidden synonyms. glossary_check scans contracts for alias usage; terminology_consistency_check detects synonym proliferation across active contracts. Both are WARN-only.
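To make the alias scan concrete, here is a rough sketch of a WARN-only check. The glossary layout (`terms`, `canonical`, `forbidden`) is an assumption made for illustration; the real project.glossary.json format may differ:

```python
import re


def glossary_warnings(text: str, glossary: dict) -> list[str]:
    """Collect warnings for forbidden synonyms found in contract text.

    Assumed (hypothetical) glossary shape:
    {"terms": [{"canonical": "...", "forbidden": ["...", ...]}]}
    """
    warnings = []
    for term in glossary.get("terms", []):
        for alias in term.get("forbidden", []):
            # Whole-word, case-insensitive match on the forbidden alias.
            pattern = r"\b" + re.escape(alias) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                warnings.append(
                    f"WARN: use '{term['canonical']}' instead of '{alias}'"
                )
    return warnings
```

Because the result is only a list of warnings, a caller can print them without failing the run, which matches the WARN-only behaviour described above.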
Cross-contract coherence — overlap_check detects inScope file overlap between active contracts. assumption_check flags contradictions in assumptions across related contracts. adr_check warns when relevant ADRs exist but aren't referenced. Contract lineage is tracked via parentContractId, relatedContractIds, and interfacesTouched.
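The file-overlap part of this check can be pictured as a pairwise set intersection. A sketch under stated assumptions: contracts are dicts with an `inScope` list of exact file paths and an `id` field (the `id` name is hypothetical), and glob patterns are not handled:

```python
from itertools import combinations


def overlap_check(contracts: list[dict]) -> list[tuple[str, str, list[str]]]:
    """Return (contract_a, contract_b, shared_paths) for each overlapping pair."""
    findings = []
    for a, b in combinations(contracts, 2):
        # Exact-path comparison only; real scopes may use globs or directories.
        shared = sorted(set(a["inScope"]) & set(b["inScope"]))
        if shared:
            findings.append((a["id"], b["id"], shared))
    return findings
```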
Upstream staleness detection: The Contractor computes a SHA-256 hash over upstream artifacts (project.intent.md, project.glossary.json) at contract creation. staleness_check recomputes the hash at execution time and compares the two. The policy on mismatch is configurable: warn (default) or block.
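The hash-then-recompute idea can be sketched in a few lines. How Signum combines the two files into one digest is not documented here, so the framing below (file name, NUL separator, file bytes) is an assumption chosen only to keep the example deterministic:

```python
import hashlib
from pathlib import Path

UPSTREAM = ("project.intent.md", "project.glossary.json")  # artifacts named above


def upstream_hash(root: Path) -> str:
    """SHA-256 over the upstream artifacts; a missing file hashes as empty."""
    digest = hashlib.sha256()
    for name in UPSTREAM:
        path = root / name
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(path.read_bytes() if path.exists() else b"")
        digest.update(b"\0")
    return digest.hexdigest()


def staleness_check(root: Path, recorded: str, policy: str = "warn") -> str:
    """Compare the recorded hash with a fresh one; return OK, WARN or BLOCK."""
    if upstream_hash(root) == recorded:
        return "OK"
    return "BLOCK" if policy == "block" else "WARN"
```

Recording `upstream_hash(root)` at contract creation and calling `staleness_check` before execution yields a warning or a block as soon as either file changes in between.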
Within-task refinement — For medium/high-risk tasks, the contractor runs a 4-pass self-critique (ambiguity, missing-input, contradiction, coverage), capped at 2 auto-revision rounds. Typed findings and a readinessForPlanning gate are written to the contract.
Data-level blinding — The Engineer reads contract-engineer.json, not contract.json. Holdout scenarios are physically removed from the file — not hidden by instruction. The agent cannot infer them from context or structure.
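The difference between removing data and hiding it by instruction is easy to show. A minimal sketch, assuming holdouts live under a single top-level `holdoutScenarios` key (a hypothetical name; the real contract layout may nest them differently):

```python
import copy


def engineer_view(contract: dict) -> dict:
    """Return a copy of the contract with holdout scenarios removed entirely."""
    redacted = copy.deepcopy(contract)  # never mutate the approved contract
    redacted.pop("holdoutScenarios", None)  # assumed field name
    return redacted
```

The redacted copy contains no placeholder or count for the removed entries, so nothing in its structure hints at what was taken out.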
Execution policy — contract-policy.json is derived from the contract before EXECUTE begins. It defines which tools the Engineer may use, which bash commands are denied, and which paths are in scope. Policy violations after execution are AUTO_BLOCK.
Repo invariant contracts: Add a repo-contract.json to your project root to declare invariants that must always hold, independent of the task. Any regression is AUTO_BLOCK, regardless of task-level acceptance criteria results.
{
  "schemaVersion": "1.0",
  "invariants": [
    { "id": "I-1", "description": "All tests pass", "verify": "pytest -q", "severity": "critical" },
    { "id": "I-2", "description": "No type errors", "verify": "mypy src/", "severity": "critical" },
    { "id": "I-3", "description": "No lint errors", "verify": "ruff check src/", "severity": "high" }
  ],
  "owner": "human"
}

Immutable audit chain: At user approval, Signum computes SHA-256 of the contract and records the timestamp. The base commit is captured before the Engineer runs. proofpack.json anchors the full chain: contract hash → approval timestamp → base commit → implementation diff → audit results.
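The first link of that chain, hashing the contract at approval time, might look like the sketch below. The canonical JSON serialization and the `contractHash` / `approvedAt` field names are assumptions for illustration, not Signum's documented format:

```python
import hashlib
import json
from datetime import datetime, timezone


def contract_hash(contract: dict) -> str:
    """SHA-256 over a canonical JSON form so key order does not matter."""
    canonical = json.dumps(contract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approval_record(contract: dict) -> dict:
    """Record the contract hash together with a UTC approval timestamp."""
    return {
        "contractHash": contract_hash(contract),  # hypothetical field name
        "approvedAt": datetime.now(timezone.utc).isoformat(),
    }


def contract_unchanged(contract: dict, record: dict) -> bool:
    """True if the contract still matches the hash recorded at approval."""
    return contract_hash(contract) == record["contractHash"]
```

Any later edit to the contract changes the digest, so a verifier that recomputes the hash can detect tampering without trusting the pipeline that produced it.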
Multi-model audit panel — Claude, Codex, and Gemini review the diff independently in parallel. The Mechanic runs first — deterministic checks: lint, typecheck, new test failures (by name, not exit code). Then models weigh in. Critical findings from any model block.
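Comparing failures by name rather than by exit code matters when the baseline is already red. A small sketch of the idea, assuming the failing test names have already been collected into sets (how the Mechanic gathers them is not shown here):

```python
def new_failures(baseline_failing: set[str], current_failing: set[str]) -> set[str]:
    """Tests that fail now but did not fail before the Engineer ran."""
    return current_failing - baseline_failing
```

If `test_a` was failing before the change and both `test_a` and `test_b` fail afterwards, the test runner exits non-zero in both cases, but only `test_b` is reported as a regression.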
Iterative review-fix loop — When reviewers find MAJOR or CRITICAL issues, the AUDIT phase doesn't stop — it iterates. A fresh Engineer agent receives a repair brief with specific findings, fixes them, and the full review cycle re-runs from scratch. Best-of-N selection keeps the highest-scoring candidate across iterations. Early stop halts after 2 consecutive non-improving rounds. Up to 20 iterations (configurable via SIGNUM_AUDIT_MAX_ITERATIONS). Holdout details are never revealed to the engineer — only failure categories.
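The control flow of such a loop (iteration cap, best-of-N selection, early stop after non-improving rounds) can be sketched as follows. The `review` and `repair` callables are hypothetical stand-ins for the reviewer panel and the fresh Engineer agent, and repairing from the latest candidate rather than the best one is an assumption:

```python
from typing import Any, Callable


def review_fix_loop(
    candidate: Any,
    review: Callable[[Any], tuple[float, list[str]]],  # -> (score, blocking findings)
    repair: Callable[[Any, list[str]], Any],           # -> new candidate
    max_iterations: int = 20,
    patience: int = 2,
) -> tuple[Any, float]:
    """Iterate review and repair, returning the best-scoring candidate seen."""
    score, findings = review(candidate)
    best, best_score = candidate, score
    stale = 0  # consecutive rounds without improvement
    for _ in range(max_iterations):
        if not findings:
            break  # nothing MAJOR or CRITICAL left to fix
        candidate = repair(candidate, findings)
        score, findings = review(candidate)
        if score > best_score:
            best, best_score, stale = candidate, score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stop: no progress for `patience` rounds
    return best, best_score
```

With scores of 50, 70, 65 and 60 across successive candidates, the loop stops after the second non-improving round and returns the candidate that scored 70.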
Diff progression — On review pass 2+, reviewers receive both the full feature diff and an iteration delta showing only what changed in the fix. This focuses review on the actual repair, reduces noise from re-discovering accepted code, and improves convergence speed.
Module lifecycle tracking — A modules.yaml manifest at the project root declares module status: active, experimental, deprecated, or removed. Deprecated modules carry remove_after deadlines and replaced_by pointers. The contractor reads this before generating contracts — cleanup tasks auto-detect removal candidates and generate structured removals and cleanupObligations entries.
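Detecting removal candidates reduces to filtering deprecated modules whose deadline has passed. A sketch under assumptions: the manifest has already been parsed into a list of dicts, each module has a `name` key, and `remove_after` is an ISO date string. Only `status`, `remove_after` and `replaced_by` are named in the text above; the rest is hypothetical:

```python
from datetime import date


def removal_candidates(modules: list[dict], today: date) -> list[str]:
    """Names of deprecated modules whose remove_after deadline has passed."""
    candidates = []
    for module in modules:
        if module.get("status") != "deprecated" or "remove_after" not in module:
            continue
        # Assumes an ISO date such as "2024-01-01".
        if date.fromisoformat(module["remove_after"]) < today:
            candidates.append(module["name"])
    return candidates
```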
Cleanup contracts — Contract schema v3.8 adds first-class support for code removal. removals entries specify files/directories to delete with preventReintroduction flags. cleanupObligations use K8s Finalizer semantics — blocking obligations (e.g., "remove all imports of deleted module") must be fulfilled before AUTO_OK. The DSL supports file_not_exists assertions and grep for reference-checking verify blocks. Evidence of successful removals is captured in proofpack.json.
┌─────────────────────────────────────────────────────────────┐
│  PHASE 1: CONTRACT                                          │
│                                                             │
│  Contractor → spec quality gate (A–F)                       │
│             → prose check (lib/prose-check.sh)              │
│             → glossary check (lib/glossary-check.sh)        │
│             → terminology check (lib/terminology-check.sh)  │
│             → overlap check (lib/overlap-check.sh)          │
│             → assumption check (lib/assumption-check.sh)    │
│             → ADR check (lib/adr-check.sh)                  │
│             → staleness check (lib/staleness-check.sh)      │
│             → spec validation (Codex + Gemini, parallel)    │
│             → holdout count gate (by risk level)            │
│             → [user approval + contract-hash.txt]           │
│             → contract-engineer.json (holdouts removed)     │
│             → contract-policy.json (execution rules)        │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  PHASE 2: EXECUTE                                           │
│                                                             │
│  baseline (lint, typecheck, per-test failing names)         │
│  + repo-contract baseline                                   │
│  → Engineer (no holdouts, policy-constrained)               │
│  → scope gate → policy compliance check                     │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  PHASE 3: AUDIT                                             │
│                                                             │
│  repo-contract invariant check                              │
│  → Mechanic (deterministic, zero LLM)                       │
│  → Claude + Codex + Gemini (parallel, independent)          │
│  → holdout verification                                     │
│  → Synthesizer (verdict + confidence score)                 │
│  → if MAJOR/CRITICAL: iterative fix loop ──────────┐        │
│      engineer fixes → full re-review → repeat      │        │
│      best-of-N selection, up to 20 iterations      │        │
│      diff progression: full patch + delta ◄────────┘        │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  PHASE 4: PACK                                              │
│                                                             │
│  all artifacts embedded → self-contained proofpack.json     │
└─────────────────────────────────────────────────────────────┘
Create ~/.claude/emporium-providers.local.md to customize which models Codex and Gemini use:
---
version: 1
defaults:
  codex:
    model: "gpt-5.3-codex"
  gemini:
    model: "gemini-3.1-pro"
routing:
  review:
    gemini: "gemini-3-flash"
---

Without this file, Signum uses each CLI's default model. See forge doctor to validate your config.
- Claude Code v2.1+
- git, jq, python3
- Optional: Codex CLI, Gemini CLI
All orchestration runs inside Claude Code. External providers (Codex CLI, Gemini CLI) receive the diff only, never the full codebase. Signum degrades gracefully if either is unavailable. No API keys are required beyond standard CLI auth. No telemetry is collected. Artifacts are stored in .signum/ (auto-added to .gitignore).
- skill7.dev/signum — landing page, setup call, email updates
- heurema/emporium — plugin registry
- How it works — agents, trust boundaries, limitations
- Reference — artifacts schema, troubleshooting
- Report an issue