Skip to content

heurema/signum

Repository files navigation

Signum

Write contracts before writing code

Claude Code Plugin Version License

Landing page · Book a setup call · Discussions

claude plugin marketplace add heurema/emporium
claude plugin install signum@emporium

What it does

AI can generate a function in seconds; telling you whether it is correct takes longer, because "correct" isn't defined until someone writes it down. Signum is a contract-first development pipeline for Claude Code that defines correctness before a line is written, then verifies against it deterministically — not by asking another model if the code looks right, but by running acceptance criteria the implementing agent never fully saw. Unlike generic code review, Signum produces a tamper-evident proofpack.json artifact that CI can gate on.

Phase What happens
CONTRACT Spec graded A–F. Codex + Gemini validate for gaps.
EXECUTE Engineer builds against a redacted contract.
AUDIT Deterministic checks + 3-model parallel review + iterative fix loop.
PACK Self-contained proofpack.json for CI gating.

Install

claude plugin marketplace add heurema/emporium
claude plugin install signum@emporium
Manual install from source
git clone https://github.com/heurema/signum.git
cd signum
claude plugin install .

Quick start

# Run — describe what you want to build
/signum "your task description"

Signum grades your spec, shows the contract for approval, implements with an automatic repair loop, audits from multiple angles, and produces proofpack.json.

Commands

Command Description
/signum <task> Run the full CONTRACT → EXECUTE → AUDIT → PACK pipeline

Features

Spec quality gate — Before implementation starts, your spec is scored across seven dimensions: Testability, Negative coverage, Clarity, Scope boundedness, Completeness, Boundary cases, NL Consistency. Grade D (below 60) is a hard stop with specific feedback on what's missing. The gate runs on the specification, not the code.

Holdout scenarios — The Contractor generates hidden acceptance criteria the Engineer never sees. When implementation is complete, holdouts run against the result — blind testing for cases the agent couldn't optimize for. Verification uses a typed DSL with http, exec (whitelisted binaries only), and expect primitives — no shell execution, no eval. Minimum counts enforced by risk level: 0 for low, 2 for medium, 5 for high.

Project intent alignment — If the target project has a project.intent.md at its root, the contractor reads it before generating contracts. Non-goals and glossary terms flow into contract scope and terminology. For medium/high-risk tasks, missing project intent triggers a blocking question. An LLM-based alignment check warns when the contract diverges from project goals.

Glossary enforcement — A project.glossary.json at the project root defines canonical terms and forbidden synonyms. glossary_check scans contracts for alias usage, terminology_consistency_check detects synonym proliferation across active contracts. Both are WARN-only.

Cross-contract coherenceoverlap_check detects inScope file overlap between active contracts. assumption_check flags contradictions in assumptions across related contracts. adr_check warns when relevant ADRs exist but aren't referenced. Contract lineage is tracked via parentContractId, relatedContractIds, and interfacesTouched.

Upstream staleness detection — Contractor computes SHA-256 over upstream artifacts (project.intent.md, project.glossary.json) at contract creation. staleness_check recomputes the hash at execution time. Configurable policy: warn (default) or block.

Within-task refinement — For medium/high-risk tasks, the contractor runs a 4-pass self-critique (ambiguity, missing-input, contradiction, coverage), capped at 2 auto-revision rounds. Typed findings and a readinessForPlanning gate are written to the contract.

Data-level blinding — The Engineer reads contract-engineer.json, not contract.json. Holdout scenarios are physically removed from the file — not hidden by instruction. The agent cannot infer them from context or structure.

Execution policycontract-policy.json is derived from the contract before EXECUTE begins. It defines which tools the Engineer may use, which bash commands are denied, and which paths are in scope. Policy violations after execution are AUTO_BLOCK.

Repo invariant contracts — Add repo-contract.json to your project root — invariants that must always hold, independent of task. Any regression is AUTO_BLOCK, regardless of task-level acceptance criteria results.

{
  "schemaVersion": "1.0",
  "invariants": [
    { "id": "I-1", "description": "All tests pass", "verify": "pytest -q", "severity": "critical" },
    { "id": "I-2", "description": "No type errors", "verify": "mypy src/", "severity": "critical" },
    { "id": "I-3", "description": "No lint errors", "verify": "ruff check src/", "severity": "high" }
  ],
  "owner": "human"
}

Immutable audit chain — At user approval, Signum computes SHA-256 of the contract and records the timestamp. The base commit is captured before the Engineer runs. proofpack.json anchors the full chain: contract hash → approval timestamp → base commit → implementation diff → audit results.

Multi-model audit panel — Claude, Codex, and Gemini review the diff independently in parallel. The Mechanic runs first — deterministic checks: lint, typecheck, new test failures (by name, not exit code). Then models weigh in. Critical findings from any model block.

Iterative review-fix loop — When reviewers find MAJOR or CRITICAL issues, the AUDIT phase doesn't stop — it iterates. A fresh Engineer agent receives a repair brief with specific findings, fixes them, and the full review cycle re-runs from scratch. Best-of-N selection keeps the highest-scoring candidate across iterations. Early stop halts after 2 consecutive non-improving rounds. Up to 20 iterations (configurable via SIGNUM_AUDIT_MAX_ITERATIONS). Holdout details are never revealed to the engineer — only failure categories.

Diff progression — On review pass 2+, reviewers receive both the full feature diff and an iteration delta showing only what changed in the fix. This focuses review on the actual repair, reduces noise from re-discovering accepted code, and improves convergence speed.

Module lifecycle tracking — A modules.yaml manifest at the project root declares module status: active, experimental, deprecated, or removed. Deprecated modules carry remove_after deadlines and replaced_by pointers. The contractor reads this before generating contracts — cleanup tasks auto-detect removal candidates and generate structured removals and cleanupObligations entries.

Cleanup contracts — Contract schema v3.8 adds first-class support for code removal. removals entries specify files/directories to delete with preventReintroduction flags. cleanupObligations use K8s Finalizer semantics — blocking obligations (e.g., "remove all imports of deleted module") must be fulfilled before AUTO_OK. The DSL supports file_not_exists assertions and grep for reference-checking verify blocks. Evidence of successful removals is captured in proofpack.json.

Architecture

┌─────────────────────────────────────────────────────────┐
│  PHASE 1: CONTRACT                                      │
│                                                         │
│  Contractor → spec quality gate (A–F)                  │
│            → prose check (lib/prose-check.sh)           │
│            → glossary check (lib/glossary-check.sh)     │
│            → terminology check (lib/terminology-check.sh)│
│            → overlap check (lib/overlap-check.sh)       │
│            → assumption check (lib/assumption-check.sh) │
│            → ADR check (lib/adr-check.sh)               │
│            → staleness check (lib/staleness-check.sh)   │
│            → spec validation (Codex + Gemini, parallel) │
│            → holdout count gate (by risk level)         │
│            → [user approval + contract-hash.txt]        │
│            → contract-engineer.json  (holdouts removed) │
│            → contract-policy.json    (execution rules)  │
└───────────────────────────┬─────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│  PHASE 2: EXECUTE                                       │
│                                                         │
│  baseline (lint, typecheck, per-test failing names)     │
│  + repo-contract baseline                               │
│  → Engineer (no holdouts, policy-constrained)           │
│  → scope gate → policy compliance check                 │
└───────────────────────────┬─────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│  PHASE 3: AUDIT                                         │
│                                                         │
│  repo-contract invariant check                          │
│  → Mechanic (deterministic, zero LLM)                   │
│  → Claude + Codex + Gemini (parallel, independent)      │
│  → holdout verification                                 │
│  → Synthesizer (verdict + confidence score)             │
│  → if MAJOR/CRITICAL: iterative fix loop ───────┐      │
│      engineer fixes → full re-review → repeat   │      │
│      best-of-N selection, up to 20 iterations   │      │
│      diff progression: full patch + delta ◄─────┘      │
└───────────────────────────┬─────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│  PHASE 4: PACK                                          │
│                                                         │
│  all artifacts embedded → self-contained proofpack.json  │
└─────────────────────────────────────────────────────────┘

Provider Configuration (optional)

Create ~/.claude/emporium-providers.local.md to customize which models Codex and Gemini use:

---
version: 1
defaults:
  codex:
    model: "gpt-5.3-codex"
  gemini:
    model: "gemini-3.1-pro"
routing:
  review:
    gemini: "gemini-3-flash"
---

Without this file, signum uses each CLI's default model. See forge doctor to validate your config.

Requirements

Privacy

All orchestration runs inside Claude Code. External providers (Codex CLI, Gemini CLI) receive the diff only — never the full codebase. Signum degrades gracefully if either is unavailable. No API keys required beyond standard CLI auth. No telemetry. Artifacts stored in .signum/ (auto-added to .gitignore).

See also

License

MIT

About

Risk-adaptive development pipeline with adversarial consensus code review

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages