
Janus

Named after the Roman god of gates and beginnings, whose two faces look simultaneously at the past and the future.

Stop shipping fragile plans. Janus reads your PRD, spec, or design doc and tells you which paths will break, which unknowns you're ignoring, and whether you should proceed.

janus eval my-spec.md
# => recommend | conditional | blocked

When to Use Janus

Use Janus before committing to a plan — not after.

  • You're about to merge a PRD that will drive weeks of implementation
  • You're picking between two architecture options and need a structured tiebreaker
  • You're shipping a spec that locks in an irreversible technical choice
  • You need a pre-commit gate that blocks fragile specs from entering the codebase
  • Your AI agent is about to finalize a design doc and needs a second opinion

When NOT to Use Janus

  • Routine code reviews (use your linter and code review tools)
  • Freeform brainstorming (too early for evaluation)
  • JSON schema validation (use ajv or zod)
  • Simple questions or clarifications

Quick Start

# One-line install
npm install -g janus-gate

# Or run without installing
npx janus-gate eval your-spec.md

# Or install via shell script
curl -fsSL https://raw.githubusercontent.com/devonestar/janus/main/install.sh | bash
janus eval your-spec.md

Janus returns one of three verdicts:

Verdict      Exit Code  Meaning
recommend    0          Safe to proceed
conditional  1          Proceed only after resolving named unknowns
blocked      2          Stop. Critical issues found

Every verdict includes: best_path (what to do), rejected_paths (what not to do and why), and critical_unknowns (what you don't know yet).
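The exit codes make the verdict scriptable. Below is a minimal sketch of a CI step that branches on them; `janus` is stubbed with a shell function so the sketch runs standalone, but in real use the installed binary is invoked instead.

```shell
# Stub so the sketch is self-contained; delete this line in real use.
janus() { return 1; }  # pretend the verdict was "conditional"

janus gate my-spec.md
code=$?
case $code in
  0) verdict=recommend   ;;
  1) verdict=conditional ;;
  2) verdict=blocked     ;;
  *) verdict=error       ;;
esac
echo "janus verdict: $verdict (exit $code)"
# prints: janus verdict: conditional (exit 1)
```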


See It In Action

Below is a real verdict Janus produced on one of its own design documents — Round 10 of its dogfood series. The document was a roadmap that proposed five different ways to add named personas to Janus's evaluation pipeline. Janus was asked which one to build.

decision_status:     conditional
information_quality: degraded
best_path:           SUPPRESSED (below exact-match quorum)

rejected_paths:
  * Option P -- Parallel Specialists + Synthesizer     -> violates P1
    "The synthesizer is itself an LLM call that can invent content
     not present in any persona's output."
  * Option R -- Red-Team vs Blue-Team                   -> violates P1
    "The adjudicator inherits the same defect; also fails to scale
     past two voices."
  * Option S -- Principle-Sharded (P1-P7)               -> violates P3
    "7 LLM calls x 3 min = 21 min, breaks the cost-ceiling NFR.
     Principles are not orthogonal -- 7 critiques are partially redundant."
  * Option T -- System-Prompt Shape Only                -> violates P2
    "Ships as 'persona equipment' but delivers single-voice-with-pose.
     The framing itself is a P2 violation."

critical_unknowns:
  * U-1: The aggregator was validated for homogeneous variance.
         Personas inject heterogeneous prompts -- is 2/3 agreement
         consensus, or one persona having a different prior?
  * U-2: No fixture proves personas find rejections that --samples
         3 cannot. The hypothesis is untested.

variance_report:
  * 3/3 samples agreed on "conditional"
  * 3 samples produced three different phrasings of the same winner
  * Quorum rule refused to name a best_path from a renaming artifact

Four observations about what just happened:

  1. Janus did not name a winner. Three samples agreed on what should win, but each sample phrased the name differently. Rather than pick one phrasing and pretend consensus, Janus suppressed the field. This is P7 -- Abstention Is Correct Behavior acting against Janus's own convenience.
  2. Every rejection cites a principle. No path was rejected for being "bad" -- each failed a specific test (P1, P2, P3). An operator reading this can argue with the principle citation, not with Janus's taste.
  3. The critical unknowns are honest. U-1 is a load-bearing doubt about whether Janus's own equipment (variance sampling) works on heterogeneous inputs. Janus raised it against the very round that wanted to ship that equipment.
  4. The verdict is conditional, not recommend. Janus sees a winner but names preconditions before implementation may proceed. A recommend here would have been fragile; blocked would have been excessive. The middle status is the one that matches the evidence.

Janus evaluated its own roadmap, rejected four of five options against its own seven principles, refused to claim certainty, and named the conditions under which it would be willing to proceed. That is the tool, in one page.


The Seven Principles

Every Janus verdict is traceable to one of these. An agent reading Janus output should respect the cited principle before overriding it.

  1. P1 -- No Certainty Without Evidence. No exact probabilities, only conditional language. Rejections say could violate, not will. If Janus has not seen evidence for a claim, the claim is not made.
  2. P2 -- Unknowns Are First-Class. Missing information is surfaced as a named unknown, not silently filled in. If Janus has to assume something to reach a verdict, the assumption is listed where a reader can audit it.
  3. P3 -- Robustness Over Optimism. Of two paths, the one that survives more failure modes wins. Ambition that has not been stress-tested loses to less ambitious paths that have been.
  4. P4 -- Conflict Is a Rejection Signal. Contradicting constraints invalidate every path that depends on both. Janus reads the document looking for internal inconsistency and treats it as grounds for rejection, not a puzzle to solve.
  5. P5 -- Scope Discipline. Janus evaluates only what the document says, not what the reader wishes it said. Opinions on adjacent topics are suppressed.
  6. P6 -- Reversibility Preference. Paths that preserve the ability to change course beat paths that do not. Irreversible commitments under uncertainty are downgraded.
  7. P7 -- Abstention Is Correct Behavior. blocked is a valid output, not a failure. Suppressed best_path is a valid output, not a gap. Refusing to answer is sometimes the most informative thing Janus can do.

CLI Usage

# Single-shot evaluation
janus eval my-prd.md

# Variance sampling -- run N times, return consensus
janus eval my-prd.md --samples 3

# Compare two options
janus compare option-a.md option-b.md

# Autonomous Generate-Evaluate-Eliminate-Refine loop
janus loop draft.md --max-iterations 3

# CI/CD binary gate (exit 0 = pass, non-zero = fail)
janus gate pr-spec.md

# Environment and backend health
janus doctor
janus doctor --probe
janus doctor --format json

Flags

  • --backend <name> -- claude (default), codex, opencode, openai-api, anthropic-api, mock
  • --model <id> -- override backend model. Omit to honor the backend's own config
  • --format <fmt> -- json, markdown, yaml (doctor: json or default human text)
  • --samples <N> (eval only, v0.2.0+) -- run N times, return consensus. N in [1, 5]. Linear runtime cost.
  • --max-iterations <N> (loop only)

What Janus Produces

Every janus eval or janus compare returns a JSON object with this shape:

{
  decision_status: "recommend" | "conditional" | "blocked",
  best_path: {
    name: string,
    rationale: string,
    enabling_conditions: string[],
    fragility_warnings: string[],
    robustness_score: "low" | "medium" | "high"
  } | null,
  rejected_paths: [{
    name: string,
    rejection_reason: string,
    violated_principle: "P1" | "P2" | ... | "P7" | null,
    could_recover: boolean,
    recovery_condition: string | null
  }],
  critical_unknowns: [{
    id: string,
    description: string,
    impact: string,
    question_for_human: string | null,
    source: "missing_field" | "inferred_assumption" | "information_asymmetry" | "external_dependency"
  }],
  assumptions: [...],
  information_quality: "sufficient" | "degraded" | "insufficient",
  next_actions: [{ priority: "critical"|"high"|"medium", action: string, addresses: string }],
  variance_report?: {              // present only when --samples > 1
    samples: number,
    decision_status_trace: DecisionStatus[],
    decision_status_agreement: number,
    best_path_agreement: number | null,
    tie_broken_to: DecisionStatus | null,
    rejected_path_frequency: Record<string, number>,
    critical_unknown_frequency: Record<string, number>,
    per_sample_errors: (string | null)[]
  }
}
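A script consuming this object only needs standard JSON parsing. The sketch below reads a hand-written minimal verdict (not real Janus output; field names follow the schema above) using Node, which Janus already requires.

```shell
# Hand-written minimal verdict matching the schema above (not real output).
cat > /tmp/janus-verdict.json <<'EOF'
{
  "decision_status": "conditional",
  "best_path": null,
  "critical_unknowns": [
    { "id": "U-1", "description": "aggregator untested on heterogeneous input" }
  ]
}
EOF

# Node 18+ is already a Janus requirement, so use it for parsing.
node -e '
  const v = JSON.parse(require("fs").readFileSync("/tmp/janus-verdict.json", "utf8"));
  console.log("status:", v.decision_status);
  if (v.best_path === null) console.log("best_path suppressed");
  for (const u of v.critical_unknowns) console.log(u.id + ": " + u.description);
'
```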

Quick Facts

Field            Value
Current version  0.5.0
npm package      janus-gate (npm i -g janus-gate or npx janus-gate)
Binary           janus (on $PATH after install)
Default backend  claude (headless Claude Code CLI -- no API key)
Commands         eval, compare, gate, loop, doom, harness, enrich, doctor
Output formats   json (default off-TTY), markdown (default on TTY), yaml
Exit codes       0=recommend, 1=conditional, 2=blocked, 3=error
Requires         Node 18+, TypeScript, one of: Claude Code CLI / Codex CLI / OpenAI API key / Anthropic API key
Source           https://github.com/devonestar/janus
Repo layout      src/ (TS), dist/ (built JS), specs/ (internal design docs), marketplace/ (plugin), integrations/ (skill + hook + AGENTS)

Backends

Backend           Status    Notes
claude (default)  verified  Local Claude Code CLI, headless (-p --output-format json). Uses subscription, no API key.
codex             verified  Local Codex CLI. Honors ~/.codex/config.toml default model unless --model given.
opencode          untested  Local OpenCode CLI.
openai-api        untested  Needs OPENAI_API_KEY.
anthropic-api     untested  Needs ANTHROPIC_API_KEY.
mock              verified  Rule-based, no LLM. For fast CI and structural checks.

Cost notes

On the default claude headless backend:

  • Short fixture (< 1 KB): ~40 s per eval
  • Typical PRD (5-10 KB): ~1-2 min per eval
  • Large roadmap (15+ KB): ~2-3 min per eval
  • --samples N multiplies wall-clock by N (serial). Default timeout is 240 s per call.

Use --backend mock when you want structure-only checks with no LLM cost.


Agent Integration

Janus is consumed by other agents via the plain CLI -- no MCP needed.

These are optional adapters on top of the CLI product.

Option 1 -- Claude Code plugin

claude plugin marketplace add ./marketplace
claude plugin install janus@janus-local

After a session restart, Claude Code auto-triggers the janus skill on PRD/spec work.

Option 2 -- Global CLAUDE.md snippet

Copy the JANUS_PROMPT_BLOCK from integrations/skill/SKILL.md into your ~/.claude/CLAUDE.md.

Option 3 -- Codex CLI / OpenCode / Git hook

  • Codex CLI: paste integrations/codex/project-instructions.md into your project's AGENTS.md
  • OpenCode: install integrations/skill/SKILL.md at ~/.config/opencode/skills/janus/SKILL.md
  • Git pre-commit gate: copy integrations/git-hook/pre-commit to .git/hooks/pre-commit
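The shipped hook lives at integrations/git-hook/pre-commit; the sketch below only reimplements the idea to show its shape, with git and janus stubbed so it runs standalone.

```shell
# Stubs so the sketch is self-contained; a real hook calls the real binaries.
git()   { echo "specs/new-feature.md"; }  # pretend one staged spec file
janus() { return 2; }                     # pretend the verdict was "blocked"

hook() {
  # Gate every staged spec file; refuse the commit on any non-zero verdict.
  for spec in $(git diff --cached --name-only); do
    if ! janus gate "$spec"; then
      echo "janus blocked commit: $spec" >&2
      return 1
    fi
  done
}

hook
hook_status=$?
echo "hook exit: $hook_status"
# prints: hook exit: 1
```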

MCP server implementation is archived under reserved/mcp.ts and can be reinstated if needed.


Self-Dev Pipeline

Janus includes a built-in feature development loop for spec-driven work:

# Full pipeline: intake check -> eval -> loop -> validator
npm run self-dev -- my-spec.md claude

# Intake structure check only
npm run check:intake -- my-spec.md

# Variance probe: run --samples 3 N times, measure jitter
scripts/variance.sh 10 fixtures/smoke.md

The self-dev pipeline runs four stages in sequence:

  1. Intake -- verifies the spec has required headings (Context/Problem, Goal, Constraints, Options, Unknowns, Decision requested)
  2. Eval -- single-shot janus eval on the spec
  3. Loop -- janus loop --max-iterations 3 for autonomous refinement
  4. Validator -- auto-selected by filename pattern (canonical identity, candidate paths, or none)

Validators are registered in scripts/self-dev.mjs. To add a new one, add an entry to validatorRegistry with a filename regex and command array.


Dogfood Ledger

Janus has been self-evaluated fourteen times. Each round Janus evaluates one of its own design documents, applies its own principles, and the result drives the next change.

Round  Focus                    Outcome
1      Initial self-eval        Found internal P4 contradiction in own spec
2      Remediation              First recommend + first loop success
3      Robustness               First high robustness verdict; inline-summary pattern proven
4      Rename Agamoto -> Janus  Verified functionally equivalent
5      Self-future decision     Janus picked its own next step, executed it
6      MCP integration          Built and verified, then archived per operator preference
7      Non-MCP integration      Skill + git-hook + Codex-AGENTS path shipped
8      Equipment roadmap        Janus chose variance-sampling (Option G) over 3 alternatives
9      Ship sampling            v0.2.0 with --samples N; discovered unknown-coverage is the larger value
10     Persona roadmap          Janus picked Q x Source-C; rejected 7 alternatives; named 3 preconditions for Round 11
11     README storytelling      Janus picked Option E (Concrete-Evidence-First); rejected 3 alternatives including metaphor-first
12     Doom Gate spec           Janus picked Option A (dedicated doom command + schema); rejected 3 alternatives (B=P4, C=P3, D=P4); conditional on grounding validation
13     Doom Gate ship           U-1 grounding experiment passed (5/5); janus doom implemented and verified on mock + claude backends
14     Doctor --probe spec      Janus picked Option A (--probe + exit codes + JSON in one cycle); loop terminated acceptable at iteration 3 (stagnant on U-1/U-2)

Roadmap

Current public surface:

  • eval, compare, loop, gate, doom
  • harness — 3-pass eval+targeted-doom+crosscheck pipeline
  • loop --harness — harness-aware refinement loop with LLM refiner
  • --samples N consensus sampling on eval
  • CLI-first integrations under integrations/
  • self-dev pipeline (npm run self-dev) for spec-driven feature development
  • archived MCP implementation under reserved/mcp.ts (not part of the shipped surface)

Work in progress / not yet shipped:

  • persona-gated PR / merge workflow
  • context enrichment (URL fetch + LLM researcher pass before harness)
  • failure future narratives (failure_chain field on rejected paths)

Known limitations (as of v0.4.0):

  • janus loop --harness refiner convergence on production specs is not yet empirically validated beyond internal dogfood
  • opencode subprocess stdout reliability when spawned from Node.js has shown intermittent truncation — harness results should be verified when --backend opencode is used in CI

Changelog

  • 0.5.0 -- Feature: janus enrich <file> — fetches external evidence (GitHub API, npm registry, URLs) referenced in the document, then uses LLM to interpret findings against the document's assumptions. Surfaces what external data confirms, challenges, or leaves ungrounded. Addresses the document-bound evaluation gap for strategic/future-facing specs. New types: Claim, FetchedEvidence, EnrichmentFinding, EnrichmentReport. GITHUB_TOKEN env var supported for higher rate limits.
  • 0.4.2 -- Feature: harness_verdict.verification_required — list of enabling conditions that survived doom (non-fatal) but need external validation. Surfaced in both JSON and markdown harness output. delta_from_eval string now includes verification=N count.
  • 0.4.1 -- Fix: strip CLAUDE_* env vars (notably CLAUDE_AGENT_SDK_VERSION) before spawning the claude subprocess. When Janus is invoked from within Claude Code (agent sandbox), inherited SDK env vars put claude -p into protocol mode and hang the subprocess indefinitely instead of processing stdin. Affected: janus eval/doom/harness/loop via --backend claude when invoked under Claude Code or any other Claude SDK host.
  • 0.4.0 -- Feature: janus harness 3-pass structured evaluation (eval → targeted-doom → crosscheck). janus loop --harness harness-aware refinement loop with LLM patch-mode refiner and convergence tracking. Primary backend: claude headless. Known limitations: loop refiner convergence not yet validated on production specs; opencode subprocess stdout intermittently unreliable in CI.
  • 0.3.2 -- Fix: janus --version now reads version from package.json dynamically instead of a hardcoded string.
  • 0.3.1 -- Fix: janus doctor no longer exits non-zero when zero LLM backends are installed. The command is diagnostic/informational; only Node < 18 (non-probe) or a failed --probe now causes non-zero exit. Unblocks CI runners with no LLM CLIs installed.
  • 0.3.0 -- Feature: janus doom adversarial pre-mortem command. Accepts markdown files or inline text. Dedicated DoomReport schema with severity/survivability/survival_rating. Honest Futures Bar 6/6 achieved.
  • 0.2.2 -- Fix: Claude backend --tools "" flag prevents recursive tool invocation. New structural test guard. Doom Gate spec added (Round 12 dogfood).
  • 0.2.1 -- Fix: default claude backend timeout 120 s -> 240 s. Previously long docs (> 10 KB) could SIGTERM under samples.
  • 0.2.0 -- Feature: --samples N on janus eval. New optional variance_report output field. Deterministic aggregation (majority vote + conservative tie-break + union of rejections/unknowns).
  • 0.1.1 -- Internal: removed MCP from shipped surface (archived under reserved/). Default backend switched to claude headless.
  • 0.1.0 -- Initial dogfood-verified CLI: eval / compare / loop / gate.

Reading Guide

  • If you are an agent picking up this README as context: extract the "When to Use" section, the Quick Start, the seven principles, the output schema, and the exit codes. Those let you invoke Janus correctly and interpret its output.
  • If you are a human first-reader: the "See It In Action" block is the shortest honest answer to "what does this tool actually do." Read that, then the principles, and you have the philosophy. Everything after is mechanics.
  • If you are evaluating adoption: decide whether the seven principles match your team's failure modes. If they do, the rest is integration detail. If they don't, no amount of README polish will fix that.

License

MIT.

About

AI-native decision gate for PRD/spec markdown. Reads your plan, finds fragile paths, and blocks them before they become code.
