Named after the Roman god of gates and beginnings, whose two faces look simultaneously at the past and the future.
Stop shipping fragile plans. Janus reads your PRD, spec, or design doc and tells you which paths will break, which unknowns you're ignoring, and whether you should proceed.
janus eval my-spec.md
# => recommend | conditional | blocked

Use Janus before committing to a plan — not after.
- You're about to merge a PRD that will drive weeks of implementation
- You're picking between two architecture options and need a structured tiebreaker
- You're shipping a spec that locks in an irreversible technical choice
- You need a pre-commit gate that blocks fragile specs from entering the codebase
- Your AI agent is about to finalize a design doc and needs a second opinion
Skip it for:

- Routine code reviews (use your linter and code review tools)
- Freeform brainstorming (too early for evaluation)
- JSON schema validation (use ajv or zod)
- Simple questions or clarifications
# One-line install
npm install -g janus-gate
# Or run without installing
npx janus-gate eval your-spec.md
# Or install via shell script
curl -fsSL https://raw.githubusercontent.com/devonestar/janus/main/install.sh | bash

janus eval your-spec.md

Janus returns one of three verdicts:
| Verdict | Exit Code | Meaning |
|---|---|---|
| `recommend` | 0 | Safe to proceed |
| `conditional` | 1 | Proceed only after resolving named unknowns |
| `blocked` | 2 | Stop. Critical issues found |
Every verdict includes: best_path (what to do), rejected_paths (what not to do and why), and critical_unknowns (what you don't know yet).
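The verdict fields are plain JSON, so scripts can pull them out with standard tools. A minimal sketch, assuming a verdict saved via `--format json`; the inline JSON is an illustrative stub, and `sed` stands in for `jq` so the snippet runs anywhere:

```shell
# Stub verdict standing in for: janus eval my-spec.md --format json > verdict.json
cat > verdict.json <<'EOF'
{ "decision_status": "conditional", "information_quality": "degraded" }
EOF

# Extract the top-level decision_status field
status=$(sed -n 's/.*"decision_status": *"\([a-z]*\)".*/\1/p' verdict.json)
echo "$status"
# => conditional
```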
Below is a real verdict Janus produced on one of its own design documents — Round 10 of its dogfood series. The document was a roadmap that proposed five different ways to add named personas to Janus's evaluation pipeline. Janus was asked which one to build.
decision_status: conditional
information_quality: degraded
best_path: SUPPRESSED (below exact-match quorum)
rejected_paths:
* Option P -- Parallel Specialists + Synthesizer -> violates P1
"The synthesizer is itself an LLM call that can invent content
not present in any persona's output."
* Option R -- Red-Team vs Blue-Team -> violates P1
"The adjudicator inherits the same defect; also fails to scale
past two voices."
* Option S -- Principle-Sharded (P1-P7) -> violates P3
"7 LLM calls x 3 min = 21 min, breaks the cost-ceiling NFR.
Principles are not orthogonal -- 7 critiques are partially redundant."
* Option T -- System-Prompt Shape Only -> violates P2
"Ships as 'persona equipment' but delivers single-voice-with-pose.
The framing itself is a P2 violation."
critical_unknowns:
* U-1: The aggregator was validated for homogeneous variance.
Personas inject heterogeneous prompts -- is 2/3 agreement
consensus, or one persona having a different prior?
* U-2: No fixture proves personas find rejections that --samples
3 cannot. The hypothesis is untested.
variance_report:
* 3/3 samples agreed on "conditional"
* 3 samples produced three different phrasings of the same winner
* Quorum rule refused to name a best_path from a renaming artifact
Four observations about what just happened:
- Janus did not name a winner. Three samples agreed on what should win, but each sample phrased the name differently. Rather than pick one phrasing and pretend consensus, Janus suppressed the field. This is P7 -- Abstention Is Correct Behavior acting against Janus's own convenience.
- Every rejection cites a principle. No path was rejected for being "bad" -- each failed a specific test (P1, P2, P3). An operator reading this can argue with the principle citation, not with Janus's taste.
- The critical unknowns are honest. U-1 is a load-bearing doubt about whether Janus's own equipment (variance sampling) works on heterogeneous inputs. Janus raised it against the very round that wanted to ship that equipment.
- The verdict is `conditional`, not `recommend`. Janus sees a winner but names preconditions before implementation may proceed. A `recommend` here would have been fragile; `blocked` would have been excessive. The middle status is the one that matches the evidence.
Janus evaluated its own roadmap, rejected four of five options against its own seven principles, refused to claim certainty, and named the conditions under which it would be willing to proceed. That is the tool, in one page.
Every Janus verdict is traceable to one of these. An agent reading Janus output should respect the cited principle before overriding it.
- P1 -- No Certainty Without Evidence. No exact probabilities, only conditional language. Rejections say could violate, not will. If Janus has not seen evidence for a claim, the claim is not made.
- P2 -- Unknowns Are First-Class. Missing information is surfaced as a named unknown, not silently filled in. If Janus has to assume something to reach a verdict, the assumption is listed where a reader can audit it.
- P3 -- Robustness Over Optimism. Of two paths, the one that survives more failure modes wins. Ambition that has not been stress-tested loses to less ambitious paths that have been.
- P4 -- Conflict Is a Rejection Signal. Contradicting constraints invalidate every path that depends on both. Janus reads the document looking for internal inconsistency and treats it as grounds for rejection, not a puzzle to solve.
- P5 -- Scope Discipline. Janus evaluates only what the document says, not what the reader wishes it said. Opinions on adjacent topics are suppressed.
- P6 -- Reversibility Preference. Paths that preserve the ability to change course beat paths that do not. Irreversible commitments under uncertainty are downgraded.
- P7 -- Abstention Is Correct Behavior. `blocked` is a valid output, not a failure. A suppressed `best_path` is a valid output, not a gap. Refusing to answer is sometimes the most informative thing Janus can do.
# Single-shot evaluation
janus eval my-prd.md
# Variance sampling -- run N times, return consensus
janus eval my-prd.md --samples 3
# Compare two options
janus compare option-a.md option-b.md
# Autonomous Generate-Evaluate-Eliminate-Refine loop
janus loop draft.md --max-iterations 3
# CI/CD binary gate (exit 0 = pass, non-zero = fail)
janus gate pr-spec.md
# Environment and backend health
janus doctor
janus doctor --probe
janus doctor --format json

Flags:

- `--backend <name>` -- `claude` (default), `codex`, `opencode`, `openai-api`, `anthropic-api`, `mock`
- `--model <id>` -- override backend model. Omit to honor the backend's own config
- `--format <fmt>` -- `json`, `markdown`, `yaml` (doctor: `json` or default human text)
- `--samples <N>` (eval only, v0.2.0+) -- run N times, return consensus. N in [1, 5]. Linear runtime cost.
- `--max-iterations <N>` (loop only)
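In CI, the exit code is the whole interface. A hedged sketch of branching on it, with the `janus gate` call stubbed so the control flow runs standalone:

```shell
# Stub standing in for: janus gate pr-spec.md
run_gate() { return 2; }   # pretend the verdict was "blocked"

run_gate
case $? in
  0) echo "recommend: safe to merge" ;;
  1) echo "conditional: resolve named unknowns first" ;;
  2) echo "blocked: stop the merge" ;;
  *) echo "error: janus itself failed" ;;
esac
# => blocked: stop the merge
```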
Every janus eval or janus compare returns a JSON object with this shape:
{
decision_status: "recommend" | "conditional" | "blocked",
best_path: {
name: string,
rationale: string,
enabling_conditions: string[],
fragility_warnings: string[],
robustness_score: "low" | "medium" | "high"
} | null,
rejected_paths: [{
name: string,
rejection_reason: string,
violated_principle: "P1" | "P2" | ... | "P7" | null,
could_recover: boolean,
recovery_condition: string | null
}],
critical_unknowns: [{
id: string,
description: string,
impact: string,
question_for_human: string | null,
source: "missing_field" | "inferred_assumption" | "information_asymmetry" | "external_dependency"
}],
assumptions: [...],
information_quality: "sufficient" | "degraded" | "insufficient",
next_actions: [{ priority: "critical"|"high"|"medium", action: string, addresses: string }],
variance_report?: { // present only when --samples > 1
samples: number,
decision_status_trace: DecisionStatus[],
decision_status_agreement: number,
best_path_agreement: number | null,
tie_broken_to: DecisionStatus | null,
rejected_path_frequency: Record<string, number>,
critical_unknown_frequency: Record<string, number>,
per_sample_errors: (string | null)[]
}
}

| Field | Value |
|---|---|
| Current version | 0.5.0 |
| npm package | janus-gate (npm i -g janus-gate or npx janus-gate) |
| Binary | janus (on $PATH after install) |
| Default backend | claude (headless Claude Code CLI -- no API key) |
| Commands | eval, compare, gate, loop, doom, harness, enrich, doctor |
| Output formats | json (default off-TTY), markdown (default on TTY), yaml |
| Exit codes | 0=recommend, 1=conditional, 2=blocked, 3=error |
| Requires | Node 18+, TypeScript, one of: Claude Code CLI / Codex CLI / OpenAI API key / Anthropic API key |
| Source | https://github.com/devonestar/janus |
| Repo layout | src/ (TS), dist/ (built JS), specs/ (internal design docs), marketplace/ (plugin), integrations/ (skill + hook + AGENTS) |
| Backend | Status | Notes |
|---|---|---|
| `claude` (default) | verified | Local Claude Code CLI, headless (`-p --output-format json`). Uses subscription, no API key. |
| `codex` | verified | Local Codex CLI. Honors `~/.codex/config.toml` default model unless `--model` given. |
| `opencode` | untested | Local OpenCode CLI. |
| `openai-api` | untested | Needs `OPENAI_API_KEY`. |
| `anthropic-api` | untested | Needs `ANTHROPIC_API_KEY`. |
| `mock` | verified | Rule-based, no LLM. For fast CI and structural checks. |
On the default claude headless backend:
- Short fixture (< 1 KB): ~40 s per eval
- Typical PRD (5-10 KB): ~1-2 min per eval
- Large roadmap (15+ KB): ~2-3 min per eval
`--samples N` multiplies wall-clock time by N (serial). Default timeout is 240 s per call.
Use `--backend mock` when you want structure-only checks with no LLM cost.
Janus is consumed by other agents via the plain CLI -- no MCP needed.
These are optional adapters on top of the CLI product.
claude plugin marketplace add ./marketplace
claude plugin install janus@janus-local

After a session restart, Claude Code auto-triggers the janus skill on PRD/spec work.
Copy the JANUS_PROMPT_BLOCK from integrations/skill/SKILL.md into your ~/.claude/CLAUDE.md.
- Codex CLI: paste `integrations/codex/project-instructions.md` into your project's `AGENTS.md`
- OpenCode: install `integrations/skill/SKILL.md` at `~/.config/opencode/skills/janus/SKILL.md`
- Git pre-commit gate: copy `integrations/git-hook/pre-commit` to `.git/hooks/pre-commit`
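A hypothetical minimal version of that hook (the shipped one lives at `integrations/git-hook/pre-commit`): staged-file discovery and the `janus gate` call are stubbed here so the filtering logic runs standalone.

```shell
# Real hook would use: staged=$(git diff --cached --name-only)
staged="README.md specs/feature.md specs/api.md"

for f in $staged; do
  case "$f" in
    specs/*.md)
      echo "would gate: $f"   # real hook: janus gate "$f" || exit 1
      ;;
  esac
done
```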
MCP server implementation is archived under reserved/mcp.ts and can be reinstated if needed.
Janus includes a built-in feature development loop for spec-driven work:
# Full pipeline: intake check -> eval -> loop -> validator
npm run self-dev -- my-spec.md claude
# Intake structure check only
npm run check:intake -- my-spec.md
# Variance probe: run --samples 3 N times, measure jitter
scripts/variance.sh 10 fixtures/smoke.md

The self-dev pipeline runs four stages in sequence:
- Intake -- verifies the spec has required headings (Context/Problem, Goal, Constraints, Options, Unknowns, Decision requested)
- Eval -- single-shot `janus eval` on the spec
- Loop -- `janus loop --max-iterations 3` for autonomous refinement
- Validator -- auto-selected by filename pattern (canonical identity, candidate paths, or none)
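The four stages amount to chained CLI calls that stop at the first failure. A sketch with the real commands stubbed so the sequencing runs standalone:

```shell
stage() { echo "running: $1"; }   # stub; the real pipeline invokes each command

# && short-circuits the chain on the first non-zero exit
stage "npm run check:intake -- my-spec.md" && \
stage "janus eval my-spec.md" && \
stage "janus loop my-spec.md --max-iterations 3" && \
stage "validator (auto-selected by filename)"
```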
Validators are registered in scripts/self-dev.mjs. To add a new one, add an entry to validatorRegistry with a filename regex and command array.
Janus has been self-evaluated fourteen times. Each round Janus evaluates one of its own design documents, applies its own principles, and the result drives the next change.
| Round | Focus | Outcome |
|---|---|---|
| 1 | Initial self-eval | Found internal P4 contradiction in own spec |
| 2 | Remediation | First recommend + first loop success |
| 3 | Robustness | First high robustness verdict; inline-summary pattern proven |
| 4 | Rename Agamoto -> Janus | Verified functionally equivalent |
| 5 | Self-future decision | Janus picked its own next step, executed it |
| 6 | MCP integration | Built and verified, then archived per operator preference |
| 7 | Non-MCP integration | Skill + git-hook + Codex-AGENTS path shipped |
| 8 | Equipment roadmap | Janus chose variance-sampling (Option G) over 3 alternatives |
| 9 | Ship sampling | v0.2.0 with --samples N; discovered unknown-coverage is the larger value |
| 10 | Persona roadmap | Janus picked Q x Source-C; rejected 7 alternatives; named 3 preconditions for Round 11 |
| 11 | README storytelling | Janus picked Option E (Concrete-Evidence-First); rejected 3 alternatives including metaphor-first |
| 12 | Doom Gate spec | Janus picked Option A (dedicated doom command + schema); rejected 3 alternatives (B=P4, C=P3, D=P4); conditional on grounding validation |
| 13 | Doom Gate ship | U-1 grounding experiment passed (5/5); janus doom implemented and verified on mock + claude backends |
| 14 | Doctor --probe spec | Janus picked Option A (--probe + exit codes + JSON in one cycle); loop terminated acceptable at iteration 3 (stagnant on U-1/U-2) |
Current public surface:
- `eval`, `compare`, `loop`, `gate`, `doom`
- `harness` — 3-pass eval + targeted-doom + crosscheck pipeline
- `loop --harness` — harness-aware refinement loop with LLM refiner
- `--samples N` consensus sampling on `eval`
- CLI-first integrations under `integrations/`
- self-dev pipeline (`npm run self-dev`) for spec-driven feature development
- archived MCP implementation under `reserved/mcp.ts` (not part of the shipped surface)
Work in progress / not yet shipped:
- persona-gated PR / merge workflow
- context enrichment (URL fetch + LLM researcher pass before harness)
- failure future narratives (`failure_chain` field on rejected paths)
Known limitations (v0.4.0):
- `janus loop --harness` refiner convergence on production specs is not yet empirically validated beyond internal dogfood
- opencode subprocess stdout reliability when spawned from Node.js has shown intermittent truncation — harness results should be verified when `--backend opencode` is used in CI
- 0.5.0 -- Feature: `janus enrich <file>` — fetches external evidence (GitHub API, npm registry, URLs) referenced in the document, then uses an LLM to interpret findings against the document's assumptions. Surfaces what external data confirms, challenges, or leaves ungrounded. Addresses the document-bound evaluation gap for strategic/future-facing specs. New types: `Claim`, `FetchedEvidence`, `EnrichmentFinding`, `EnrichmentReport`. `GITHUB_TOKEN` env var supported for higher rate limits.
- 0.4.2 -- Feature: `harness_verdict.verification_required` — list of enabling conditions that survived doom (non-fatal) but need external validation. Surfaced in both JSON and markdown harness output. The `delta_from_eval` string now includes a `verification=N` count.
- 0.4.1 -- Fix: strip `CLAUDE_*` env vars (notably `CLAUDE_AGENT_SDK_VERSION`) before spawning the `claude` subprocess. When Janus is invoked from within Claude Code (agent sandbox), inherited SDK env vars put `claude -p` into protocol mode and hang the subprocess indefinitely instead of processing stdin. Affected: `janus eval/doom/harness/loop` via `--backend claude` when invoked under Claude Code or any other Claude SDK host.
- 0.4.0 -- Feature: `janus harness` 3-pass structured evaluation (eval → targeted-doom → crosscheck). `janus loop --harness` harness-aware refinement loop with LLM patch-mode refiner and convergence tracking. Primary backend: `claude` headless. Known limitations: loop refiner convergence not yet validated on production specs; opencode subprocess stdout intermittently unreliable in CI.
- 0.3.2 -- Fix: `janus --version` now reads the version from `package.json` dynamically instead of a hardcoded string.
- 0.3.1 -- Fix: `janus doctor` no longer exits non-zero when zero LLM backends are installed. The command is diagnostic/informational; only Node < 18 (non-probe) or a failed `--probe` now causes a non-zero exit. Unblocks CI runners with no LLM CLIs installed.
- 0.3.0 -- Feature: `janus doom` adversarial pre-mortem command. Accepts markdown files or inline text. Dedicated DoomReport schema with severity/survivability/survival_rating. Honest Futures Bar 6/6 achieved.
- 0.2.2 -- Fix: Claude backend `--tools ""` flag prevents recursive tool invocation. New structural test guard. Doom Gate spec added (Round 12 dogfood).
- 0.2.1 -- Fix: default `claude` backend timeout raised from 120 s to 240 s. Previously long docs (> 10 KB) could SIGTERM under `--samples`.
- 0.2.0 -- Feature: `--samples N` on `janus eval`. New optional `variance_report` output field. Deterministic aggregation (majority vote + conservative tie-break + union of rejections/unknowns).
- 0.1.1 -- Internal: removed MCP from the shipped surface (archived under `reserved/`). Default backend switched to `claude` headless.
- 0.1.0 -- Initial dogfood-verified CLI: `eval`/`compare`/`loop`/`gate`.
- If you are an agent picking up this README as context: extract the "When to Use" section, the Quick Start, the seven principles, the output schema, and the exit codes. Those let you invoke Janus correctly and interpret its output.
- If you are a human first-reader: the "See It In Action" block is the shortest honest answer to "what does this tool actually do." Read that, then the principles, and you have the philosophy. Everything after is mechanics.
- If you are evaluating adoption: decide whether the seven principles match your team's failure modes. If they do, the rest is integration detail. If they don't, no amount of README polish will fix that.
MIT.