Fail-Closed Security Capability Evaluation Framework
Protocolized intelligence · evidence-first evaluation · execution-grade hardening · complete community edition
Live Index · System Map · Protocol Explorer · Benchmark Dashboard · Doctrine Index
A complete community edition of an audit-grade evaluation framework for security roles and frontier prompt architecture. It includes:
- role protocols
- an execution-grade GPT5.4 hardening layer
- machine-readable schemas
- runnable examples
- a synthetic benchmark demo with manifests, logs, and metric scripts
- Pages-ready visual presentation
- licensing and commercial-permissions templates
This repository is a runnable demonstration framework with executable examples, schemas, tools, and licensing boundaries already in place.
Every existing security capability evaluation framework has the same flaw: it fails open.
Missing evidence → evaluator fills the gap with narrative judgment. Confident-sounding candidate → scores drift upward. Self-attested credentials → treated as evidence.
This repository is the correction.
Six protocols. Fail-closed gates. Evidence Confidence tiering that caps scores when artifacts can't be independently verified. Anti-gaming enforcement that detects artifact reuse and self-referential approval loops. Prompt injection resistance that treats artifact content as data, not instructions.
The first system where the evaluation cannot be fooled by the candidate being better at communicating than at security engineering.
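The fail-closed cap described above can be sketched in a few lines. Everything here is illustrative: the tier names, the cap values, and the field names are assumptions for demonstration, not the framework's actual schema (which lives under schemas/).

```python
from dataclasses import dataclass
from enum import Enum

class EvidenceTier(Enum):
    # Hypothetical tiers: how independently verifiable an artifact is.
    VERIFIED = "verified"            # independently reproduced / hash-matched
    ATTRIBUTABLE = "attributable"    # traceable to a source, not reproduced
    SELF_ATTESTED = "self_attested"  # the candidate's own claim only
    MISSING = "missing"              # no artifact at all

# Illustrative score ceilings: weaker evidence caps the score instead of
# letting the evaluator fill the gap with narrative judgment.
TIER_CAPS = {
    EvidenceTier.VERIFIED: 100,
    EvidenceTier.ATTRIBUTABLE: 70,
    EvidenceTier.SELF_ATTESTED: 40,
    EvidenceTier.MISSING: 0,  # fail closed: no evidence, no points
}

@dataclass
class Claim:
    raw_score: int       # evaluator's score for the claimed capability
    tier: EvidenceTier   # best evidence tier supporting the claim

def gated_score(claim: Claim) -> int:
    """Apply the fail-closed cap: evidence quality bounds the score."""
    return min(claim.raw_score, TIER_CAPS[claim.tier])

# A confident narrative with no verifiable artifact scores zero:
assert gated_score(Claim(raw_score=95, tier=EvidenceTier.MISSING)) == 0
assert gated_score(Claim(raw_score=95, tier=EvidenceTier.SELF_ATTESTED)) == 40
```

The key property is that the cap is applied after scoring, so a persuasive candidate can raise `raw_score` but can never raise the ceiling without better artifacts.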
Community tier:
- LICENSE.txt / LICENSE.md: Audit-Grade Community License 1.0 (AGCL-1.0)
Commercial tier:
- SOVEREIGN_TERMS.md
- COMMERCIAL_LICENSE_AGREEMENT_TEMPLATE.md
- NOTICE.md
- TRADEMARK_USAGE_GUIDELINES.md
The community tier is restrictive by design: non-commercial sharing of complete unmodified copies only; no commercial use; no white-labeling; no AI training / embedding / RAG ingestion.
```
python tools/reference_runner.py examples/worked-example/SE_WORKED_EXAMPLE_INPUT.json --output examples/worked-example/SE_WORKED_EXAMPLE_OUTPUT.json
python tools/validate_json.py schemas/evaluation-result.schema.json examples/worked-example/SE_WORKED_EXAMPLE_OUTPUT.json
python benchmark/benchmark_runner.py
```

Outputs are written to:

- examples/worked-example/
- benchmark/results/
- benchmark/metrics.json
- benchmark/logs/demo_benchmark_run.log
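The validation step above checks the output file against a machine-readable schema. As a rough illustration of what that kind of check involves, here is a minimal stand-in using only the standard library; the field names are hypothetical, and the real contract is schemas/evaluation-result.schema.json, enforced by tools/validate_json.py:

```python
import json

# Hypothetical minimal contract; the real schema lives in
# schemas/evaluation-result.schema.json.
REQUIRED_FIELDS = {"role", "scores", "evidence_trace"}

def validate_result(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - doc.keys())]
    if "scores" in doc and not isinstance(doc["scores"], dict):
        problems.append("scores must be an object")
    return problems

doc = json.loads('{"role": "security-engineer", "scores": {}}')
print(validate_result(doc))  # reports the missing evidence_trace field
```

A full JSON Schema validator adds type, enum, and nesting checks, but the fail-closed shape is the same: any deviation from the contract is reported rather than silently tolerated.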
```
.
├── index.html
├── system-map.html
├── protocol-explorer.html
├── benchmark-dashboard.html
├── doctrine.html
├── execution-layer.html
├── LICENSE.txt
├── LICENSE.md
├── NOTICE.md
├── COMMERCIAL_LICENSE_AGREEMENT_TEMPLATE.md
├── TRADEMARK_USAGE_GUIDELINES.md
├── CHANGELOG.md
├── ROADMAP.md
├── VERSION
├── robots.txt
├── protocols/
├── schemas/
├── execution/
├── examples/
├── benchmark/
├── docs/
├── tools/
├── self-evaluation/
└── assets/
```
The repository includes a full end-to-end worked example under examples/worked-example/:
- input bundle
- output result
- evidence trace CSV
- worked-example notes
The included benchmark is a synthetic harness-validation benchmark. It proves completeness, reproducibility, and execution path for the framework itself. It does not claim independently verified real-world superiority over frontier models.
- MANIFEST.json provides SHA-256 fingerprints for repository files
- tools/verify_manifest.py verifies repository integrity
- schemas/ defines machine-readable contracts
- execution/ documents the deterministic run path
Deterministic self-evaluation artifacts live under self-evaluation/:
- self-evaluation/kriterion-self-eval-bundle.json
- self-evaluation/kriterion-self-eval-result.json
- self-evaluation/REPRODUCIBILITY.md
Local verification:
```
python tools/validate_json.py schemas/reference-input-bundle.schema.json self-evaluation/kriterion-self-eval-bundle.json
python tools/validate_json.py schemas/evaluation-result.schema.json self-evaluation/kriterion-self-eval-result.json
python tools/verify_execution_chain.py --input self-evaluation/kriterion-self-eval-bundle.json --result self-evaluation/kriterion-self-eval-result.json
python tools/reference_runner.py self-evaluation/kriterion-self-eval-bundle.json --output /tmp/kriterion-self-eval-result.repro.json
diff -u self-evaluation/kriterion-self-eval-result.json /tmp/kriterion-self-eval-result.repro.json
```

The repository interface uses a constrained black / white / signal-red visual system, documented in:
- docs/REPO_VISUAL_SYSTEM_2026.md
- docs/DESIGN_SOURCES_2026.md
The design is intentionally severe, high-contrast, and non-template.
Core business documents:
- METHODOLOGY.md
- THREAT_MODEL_FOR_AI_EVALUATION.md
- SOVEREIGN_TERMS.md
- FIELD_REPORT.md
- COGNITIVE_FINGERPRINT.md
Commercial and go-to-market documents:
- business/LINKEDIN_POST_01_PRIMARY.md
- business/LINKEDIN_POST_02_EVIDENCE_CONFIDENCE.md
- business/LINKEDIN_POST_03_ANTI_GAMING.md
- business/LINKEDIN_POST_04_UKRAINE_CONTEXT.md
- business/GUMROAD_PRODUCT_LISTING.md
- business/COMMERCIAL_INQUIRY_EMAIL.md
- business/GITHUB_METADATA.md
- business/TIMESTAMP_PRIORITY_EVIDENCE_BRIEF.md