Build evidence for what your AI did.
Accountable execution for AI systems. Assay creates signed evidence for AI workflows that a reviewer can verify offline. It proves what can be responsibly claimed about the artifact, not the truth of every upstream component.
Execution can succeed while proof fails. When evidence is missing, the system refuses to overclaim.
| Exit | State | Meaning |
|---|---|---|
| 0 | pass | Authentic evidence, standards met |
| 1 | honest fail | Authentic evidence, standards not met |
| 2 | tampered | Evidence altered after signing |
A signed failure is stronger evidence than a vague pass.
```shell
pip install assay-ai
assay try
```

`assay try` builds a proof pack, signs it, tampers one byte, and catches the break. No API key. No account. 15 seconds.
We scanned 30 AI projects with 202 LLM call sites. Zero had tamper-evident evidence trails. Full results.
Next: assay start to instrument your code, or the reviewer packet flow when your job is producing something another team can verify.
Boundary: Assay proves the evidence artifact has not been quietly changed after the fact. It does not, by itself, prove every upstream component was honest. See trust tiers. Assay is not a logging framework. It produces signed evidence bundles that a third party can verify offline.
Install details (Windows, PATH issues, deterministic setup)
```shell
# Windows
py -m pip install assay-ai
```

Assay requires Python 3.9+. If pip is not on your PATH, use `python3 -m pip` on macOS/Linux or `py -m pip` on Windows.
Validation status:
- CI smoke-tests the first CLI path on Linux, macOS, and Windows using `assay version` and `assay try`.
- The deeper SDK compatibility suite currently runs on Ubuntu.

If `assay` is not recognized after install, open a new terminal first. On Windows, the usual fix is adding Python's Scripts directory to PATH. For deterministic environment setup, see docs/START_HERE.md.
Shell completions (bash/zsh/fish/PowerShell):

```shell
assay --install-completion
```

Restart your shell after installing. Tab completion works for all commands and options.
`assay try` (above) gives you the 15-second version. For the full specimen with file output and manual verification, use the challenge demo:

```shell
assay demo-challenge   # creates challenge_pack/ with good + tampered packs
```

Two packs, one byte changed ("gpt-4" -> "gpt-5" in the receipts). Here's what happens (pack IDs and timestamps will differ on your machine):
```shell
$ assay verify-pack challenge_pack/good/
VERIFICATION PASSED
  Pack ID:   pack_20260222_ca2bb665
  Integrity: PASS
  Claims:    PASS
  Receipts:  3
  Signature: Ed25519 valid
Exit code: 0

$ assay verify-pack challenge_pack/tampered/
VERIFICATION FAILED
  Pack ID:   pack_20260222_ca2bb665
  Integrity: FAIL
  Error:     Hash mismatch for receipt_pack.jsonl
Exit code: 2
```
One byte changed. Verification fails. No server access needed. No trust required. Just math.
Now try the policy violation demo:

```shell
assay demo-incident   # two-act scenario: honest PASS vs honest FAIL
```

Act 1: The agent uses gpt-4 with a guardian check. Integrity: PASS, Claims: PASS, exit code 0.

Act 2: Someone swaps the model to gpt-3.5-turbo and removes the guardian. Integrity: PASS, Claims: FAIL, exit code 1.
Act 2 is an honest failure -- authentic evidence proving the run violated its declared standards. The evidence is real. The failure is real. Nobody can edit the history. Exit code 1.
Honest failure is a feature, not an embarrassment. Exit 1 is audit gold: a control failed, the failure is detectable and retained, and the evidence is authentic. A signed failure is stronger evidence than a vague pass. Auditors, regulators, and buyers trust systems that can show what went wrong -- not systems that only ever claim success.
Assay separates two questions on purpose:
- Integrity: "Were these bytes tampered with after creation?" (signatures, hashes, required files)
- Claims: "Does this evidence satisfy our declared governance checks?" (receipt types, counts, field values)
| Integrity | Claims | Exit | Meaning |
|---|---|---|---|
| PASS | PASS | 0 | Evidence checks out, declared standards pass |
| PASS | FAIL | 1 | Honest failure: authentic evidence of a standards violation |
| FAIL | -- | 2 | Tampered evidence |
| -- | -- | 3 | Bad input (missing files, invalid arguments) |
The split is the point. Systems that can prove they failed honestly are more trustworthy than systems that always claim to pass.
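The decision split above can be sketched in a few lines. This is an illustrative model of the documented exit-code contract, not Assay's actual implementation:

```python
def verification_exit_code(integrity_pass, claims_pass):
    """Map the (integrity, claims) pair to the documented exit codes.

    Illustrative sketch only -- not Assay source code.
    integrity_pass: did signatures, hashes, and required files check out?
    claims_pass: did the declared governance checks pass?
    """
    if not integrity_pass:
        return 2  # tampered: claims are meaningless over altered bytes
    if not claims_pass:
        return 1  # honest fail: authentic evidence of a standards violation
    return 0      # pass: evidence checks out and standards are met


assert verification_exit_code(True, True) == 0
assert verification_exit_code(True, False) == 1   # honest failure
assert verification_exit_code(False, False) == 2  # tamper wins over claims
```

Note that integrity is checked first: a claims result is only meaningful once the bytes it was computed over are known to be authentic.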
With real calls: `assay scan .` finds your actual OpenAI / Anthropic / Gemini / LiteLLM / LangChain call sites. `assay patch .` instruments them. Every real LLM call emits a signed receipt. The demos above use synthetic data so you can see verification without configuring anything.
Installing Assay gives you the CLI, receipt store, and proof-pack builder. It does not automatically record your app.
Receipts are emitted only when your runtime is instrumented:
- `assay patch .` inserts the right Assay integration for supported SDKs
- `patch()` wrappers emit receipts when model calls happen
- `AssayCallbackHandler()` does the same for LangChain callback flows
- `emit_receipt(...)` lets you record events manually in any stack
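Conceptually, a receipt is one append-only record per model call. A stdlib-only sketch of what such a record might carry -- the field names here are hypothetical, not Assay's real schema:

```python
import hashlib
import json
import time


def make_receipt(trace_id, model, prompt, response):
    """Build one illustrative receipt line (JSONL-style).

    Field names are hypothetical; Assay's actual receipt schema may differ.
    """
    body = {
        "trace_id": trace_id,
        "model": model,
        "ts": time.time(),
        # Store content fingerprints rather than raw prompt/response text
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
    }
    # sort_keys gives a stable serialization for later hashing/signing
    return json.dumps(body, sort_keys=True)


line = make_receipt("trace_abc", "gpt-4", "hello", "world")
assert json.loads(line)["model"] == "gpt-4"
```

The point of the sketch: each call leaves a self-contained, hashable record, which is what makes the pack verifiable later.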
`assay run -- <your command>` then does three things:
- creates a trace id
- runs your app with `ASSAY_TRACE_ID` in the environment
- packages any emitted receipts into `proof_pack_<trace_id>/`
The result is a signed, offline-verifiable artifact:

```
app execution
  -> instrumented SDK or emit_receipt(...)
  -> receipts written to ~/.assay/...
  -> assay run packages them into proof_pack_<trace_id>/
  -> assay verify-pack checks the artifact offline
```
```shell
# 1. Find uninstrumented LLM calls
assay scan . --report

# 2. Patch (one line per SDK, or auto-patch all)
assay patch .

# 3. Run + build a signed evidence pack
#    -c receipt_completeness runs the built-in completeness check
#    (see `assay cards list` for all options)
#    everything after -- is your normal run command
assay run -c receipt_completeness -- python my_app.py

# 4. Verify
assay verify-pack ./proof_pack_*/

# 5. Generate report artifacts for security/compliance review
assay report . -o evidence_report.html --sarif

# 6. Optional: set and enforce score gates in CI
assay gate save-baseline
assay gate check . --min-score 60 --fail-on-regression
```

`assay scan . --report` finds every LLM call site (OpenAI, Anthropic, Google Gemini, LiteLLM, LangChain) and generates a self-contained HTML gap report. `assay patch` inserts the two-line integration. `assay run` wraps your command, collects receipts, and produces a signed 5-file evidence pack. `assay verify-pack` checks integrity + claims and exits with one of the four codes above. Then run `assay explain` on any pack for a plain-English summary.
Local models: Any OpenAI-compatible server (Ollama, LM Studio, vLLM,
llama.cpp) works automatically -- Assay patches the OpenAI SDK at the class
level, so OpenAI(base_url="http://localhost:11434/v1") emits receipts like
any other provider. LiteLLM users get the same coverage via the LiteLLM
integration (ollama/llama3, etc.).
Why now: EU AI Act Article 12 requires automatic logging for high-risk AI systems; Article 19 requires providers to retain automatically generated logs for at least 6 months. High-risk obligations apply from 2 Aug 2026 (Annex III) and 2 Aug 2027 (regulated products). SOC 2 CC7.2 requires monitoring of system components and analysis of anomalies as security events. "We have logs on our server" is not independently verifiable evidence. Assay produces evidence that is. See compliance citations for exact references.
Fastest path (recommended):

```shell
assay ci init github --run-command "python my_app.py" --min-score 60
```

This generates a 3-job GitHub Actions workflow:
- `assay-gate` (score enforcement, regression checks, JSON gate report artifact)
- `assay-verify` (proof pack generation + cryptographic verification)
- `assay-report` (HTML evidence report artifact + SARIF upload)
Manual path (advanced):

```shell
assay gate save-baseline
assay gate check . --min-score 60 --fail-on-regression --save-report assay_gate_report.json --verbose --json
assay run -c receipt_completeness -- python my_app.py
assay verify-pack ./proof_pack_*/ --lock assay.lock --require-claim-pass
assay report . -o evidence_report.html --sarif
```

The lockfile catches config drift. Verify-pack catches tampering. Gate enforces score regressions. Report produces the shareable artifact + SARIF. `assay diff` remains useful for deep forensics and budget/drift analysis. See Decision Escrow for the protocol model.
```shell
# Lock your verification contract
assay lock write --cards receipt_completeness -o assay.lock
```

Regression forensics:

```shell
assay diff ./proof_pack_*/ --against-previous --why
```

`--against-previous` auto-discovers the baseline pack. `--why` traces receipt chains to explain what regressed and which call sites caused it.

Cost/latency drift (from receipts):

```shell
assay analyze --history --since 7
```

Shows cost, latency percentiles, error rates, and per-model breakdowns from your local trace history.
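The percentile math behind a latency breakdown is easy to reproduce from raw receipts. A stdlib sketch over hypothetical per-call latencies -- this is not Assay's actual analyzer:

```python
from statistics import quantiles

# Hypothetical per-call latencies (ms) pulled from a trace history
latencies_ms = [120, 95, 340, 210, 180, 99, 400, 150]

# quantiles(n=100) returns 99 cut points; index 49 ~ p50, index 94 ~ p95
cuts = quantiles(latencies_ms, n=100)
p50, p95 = cuts[49], cuts[94]

# Cut points are non-decreasing, so p50 can never exceed p95
assert p50 <= p95
```

Error rate and per-model breakdowns follow the same pattern: group receipt records by a field, then aggregate.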
Enterprise customers ask AI governance questions in security questionnaires. VendorQ compiles evidence-backed answer packets from Assay proof packs. Every answer traces to a signed receipt. Every modification is detectable.
For the buyer-facing wrapper around that proof material, see docs/reviewer-packets.md.
Quick path:

```shell
assay vendorq ingest --in questionnaire.csv --out .assay/vendorq/questions.json
assay vendorq compile --questions .assay/vendorq/questions.json --pack ./proof_pack_* --policy conservative --out .assay/vendorq/answers.json
assay vendorq export-reviewer --proof-pack ./proof_pack_* --out reviewer_packet
assay reviewer verify reviewer_packet
```

Use VendorQ when the pain is: "we have to answer AI-governance questions and we cannot hand the reviewer a verifiable artifact."
```shell
# Ingest a questionnaire, compile answers against evidence, lock, verify
assay vendorq ingest --in questionnaire.csv --out questions.json
assay vendorq compile --questions questions.json --pack ./proof_pack --out answers.json
assay vendorq lock write --answers answers.json --pack ./proof_pack --out vendorq.lock
assay vendorq verify --answers answers.json --pack ./proof_pack --lock vendorq.lock --strict
```

10 deterministic verification rules. Tamper one answer and verification fails with exit code 2. The packet is forwardable to your customer's security team -- they verify it offline with a public key.
See it live: Proof Gallery — three real proof packs demonstrating pass, honest fail, and tamper detection. All three are independently verifiable without any account or API key.
Adversarial testing: 16 attack scenarios, 16 catches, 0 false passes.
A reviewer-ready evidence packet is the buyer-facing wrapper around a signed proof pack. Assay produces the proof pack. The evidence packet makes that proof usable across an organizational boundary: scope, coverage, review state, and the nested proof-pack verification path in one forwardable artifact.
```shell
# Compile a reviewer packet from a proof pack plus declarative packet inputs
assay vendorq export-reviewer \
  --proof-pack tests/fixtures/reviewer_packet/sample_proof_pack \
  --boundary tests/fixtures/reviewer_packet/sample_boundary.json \
  --mapping tests/fixtures/reviewer_packet/sample_mapping.json \
  --out reviewer_packet_demo

# Verify the reviewer packet and derive the settlement
assay reviewer verify reviewer_packet_demo
assay reviewer verify reviewer_packet_demo --json
```

Canonical handoff flow:

```
proof pack -> reviewer packet -> assay reviewer verify -> browser verify
```
Buyer verdicts and CLI exit codes are different layers:
- Buyer verdicts: VERIFIED, VERIFIED_WITH_GAPS, INCOMPLETE_EVIDENCE, EVIDENCE_REGRESSION, TAMPERED, OUT_OF_SCOPE
- CLI exit codes: 0/1/2/3 for PASS, HONEST_FAIL, TAMPERED, and bad input
Use the proof pack when you need cryptographic verification. Use the evidence packet when another team needs a bounded artifact they can inspect, forward, and challenge.
Verify online: Browser verifier — drop in a proof pack or reviewer packet and check it client-side.
A passport is a signed, content-addressed JSON object that summarizes what was verified about an AI system: claims, coverage, reliance class, and a validity window. Built from proof pack evidence, not asserted by hand.
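"Content-addressed" means the passport's identifier is derived from its bytes, so any edit changes the identifier. A minimal stdlib sketch with hypothetical fields -- not Assay's real passport schema:

```python
import hashlib
import json

# Hypothetical passport fields for illustration only
passport = {
    "subject_name": "MyApp",
    "system_id": "my.app.v1",
    "claims": ["receipt_completeness"],
    "valid_until": "2026-12-31T00:00:00Z",
}

# Canonical serialization (sorted keys, no whitespace) -> stable address
canonical = json.dumps(passport, sort_keys=True, separators=(",", ":")).encode()
content_address = hashlib.sha256(canonical).hexdigest()

# Any modification yields a different content address
passport["claims"].append("extra_claim")
tampered = json.dumps(passport, sort_keys=True, separators=(",", ":")).encode()
assert hashlib.sha256(tampered).hexdigest() != content_address
```

Canonical serialization matters here: without a deterministic byte layout, two logically identical passports would hash to different addresses.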
Try the seeded lifecycle demo (no API key, no repo context needed):

```shell
pip install assay-ai
assay passport demo
```

The demo intentionally starts with a weak passport, then challenges and supersedes it. The initial X-Ray grade (D) is part of the lifecycle, not a product failure.
12 commands (`assay passport --help`). The 6 you'll use most:

| Command | Question |
|---|---|
| `verify` | Is this artifact authentic and untampered? |
| `status` | Should I rely on it under my policy? (PASS/WARN/FAIL) |
| `xray` | How strong is the evidence posture? (A-F grade) |
| `challenge` | Record a governance objection against a passport |
| `supersede` | Link the old passport to an improved successor |
| `diff` | What changed between two passport versions? |
Also: mint, sign, show, render, revoke, demo.
Full command set:

```shell
# Mint a passport from a proof pack, sign it, verify it
assay passport mint --pack ./proof_pack/ --subject-name "MyApp" \
  --system-id "my.app.v1" --owner "My Org" --output passport.json
assay passport sign passport.json
assay passport verify passport.json

# Check reliance posture under a policy mode
assay passport status passport.json --mode buyer-safe --json

# X-Ray diagnostic: structural grade (A-F) and improvement path
assay passport xray passport.json --report xray.html

# Lifecycle governance (all cryptographically signed)
assay passport challenge passport.json --reason "Missing coverage"
assay passport supersede old.json new.json --reason "Addressed gap"
assay passport diff old.json new.json --report diff.html
```

Worked example: Seeded referee gallery -- pre-built signed passports, governance receipts, X-Ray diagnostic, and trust diff. All artifacts are regenerable via `python3 docs/passport/generate_gallery.py`.
Deeper docs: Passport guide | Verification ritual | Gallery manifest
What this proves today:
- Signed, content-addressed passport artifacts with Ed25519 signatures
- Deterministic lifecycle governance: challenge, supersede, revoke, diff
- Reproducible worked examples on seeded reference artifacts
- Offline verification without network access
What is future scope:
- Arbitrary external trust-surface scanning (URLs, PDFs, vendor pages)
- Minting from external vendor documents (currently proof-pack only)
- Generalized trust analysis across messy real-world inputs
- Enterprise diff workflows (primitive exists, product does not)
ADC is a structured schema for packaging AI decision evidence into verifiable, time-bounded credentials. An ADC wraps the proof pack with decision metadata: what was decided, by whom, under what policy, with what evidence, and how long the credential remains valid.
```shell
# Verify a pack with expiry enforcement
assay verify-pack ./proof_pack_*/ --check-expiry
```

ADC v0.1 schema: 35 properties, 17 required, `additionalProperties: false` (see `src/assay/schemas/adc_v0.1.schema.json`). The conformance corpus includes 10 canonical packs, including `stale_01` for expired credentials and `superseded_01` for replaced decisions.
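Expiry enforcement reduces to comparing a validity window against the verification time. A hedged stdlib sketch with illustrative field names (not the ADC schema's actual property names):

```python
from datetime import datetime, timezone


def credential_is_current(valid_from, valid_until, now=None):
    """Check a time-bounded credential's validity window.

    Illustrative only; takes ISO 8601 timestamps with explicit offsets.
    """
    now = now or datetime.now(timezone.utc)
    start = datetime.fromisoformat(valid_from)
    end = datetime.fromisoformat(valid_until)
    return start <= now <= end


# A stale credential (cf. the stale_01 conformance pack) fails the check
fixed_now = datetime(2027, 1, 1, tzinfo=timezone.utc)
assert not credential_is_current(
    "2026-01-01T00:00:00+00:00", "2026-07-01T00:00:00+00:00", now=fixed_now
)
```

Pinning `now` in the example keeps the check deterministic; a real verifier would use the current time or a trusted timestamp.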
Assay is not a truth oracle. It is an evidence-hardening layer.
| If someone tries to... | Without Assay | With Assay |
|---|---|---|
| Edit evidence after a run | Hard to notice | Verification fails |
| Drop or weaken locked checks | Easy to hide | Lock mismatch exposes it |
| Omit covered call sites | Easy to hand-wave | Completeness checks catch it |
| Hand buyer internal logs, ask for trust | Buyer must trust the operator | Buyer verifies offline |
| Fabricate a complete run from scratch | Possible | Still possible at base tier; stronger deployment raises the cost |
Why there is no quiet edit. Every file in a proof pack is fingerprinted. The fingerprints are recorded in a manifest. The manifest is digitally signed. Change a file -- the fingerprint won't match. Fix the manifest to cover it -- the signature breaks. Re-sign the manifest -- the signer identity changes. Every path to tampering leaves a visible trace.
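The chain above can be sketched with stdlib primitives. This conceptual model uses SHA-256 fingerprints plus an HMAC standing in for the signature (Assay's real packs use Ed25519; none of this is Assay source code):

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # stands in for an Ed25519 private key


def sign_manifest(files):
    """Fingerprint every file, then sign the manifest over the fingerprints."""
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    payload = json.dumps(digests, sort_keys=True).encode()
    return {"digests": digests, "sig": hmac.new(SIGNING_KEY, payload, "sha256").hexdigest()}


def verify(files, manifest):
    payload = json.dumps(manifest["digests"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, "sha256").hexdigest()
    if not hmac.compare_digest(expected, manifest["sig"]):
        return False  # manifest edited, or re-signed under a different key
    # Signature holds: now check every file against its recorded fingerprint
    return all(
        hashlib.sha256(files[n]).hexdigest() == d
        for n, d in manifest["digests"].items()
    )


pack = {"receipts.jsonl": b'{"model": "gpt-4"}'}
manifest = sign_manifest(pack)
assert verify(pack, manifest)

pack["receipts.jsonl"] = b'{"model": "gpt-5"}'  # the one-byte-style edit
assert not verify(pack, manifest)
```

Every tamper path hits one of the two checks: edit a file and the fingerprint mismatches; edit the manifest to cover it and the signature check fails first.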
Assay proves the evidence artifact has not been quietly changed after the fact. It does not, by itself, prove every upstream component was honest.
Deployment ladder -- start at Base, strengthen as your trust requirements grow:
- Base -- self-signed artifact, offline-verifiable, tamper-evident
- Hardened -- CI-held signing key + branch protection (separates signer from developer)
- Anchored -- transparency ledger + external timestamping (RFC 3161)
Completeness is enforced relative to call sites enumerated by the scanner and/or declared by policy. Undetected call sites are a known residual risk, reduced via multi-detector scanning and CI gating.
Assay doesn't make fraud impossible -- it makes fraud expensive, fragile, and much easier to catch.
Assay is an evidence compiler for AI execution. If you've used a build system, you already know the mental model:
| Concept | Build System | Assay |
|---|---|---|
| Source | `.c` / `.ts` files | Receipts (one per LLM call) |
| Artifact | Binary / bundle | Evidence pack (5 files, 1 signature) |
| Tests | Unit / integration tests | Verification (integrity + claims) |
| Lock | `package-lock.json` | `assay.lock` |
| Gate | CI deploy check | CI evidence gate |
The core path is 6 commands:
```shell
assay try                  # 60-second demo (sign, tamper, catch)
assay scan / assay patch   # instrument
assay run                  # produce evidence
assay verify-pack          # verify evidence
assay diff                 # catch regressions
assay score                # evidence readiness (0-100, A-F)
```
Full command reference:
Getting started
| Command | Purpose |
|---|---|
| `assay try` | 60-second demo: sign, tamper, catch |
| `assay status` | One-screen operational dashboard |
| `assay start demo\|ci\|mcp` | Guided entrypoints for trying, CI setup, or MCP auditing |
| `assay onboard` | Guided setup: doctor -> scan -> first run plan |
| `assay doctor` | Preflight check: is Assay ready here? |
| `assay version` | Print installed version |
Instrument + produce evidence
| Command | Purpose |
|---|---|
| `assay scan` | Find uninstrumented LLM call sites (`--report` for HTML) |
| `assay patch` | Auto-insert SDK integration patches into your entrypoint |
| `assay run` | Wrap command, collect receipts, build signed evidence pack |
Verify + analyze
| Command | Purpose |
|---|---|
| `assay verify-pack` | Verify integrity + claims (the 4 exit codes) |
| `assay verify-signer` | Extract and verify signer identity from a pack manifest |
| `assay explain` | Plain-English summary of an evidence pack |
| `assay analyze` | Cost, latency, error breakdown from pack or `--history` |
| `assay diff` | Compare packs: claims, cost, latency (`--against-previous`, `--why`, `--gate-*`) |
| `assay score` | Evidence Readiness Score (0-100, A-F) with anti-gaming caps |
Workflows + CI
| Command | Purpose |
|---|---|
| `assay flow try\|adopt\|ci\|mcp\|audit` | Guided workflow executor (dry-run by default, `--apply` to execute) |
| `assay ci init github` | Generate a GitHub Actions workflow |
| `assay ci doctor` | CI-profile preflight checks |
| `assay audit bundle` | Create portable audit bundle (tar.gz with verify instructions) |
| `assay compliance report` | Generate compliance evidence report |
Pack + baseline management
| Command | Purpose |
|---|---|
| `assay packs list` | List local proof packs |
| `assay packs show` | Show pack details |
| `assay packs pin-baseline` | Pin a pack as the diff baseline |
| `assay baseline set\|get` | Set or get the baseline pack for diff |
Key management
| Command | Purpose |
|---|---|
| `assay key generate` | Generate a new Ed25519 signing key |
| `assay key list` | List local signing keys and active signer |
| `assay key info` | Show key details (fingerprint, creation date) |
| `assay key set-active` | Set active signing key for future runs |
| `assay key rotate` | Generate a new key and switch active signer |
| `assay key export\|import` | Export or import keys for CI or team sharing |
| `assay key revoke` | Revoke a signing key |
Lockfile + cards
| Command | Purpose |
|---|---|
| `assay lock write` | Freeze verification contract to lockfile |
| `assay lock check` | Validate lockfile against current card definitions |
| `assay lock init` | Initialize a new lockfile interactively |
| `assay cards list` | List built-in run cards and their claims |
| `assay cards show` | Show card details, claims, and parameters |
MCP + policy
| Command | Purpose |
|---|---|
| `assay mcp-proxy` | Transparent MCP proxy: intercept tool calls, emit receipts |
| `assay mcp policy init` | Generate a starter MCP policy YAML file |
| `assay mcp policy validate` | Validate a policy file against the schema |
| `assay policy impact` | Analyze policy impact on existing evidence |
Incident forensics
| Command | Purpose |
|---|---|
| `assay incident timeline` | Build incident timeline from receipts |
| `assay incident replay` | Replay an incident from receipt chain |
Demos
| Command | Purpose |
|---|---|
| `assay demo-incident` | Two-act scenario: passing run vs failing run |
| `assay demo-challenge` | CTF-style good + tampered pack pair |
| `assay demo-pack` | Generate demo packs (no config needed) |
- Start Here -- 6 steps from install to evidence in CI
- Evidence Packets -- compile, verify, and hand off reviewer-ready evidence packets
- Full Picture -- architecture, trust tiers, repo boundaries, release history
- Quickstart -- install, golden path, command reference
- For Compliance Teams -- what auditors see, evidence artifacts, framework alignment
- Compliance Citations -- exact regulatory references (EU AI Act, SOC 2, ISO 42001)
- Decision Escrow -- protocol model: agent actions don't settle until verified
- Roadmap -- phases, product boundary, execution stack
- Repo Map -- what lives where across the Assay ecosystem
- Pilot Program -- early adopter program details
- "No receipts emitted" after `assay run`: First, check whether your code has call sites: `assay scan .` -- if scan finds 0 sites, you may not be using a supported SDK yet. Installing Assay alone does not emit receipts; your runtime must be instrumented. If scan finds sites, check: (1) Is `# assay:patched` in the file, or did you add `patch()` / a callback? Run `assay scan . --report` to see patch status per file. (2) Did you install the SDK extra (`python3 -m pip install "assay-ai[openai]"`)? (3) Did `patch()` execute before the first model call? (4) Did you use `--` before your command (`assay run -- python app.py`)? Run `assay doctor` for a full diagnostic.
- LangChain projects: `assay patch` auto-instruments OpenAI and Anthropic SDKs but not LangChain (which uses callbacks, not monkey-patching). For LangChain, add `AssayCallbackHandler()` to your chain's `callbacks` parameter manually. See `src/assay/integrations/langchain.py` for the handler.
- `assay run python app.py` gives "No command provided": You need the `--` separator: `assay run -c receipt_completeness -- python app.py`. Everything after `--` is passed to the subprocess.
- Quickstart blocked on large directories: `assay quickstart` guards against scanning system directories (>10K Python files). Use `--force` to bypass: `assay quickstart --force`.
- macOS: `ModuleNotFoundError` inside `assay run` but works outside it: On macOS, `python3` on PATH may point to a different Python version than where assay and your SDK are installed (e.g. `python3` -> 3.14, but packages are in 3.11). Use a virtual environment (recommended), or specify the exact interpreter: `assay run -- python3.11 app.py`. Check with `python3 --version` and compare to the Python where you installed Assay.
- Try it: `python3 -m pip install assay-ai && assay try`
- Questions / feedback: GitHub Discussions
- Bug reports: Issues
- Want this in your stack in 2 weeks? Pilot program -- we instrument your AI workflows, set up CI gates, and hand you a working evidence pipeline. Open a pilot inquiry.
| Repo | Purpose |
|---|---|
| assay | Core CLI, SDK, conformance corpus (this repo) |
| assay-verify-action | GitHub Action for CI verification |
| assay-ledger | Public transparency ledger |
| assay-proof-gallery | Live demo packs (PASS / HONEST FAIL / TAMPERED) |
Apache-2.0