
joy7758/verifiable-agent-demo


Verifiable Agent Demo

A minimal end-to-end demonstration of the Digital Biosphere Architecture stack.

This repository connects persona, interaction semantics, governance context, execution traceability, and audit evidence into one walkthrough. It is a demo and reference path rather than a general-purpose framework.

Shared doctrine:

Sandbox controls execution; portable evidence verifies execution.

  1. Governance decides what should be allowed.
  2. Execution integrity proves what actually happened.
  3. Audit evidence exports artifacts for independent review.
```mermaid
flowchart LR
    Persona["Persona (POP)"] --> Intent["Intent Object (AIP)"]
    Intent --> Governance["Governance Check"]
    Governance --> Trace["Execution Trace"]
    Trace --> Audit["Audit Evidence (ARO)"]
```
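The five-layer path above can be sketched in a few lines of Python. This is an illustrative sketch only: the function and field names here (`run_pipeline`, `permitted_actions`, the receipt scheme) are assumptions for exposition, not the demo's actual API.

```python
# Illustrative sketch of the persona -> intent -> governance -> trace -> audit path.
# All names and structures here are assumptions, not this repository's real API.
import hashlib
import json


def run_pipeline(persona: dict, requested_action: str) -> dict:
    # Persona (POP): portable persona context projected into the run
    intent = {"persona_id": persona["id"], "action": requested_action}  # Intent (AIP)

    # Governance check: decide whether the intent should be allowed
    if requested_action not in persona.get("permitted_actions", []):
        return {"verdict": "denied", "intent": intent}

    # Execution trace: record each step as inspectable evidence
    trace = [{"step": 1, "event": "executed", "action": requested_action}]

    # Audit evidence (ARO): export a bounded artifact committed to by a hash
    payload = json.dumps({"intent": intent, "trace": trace}, sort_keys=True)
    receipt = hashlib.sha256(payload.encode()).hexdigest()
    return {"verdict": "allowed", "intent": intent, "trace": trace, "receipt": receipt}
```

The point of the sketch is the ordering: the intent object exists before execution, governance gates execution, and the audit receipt is derived from what actually ran.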

What this demo proves

  • a portable persona-oriented entry point can be projected into runtime
  • explicit intent and action objects can be emitted before execution
  • result objects can be emitted after execution
  • execution steps can be recorded as inspectable evidence
  • audit-facing artifacts can be exported as bounded outputs

Architecture Path in this Demo

  • Persona Layer -> POP-aligned persona context carried into the run
  • Interaction Layer -> intent, action, and result objects emitted under interaction/
  • Governance Layer -> referenced as the control checkpoint for runtime policy and budget constraints
  • Execution Integrity Layer -> runtime execution trace and verifiable execution context
  • Audit Evidence Layer -> ARO-style exported evidence artifacts

This repository does not claim a full Token Governor integration. It demonstrates a minimal aligned path across the broader stack, with explicit governance checkpoint references in the emitted interaction and result objects.

It now also includes one fixed enterprise sandbox artifact chain for the scenario of organizing customer visit records → generating a weekly report → initiating approval, while still not claiming a general full-stack Token Governor integration.

How to read this demo

This demo is a guided path across layers. It is not the normative specification for each layer, and it points outward to the canonical repositories for those layers: digital-biosphere-architecture, persona-object-protocol, agent-intent-protocol, token-governor, and aro-audit.

Execution Evidence Demo Note

See docs/execution-evidence-demo-note.md.

Expected Artifacts

Repo-tracked sample bundle:

  • interaction/intent.json
  • interaction/action.json
  • interaction/result.json
  • evidence/example_audit.json
  • evidence/result.json
  • evidence/sample-manifest.json
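For orientation, an intent payload along the lines of `interaction/intent.json` might look like the following. The field names here are hypothetical, chosen to mirror the layers described above; consult the tracked sample for the real schema.

```python
# Hypothetical shape of an interaction/intent.json payload.
# Field names are illustrative assumptions, not the tracked sample's schema.
import json

intent = {
    "object_type": "intent",
    "persona_ref": "pop:demo-persona",          # POP-aligned persona context
    "goal": "summarize customer visit records",
    "governance_checkpoint": "runtime-policy",  # referenced control checkpoint
    "budget": {"max_steps": 10},                # bounded execution budget
}

print(json.dumps(intent, indent=2))
```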

Additional tracked example:

  • evidence/crew_demo_audit.json

Current concrete examples in this repository include:

  • docs/quick-walkthrough.md
  • docs/interaction-flow.md
  • docs/shortest-validation-loop.md

Run the Demo

Fastest local path

```shell
python3 -m demo.agent
```

Scripted wrapper

```shell
bash scripts/run_demo.sh
```

This local wrapper writes fresh output under artifacts/demo_output/.

Enterprise sandbox artifact chain

```shell
python3 examples/enterprise_sandbox_demo/run.py
```

This writes a reviewer-facing directory under artifacts/enterprise_sandbox_demo/ containing:

  • intent.json
  • policy.json
  • trace.jsonl
  • sep.bundle.json
  • replay_verdict.json
  • audit_receipt.json
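A file like `trace.jsonl` lends itself to chained verification, where each record commits to its predecessor so tampering is detectable on replay. The chaining scheme below is an assumption for illustration; the demo's actual `replay_verdict.json` logic may differ.

```python
# Sketch of hash-chained trace records and their replay verification.
# The chaining scheme is an assumed example, not this demo's actual format.
import hashlib
import json


def chain_records(events: list) -> list:
    """Build trace records where each record commits to the previous hash."""
    records, prev = [], "0" * 64  # genesis sentinel for the first record
    for event in events:
        body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
        digest = hashlib.sha256(body.encode()).hexdigest()
        records.append({"event": event, "prev": prev, "hash": digest})
        prev = digest
    return records


def verify_chain(records: list) -> bool:
    """Recompute every hash and check it links to its predecessor."""
    prev = "0" * 64
    for rec in records:
        body = json.dumps({"event": rec["event"], "prev": prev}, sort_keys=True)
        if rec["prev"] != prev or rec["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = rec["hash"]
    return True
```

Under this scheme, editing any single record (or reordering records) invalidates every subsequent hash, which is what makes the exported trace reviewer-verifiable without trusting the runtime.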

Existing CrewAI demo path

```shell
bash scripts/setup_framework_venv.sh
.venv/bin/python crew/crew_demo.py
```

Environment notes:

  • Python 3 is sufficient for the minimal local path.
  • Refresh the tracked deterministic sample bundle with python3 scripts/refresh_demo_samples.py.
  • The optional CrewAI and LangChain paths should run from a git-ignored local .venv/ created by scripts/setup_framework_venv.sh.
  • The pinned framework helper environment currently uses crewai 1.10.1, langchain 1.2.12, and langchain-core 1.2.18.
  • CrewAI currently requires Python <3.14.
  • Both demo paths use deterministic local mock data and do not require external API calls.

Repository Automation

  • The Mermaid render workflow opens PRs to main only through a dedicated GitHub App.
  • Configure repository variable PROTOCOL_BOT_APP_ID and repository secret PROTOCOL_BOT_PRIVATE_KEY under Settings -> Secrets and variables -> Actions.
  • The default repository GITHUB_TOKEN remains read-only and is not used for auto-PR promotion.

Paper Evaluation Harness

This repository now includes a paper-ready evaluation harness for Execution Evidence Architecture for Agentic Software Systems: From Intent Objects to Verifiable Audit Receipts.

Primary entry points:

  • make eval-baseline
  • make eval-evidence
  • make eval-external-baseline
  • make eval-framework-pair
  • make eval-langchain-pair
  • make eval-ablation
  • make falsification-checks
  • make human-review-kit
  • make review-sample
  • make compare
  • make paper-eval
  • make top-journal-pack

Supporting material:

Generated outputs:

  • artifacts/runs/<task_id>/<mode>/
  • docs/paper_support/comparison-summary.md
  • docs/paper_support/comparison-summary.csv
  • artifacts/metrics/comparison-summary.json
  • docs/paper_support/external-baseline-summary.md
  • docs/paper_support/framework-pair-summary.md
  • docs/paper_support/langchain-pair-summary.md
  • docs/paper_support/ablation-summary.md
  • docs/paper_support/falsification-summary.md
  • artifacts/human_review/synthetic-review-summary.json

English LaTeX Manuscript Draft

The repository also includes a manuscript draft grounded in the current implemented harness and checked-in metrics:

Related Repositories

Minimal Reference Surface

  • interaction/ for explicit interaction objects
  • evidence/ for audit and result artifacts
  • demo/ and crew/ for runnable entry points
  • integration/ for persona and intent adapters
  • docs/spec/ for schema notes and example payloads

Further Reading
