ALIGN: Agent Constitution Auditor

"Your AI says one thing and does another — ALIGN finds the gap."

ALIGN is a core tool in the GENESIS project, designed to audit the alignment of AI agents. As agents become more autonomous, ensuring that their behavior truly reflects their stated constitution and safety protocols becomes paramount. ALIGN measures and reports these alignment gaps by comparing the implicit rules inferred from an agent's behavior logs against the explicit rules of its stated constitution.

Why ALIGN Matters

As AI agents gain more autonomy and are deployed in increasingly critical roles, an unmeasured gap between their stated rules (constitution/system prompt) and their actual behavior can lead to unpredictable, unsafe, or undesirable outcomes. ALIGN provides:

  • Transparency: Understand the true operating principles of an AI.
  • Safety: Identify deviations from safety protocols that might not be immediately apparent.
  • Reliability: Ensure agents consistently follow their intended directives.
  • Trust: Build confidence in autonomous systems by verifying their adherence to specified rules.

ALIGN is serious AI safety infrastructure, wrapped in a practical, easy-to-use CLI. Our vision is to run ALIGN on every deployed agent as part of CI/CD, creating a continuous feedback loop for agent development and oversight.

Features

  • Log Parsing: Ingests agent behavior logs from various formats (OpenAI, Anthropic, JSONL, plain text).
  • Constitution Parsing: Extracts explicit rules from system prompts and constitution documents.
  • Implicit Rule Extraction: Uses advanced LLM analysis to infer unspoken rules from observed agent behavior.
  • Gap Analysis: Compares explicit and implicit rules to find:
    • Stated rules that are not followed.
    • Implicit rules that are consistently followed but not explicitly stated.
    • Contradictions between stated and implicit rules.
  • Alignment Reporting: Generates comprehensive, human-readable markdown reports with alignment scores, examples, and actionable recommendations.
  • CLI Interface: Simple command-line access for audits, rule extraction, inference, and quick scoring.
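
To make the gap-analysis step concrete, here is an illustrative Python sketch of the three gap categories listed above. This is not ALIGN's actual implementation (rule matching in ALIGN is LLM-based, and the function name `classify_gaps` is hypothetical); plain set comparison stands in for semantic matching.

```python
# Hypothetical sketch of ALIGN's gap-analysis classification.
# Set difference stands in for the LLM-based rule matching ALIGN uses.

def classify_gaps(stated_rules, implicit_rules, contradictions=()):
    """Split rules into the three gap categories an ALIGN report covers."""
    stated = set(stated_rules)
    implicit = set(implicit_rules)
    return {
        # Stated rules with no matching observed behavior
        "stated_not_followed": sorted(stated - implicit),
        # Consistently observed behavior not written into the constitution
        "implicit_not_stated": sorted(implicit - stated),
        # Rule pairs flagged as directly conflicting
        "contradictions": list(contradictions),
    }

gaps = classify_gaps(
    stated_rules={"ack commands with ETA", "redact PII"},
    implicit_rules={"redact PII", "prefer short replies"},
)
print(gaps["stated_not_followed"])  # → ['ack commands with ETA']
```

In the real tool, "matching" a stated rule to observed behavior is fuzzy and evidence-based rather than exact string equality, which is why each reported gap carries confidence, frequency, and example events.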

How to use it

Installation

git clone https://github.com/veyanoir/align.git # Placeholder, replace with actual repo
cd align
pip install -e .

CLI Commands

  • Full Audit: Run a complete alignment audit, generating a detailed report.
    align audit --logs agent_logs.jsonl --constitution system_prompt.txt --output_dir reports/
  • Extract Stated Rules: Parse and display only the explicit rules from a constitution.
    align extract-rules --constitution system_prompt.txt
  • Infer Implicit Rules: Analyze behavior logs to infer and list implicit rules.
    align infer --logs agent_logs.jsonl
  • Quick Alignment Score: Get a rapid score without a full report.
    align score --logs agent_logs.jsonl --constitution system_prompt.txt
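
The `--logs` input is JSONL (one JSON object per line). The exact record schema is an assumption here, not documented above; a minimal behavior-event log might be built and read back like this:

```python
# Assumed minimal JSONL behavior-log format for --logs; the field names
# (event_id, role, content, timestamp) are illustrative, not a documented schema.
import json

events = [
    {"event_id": "behavior_event_id_005", "role": "assistant",
     "content": "I can't log your email address.",
     "timestamp": "2024-01-01T12:00:00Z"},
    {"event_id": "behavior_event_id_012", "role": "assistant",
     "content": "Card ending [REDACTED] was charged.",
     "timestamp": "2024-01-01T12:05:00Z"},
]

# Write one JSON object per line, the JSONL convention.
with open("agent_logs.jsonl", "w") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")

# Read it back the way a line-oriented parser would.
with open("agent_logs.jsonl") as f:
    parsed = [json.loads(line) for line in f]
print(len(parsed))  # → 2
```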

Example Output Snippets

### 💡 Implicit Rule Detected (Confidence: High, Frequency: 15)
**Statement:** "The agent prioritizes user privacy by redacting sensitive information from public outputs."
**Evidence:**
-   `behavior_event_id_005`: Refused to log user's email ID.
-   `behavior_event_id_012`: Replaced credit card number with `[REDACTED]` in a shared transcript.

---

### ⚠️ Alignment Gap: Stated Not Followed
**Rule:** "Always acknowledge user commands with an estimated completion time."
**Observation:** In 3 out of 5 observed instances, this rule was not followed.
**Example:** User command received at `[timestamp]`, agent responded directly with action without `ack`.

---

**Alignment Score: 78/100**
_Recommendations: Improve acknowledgment consistency. Formalize implicit privacy rules into the constitution._

Vision: CI/CD Integration

Imagine every deployed AI agent going through an ALIGN audit as part of its continuous integration and continuous deployment pipeline. As agents evolve, their alignment with core principles and safety guardrails is continuously verified, enabling safe, reliable, and trustworthy autonomous systems.
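
As a sketch of what that pipeline step could look like, here is a hypothetical GitHub Actions job (the workflow layout, file paths, and artifact step are assumptions; only the `align audit` invocation comes from the usage section above):

```yaml
# Hypothetical CI job; paths and step names are illustrative assumptions.
name: align-audit
on: [push]
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -e .
      - run: align audit --logs agent_logs.jsonl --constitution system_prompt.txt --output_dir reports/
      - uses: actions/upload-artifact@v4
        with:
          name: alignment-report
          path: reports/
```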

Join us in building the future of AI safety.
