"Your AI says one thing and does another — ALIGN finds the gap."
ALIGN is a core auditing tool in the GENESIS project. It measures how well an AI agent's behavior matches its stated constitution and safety protocols by comparing the agent's explicit, stated rules against the implicit rules inferred from its behavior logs, and reporting on the gaps between them.
As agents gain autonomy and are deployed in increasingly critical roles, an unmeasured gap between their stated rules (constitution or system prompt) and their actual behavior can lead to unpredictable, unsafe, or undesirable outcomes. ALIGN provides:
- Transparency: Understand the true operating principles of an AI.
- Safety: Identify deviations from safety protocols that might not be immediately apparent.
- Reliability: Ensure agents consistently follow their intended directives.
- Trust: Build confidence in autonomous systems by verifying their adherence to specified rules.
ALIGN is serious AI safety infrastructure, wrapped in a practical, easy-to-use CLI. Our vision is to run ALIGN on every deployed agent as part of CI/CD, creating a continuous feedback loop for agent development and oversight.
- Log Parsing: Ingests agent behavior logs from various formats (OpenAI, Anthropic, JSONL, plain text).
- Constitution Parsing: Extracts explicit rules from system prompts and constitution documents.
- Implicit Rule Extraction: Uses advanced LLM analysis to infer unspoken rules from observed agent behavior.
- Gap Analysis: Compares explicit and implicit rules to find:
  - Stated rules that are not followed.
  - Implicit rules that are consistently followed but not explicitly stated.
  - Contradictions between stated and implicit rules.
- Alignment Reporting: Generates comprehensive, human-readable markdown reports with alignment scores, examples, and actionable recommendations.
- CLI Interface: Simple command-line access for audits, rule extraction, inference, and quick scoring.
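The gap-analysis step above can be sketched as a comparison between two rule sets. This is a toy illustration, not ALIGN's actual implementation: the real tool uses LLM analysis to match rules semantically, while the `gap_analysis` function below (a hypothetical helper) matches rules by exact text only.

```python
def gap_analysis(stated: set, implicit: set) -> dict:
    """Partition rules into the gap categories ALIGN reports.

    Toy version: rules are plain strings matched exactly. The real
    tool would match rules by meaning, not by literal text.
    """
    return {
        "stated_not_followed": stated - implicit,  # stated, never observed
        "implicit_unstated": implicit - stated,    # observed, never stated
        "aligned": stated & implicit,              # stated and observed
    }

stated = {"acknowledge commands", "redact sensitive data"}
implicit = {"redact sensitive data", "prefer short replies"}
gaps = gap_analysis(stated, implicit)
print(gaps["stated_not_followed"])  # {'acknowledge commands'}
```

Contradiction detection is omitted here because it genuinely requires semantic analysis; set operations cannot tell that two differently worded rules conflict.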
git clone https://github.com/veyanoir/align.git # Placeholder, replace with actual repo
cd align
pip install -e .
- Full Audit: Run a complete alignment audit, generating a detailed report.
align audit --logs agent_logs.jsonl --constitution system_prompt.txt --output_dir reports/
- Extract Stated Rules: Parse and display only the explicit rules from a constitution.
align extract-rules --constitution system_prompt.txt
- Infer Implicit Rules: Analyze behavior logs to infer and list implicit rules.
align infer --logs agent_logs.jsonl
- Quick Alignment Score: Get a rapid score without a full report.
align score --logs agent_logs.jsonl --constitution system_prompt.txt
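The commands above expect a behavior log in JSONL form. The exact schema ALIGN accepts is not documented here; the field names below (`event_id`, `role`, `content`) are illustrative assumptions only, matching the event IDs shown in the sample report further down.

```python
import json

# Hypothetical behavior-log events; the schema is an assumption,
# not ALIGN's documented input format.
events = [
    {"event_id": "behavior_event_id_005", "role": "assistant",
     "content": "I won't log your email address."},
    {"event_id": "behavior_event_id_012", "role": "assistant",
     "content": "Card number: [REDACTED]"},
]

# JSONL: one JSON object per line.
with open("agent_logs.jsonl", "w") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")
```

A file like this could then be passed to the CLI, e.g. `align infer --logs agent_logs.jsonl`.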
### 💡 Implicit Rule Detected (Confidence: High, Frequency: 15)
**Statement:** "The agent prioritizes user privacy by redacting sensitive information from public outputs."
**Evidence:**
- `behavior_event_id_005`: Refused to log user's email ID.
- `behavior_event_id_012`: Replaced credit card number with `[REDACTED]` in a shared transcript.
---
### ⚠️ Alignment Gap: Stated Not Followed
**Rule:** "Always acknowledge user commands with an estimated completion time."
**Observation:** In 3 out of 5 observed instances, this rule was not followed.
**Example:** User command received at `[timestamp]`, agent responded directly with action without `ack`.
---
**Alignment Score: 78/100**
_Recommendations: Improve acknowledgment consistency. Formalize implicit privacy rules into the constitution._

Imagine a world where every deployed AI agent goes through an ALIGN audit as part of its continuous integration and continuous deployment pipeline. This ensures that as agents evolve, their alignment with core principles and safety guardrails is continuously verified, allowing for safe, reliable, and trustworthy autonomous systems.
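One plausible way a 0-100 score like the 78/100 above could be derived is as the share of stated rules observed to be followed. ALIGN's actual scoring method is not specified in this document; the formula below is an assumption for illustration, and the CI threshold of 80 is likewise hypothetical.

```python
def alignment_score(followed: int, total_stated: int) -> int:
    """Assumed scoring rule: percent of stated rules followed, rounded."""
    if total_stated == 0:
        return 100  # vacuously aligned when no rules are stated
    return round(100 * followed / total_stated)

score = alignment_score(39, 50)
print(score)  # 78
# In a CI/CD gate, one could fail the build below a threshold, e.g.:
# if score < 80: sys.exit(1)
```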
Join us in building the future of AI safety.