Lightweight AI Safety Auditing Framework
Contributors:

- Michael A. Riegler (Simula)
- Sushant Gautam (SimulaMet)
- Klas H. Pettersen (SimulaMet)
- Maja Gran Erke (The Norwegian Directorate of Health)
- Hilde Lovett (The Norwegian Directorate of Health)
- Sunniva Bjørklund (The Norwegian Directorate of Health)
- Tor-Ståle Hansen (Specialist Director, Ministry of Defense Norway)
SimpleAudit uses models such as Claude to audit and red-team your AI systems through multilingual adversarial probing. It is simple, easy to extend, and requires minimal setup, and it works with models served via API or running locally.
| Tool | Complexity | Dependencies | Cost | Approach |
|---|---|---|---|---|
| SimpleAudit | ⭐ Simple | 2 packages | $ Low | Adversarial probing |
| Petri | ⭐⭐⭐ Complex | Many | $$$ High | Multi-agent framework |
| RAGAS | ⭐⭐ Medium | Several | Free | Metrics only |
| Custom | ⭐⭐⭐ Complex | Varies | Varies | Build from scratch |
```bash
pip install simpleaudit

# With plotting support
pip install simpleaudit[plot]
```

Or install from GitHub:

```bash
pip install git+https://github.com/kelkalot/simpleaudit.git
```

```python
from simpleaudit import Auditor

# Create auditor pointing to your AI system (default: Anthropic Claude)
auditor = Auditor(
    target="http://localhost:8000/v1/chat/completions",
    # Uses ANTHROPIC_API_KEY env var, or pass: api_key="sk-..."
)

# Run built-in safety scenarios
results = auditor.run("safety")

# View results
results.summary()
results.plot()
results.save("audit_results.json")
```

Other providers:

```python
# OpenAI (requires: pip install simpleaudit[openai])
auditor = Auditor(
    target="http://localhost:8000/v1/chat/completions",
    provider="openai",  # Uses OPENAI_API_KEY env var
)

# Grok via xAI (requires: pip install simpleaudit[openai])
auditor = Auditor(
    target="http://localhost:8000/v1/chat/completions",
    provider="grok",  # Uses XAI_API_KEY env var
)

# Ollama - for locally served models
# First: ollama serve && ollama pull llama3.2
auditor = Auditor(
    target="http://localhost:8000/v1/chat/completions",
    provider="ollama",  # Uses local Ollama instance
    model="llama3.2",   # Or "mistral", "codellama", etc.
)

# HuggingFace - for direct transformers inference
auditor = Auditor(
    target="http://localhost:8000/v1/chat/completions",
    provider="huggingface",
    model="meta-llama/Llama-3.2-1B-Instruct",
)
```

`ModelAuditor` audits models directly via their APIs, without needing an external HTTP endpoint:

```python
from simpleaudit import ModelAuditor
# Basic usage - audit Claude with a system prompt
auditor = ModelAuditor(
provider="anthropic", # Target model provider
system_prompt="You are a helpful assistant.", # Optional system prompt
)
results = auditor.run("system_prompt")
results.summary()
```

| Parameter | Description | Default |
|---|---|---|
| `provider` | Target model: `"anthropic"`, `"openai"`, `"grok"`, `"huggingface"`, `"ollama"` | `"anthropic"` |
| `model` | Model name (e.g., `"gpt-4o"`, `"llama3.2"`) | Provider default |
| `system_prompt` | System prompt for target model (or `None`) | `None` |
| `judge_provider` | Provider for judging (can differ from target) | Same as `provider` |
| `judge_model` | Model for judging | Provider default |
| `max_turns` | Conversation turns per scenario | 5 |
Use different providers for target and judge:

```python
# Test OpenAI, judged by Claude
auditor = ModelAuditor(
    provider="openai",           # Target: OpenAI
    model="gpt-4o",
    system_prompt="Be helpful and safe.",
    judge_provider="anthropic",  # Judge: Claude
)
```

Audit local models without any API keys:

```python
# Test a local Ollama model
auditor = ModelAuditor(
    provider="ollama",
    model="llama3.2",
    system_prompt="You are a helpful assistant.",
)
results = auditor.run("safety")

# Test a HuggingFace model (GPU required/recommended)
auditor = ModelAuditor(
    provider="huggingface",
    model="meta-llama/Llama-3.2-1B-Instruct",
)
results = auditor.run("system_prompt")
```

Test the model's default behavior (no system prompt):

```python
auditor = ModelAuditor(
    provider="openai",
    # system_prompt=None,  # Omit or set to None
)
results = auditor.run("safety")
```

SimpleAudit includes pre-built scenario packs:
| Pack | Scenarios | Description |
|---|---|---|
| `safety` | 8 | General AI safety (hallucination, manipulation, boundaries) |
| `rag` | 8 | RAG-specific (source attribution, retrieval boundaries) |
| `health` | 8 | Healthcare domain (emergency, diagnosis, prescriptions) |
| `system_prompt` | 8 | System prompt adherence and bypass testing |
| `all` | 32 | All scenarios combined |
```python
# List available packs
from simpleaudit import list_scenario_packs
print(list_scenario_packs())
# {'safety': 8, 'rag': 8, 'health': 8, 'system_prompt': 8, 'all': 32}

# Run specific pack
results = auditor.run("rag")

# Run multiple packs
results = auditor.run("all")
```

Create your own scenarios:

```python
my_scenarios = [
    {
        "name": "Custom Test 1",
        "description": (
            "Test if the system does X when the user asks Y. "
            "The system should respond by doing Z."
        ),
    },
    {
        "name": "Custom Test 2",
        "description": "Another test description...",
    },
]
results = auditor.run(my_scenarios)
```

The full `Auditor` configuration:

```python
auditor = Auditor(
    # Required
    target="http://localhost:8000/v1/chat/completions",

    # Provider selection
    provider="anthropic",              # "anthropic" (default), "openai", "grok", "ollama", or "huggingface"
    api_key="sk-...",                  # Or use env vars (see below)
    model="claude-sonnet-4-20250514",  # Provider-specific model name

    # Other options
    target_model="my-model",     # Model name sent to target API
    max_turns=5,                 # Conversation turns per scenario
    timeout=120.0,               # Request timeout (seconds)
    verbose=True,                # Print progress
    prompt_for_key=True,         # Prompt for API key if not found
)

# Run with custom settings
results = auditor.run(
    "safety",
    max_turns=3,           # Override default turns
    language="Norwegian",  # Probe language (default: English)
)
```

| Provider | Environment Variable | Default Model |
|---|---|---|
| Anthropic | `ANTHROPIC_API_KEY` | `claude-sonnet-4-20250514` |
| OpenAI | `OPENAI_API_KEY` | `gpt-4o` |
| Grok | `XAI_API_KEY` | `grok-3` |
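A small sketch of wiring up a key from Python; the key value is a placeholder, and in practice you would usually export the environment variable in your shell or pass `api_key` directly, as shown in the configuration block above:

```python
import os

from simpleaudit import Auditor

# Placeholder key; normally exported in the shell rather than hard-coded
os.environ["ANTHROPIC_API_KEY"] = "sk-..."

# Equivalent: pass the key explicitly instead of relying on the env var
auditor = Auditor(
    target="http://localhost:8000/v1/chat/completions",
    provider="anthropic",
    api_key=os.environ["ANTHROPIC_API_KEY"],
)
```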
results = auditor.run("safety")
# Summary statistics
print(f"Score: {results.score}/100")
print(f"Passed: {results.passed}/{len(results)}")
print(f"Critical issues: {results.critical_count}")
# Severity distribution
print(results.severity_distribution)
# {'pass': 4, 'medium': 2, 'high': 1, 'critical': 1}
# Detailed inspection
for result in results:
    print(f"{result.scenario_name}: {result.severity}")
    print(f"  Issues: {result.issues_found}")
    print(f"  Summary: {result.summary}")
# Export
results.save("audit_results.json")
results.plot(save_path="audit_chart.png")
```

| Level | Score | Meaning |
|---|---|---|
| 🟢 pass | 4 | Handled appropriately |
| 🔵 low | 3 | Minor issue |
| 🟡 medium | 2 | Moderate concern |
| 🟠 high | 1 | Significant issue |
| 🔴 critical | 0 | Dangerous behavior |
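Because each level maps to a numeric score, results are easy to gate on in automation, for example in CI. A small sketch that uses only the result attributes shown above (`critical_count`, `severity`, `scenario_name`, `summary`, `score`); the thresholds are illustrative:

```python
# Illustrative gate built on the documented results attributes
results = auditor.run("safety")

# Fail hard on any dangerous behavior
if results.critical_count > 0:
    raise SystemExit(f"Audit failed: {results.critical_count} critical finding(s)")

# Flag high-severity findings for review (the threshold choice is an example)
high = [r for r in results if r.severity == "high"]
for r in high:
    print(f"High severity - {r.scenario_name}: {r.summary}")

print(f"Overall score: {results.score}/100")
```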
Your target must be an OpenAI-compatible chat completions endpoint:
```
POST /v1/chat/completions
{
  "model": "your-model",
  "messages": [
    {"role": "user", "content": "Hello"}
  ]
}
```
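Before auditing, it can be worth confirming that your endpoint actually accepts this request shape. A quick sketch using the `requests` library (the URL and model name are placeholders for your deployment):

```python
import requests

# Placeholder endpoint and model name; adjust to your deployment
url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "your-model",
    "messages": [{"role": "user", "content": "Hello"}],
}

resp = requests.post(url, json=payload, timeout=30)
resp.raise_for_status()

# An OpenAI-compatible server returns the reply under choices[0].message.content
print(resp.json()["choices"][0]["message"]["content"])
```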
Works with:
- OpenAI API
- Ollama (`ollama serve`)
- vLLM
- LiteLLM
- Any OpenAI-compatible server
- Custom RAG systems with chat wrapper
To audit a custom RAG system:

```python
# 1. Create an OpenAI-compatible wrapper for your RAG
#    (see examples/rag_server.py; a sketch follows below)

# 2. Start your RAG server
#    python rag_server.py  # Runs on localhost:8000

# 3. Audit it
from simpleaudit import Auditor
auditor = Auditor("http://localhost:8000/v1/chat/completions")
results = auditor.run("rag")  # RAG-specific scenarios
results.summary()
```
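The actual wrapper lives in the repository as `examples/rag_server.py` and is not reproduced here, but a minimal sketch of the idea, assuming FastAPI and a hypothetical `my_rag_answer()` function standing in for your retrieval-plus-generation pipeline, looks roughly like this:

```python
# Minimal OpenAI-compatible wrapper sketch (assumes: pip install fastapi uvicorn).
# my_rag_answer() is a hypothetical stand-in for your RAG pipeline, not part of SimpleAudit.
from fastapi import FastAPI

app = FastAPI()

def my_rag_answer(question: str) -> str:
    # Replace with your actual retrieval + generation logic
    return f"Answer grounded in retrieved documents for: {question}"

@app.post("/v1/chat/completions")
def chat_completions(body: dict):
    question = body["messages"][-1]["content"]
    # Return the subset of the OpenAI chat response schema that clients typically read
    return {
        "choices": [
            {"message": {"role": "assistant", "content": my_rag_answer(question)}}
        ]
    }

# Run with: uvicorn rag_server:app --port 8000
```

The key point is the request/response shape shown earlier; a server that accepts that request and returns the assistant reply under `choices[0].message.content` should work as an audit target.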
SimpleAudit can use different models for probe generation and judging. The cost estimates below assume Claude:

| Scenarios | Turns | Estimated Cost |
|---|---|---|
| 8 | 5 | ~$2-4 |
| 24 | 5 | ~$6-12 |
| 24 | 10 | ~$12-24 |
Costs depend on response lengths and the Claude model used.
Contributions welcome! Areas of interest:
- New scenario packs (legal, finance, education, etc.)
- Additional judge criteria
- More target adapters
- Documentation improvements
- Digital Public Good Compliance – SDG alignment, ownership, standards
- Code of Conduct – Community guidelines and responsible use
- Security Policy – Vulnerability reporting and security considerations
MIT License - see LICENSE for details.