GenAI-powered PII redaction gateway for enterprise compliance. Cerberus combines NLP detection, policy-based compliance (HIPAA/PCI-DSS/GDPR), and intelligent risk scoring to protect sensitive data.
- Overview
- Key Features
- System Architecture
- Quick Start
- Project Structure
- Documentation
- Technical Stack
- License
Organizations handling sensitive data face a compliance paradox: traditional PII redaction tools are either too rigid (over-redacting and breaking workflows) or too lenient (missing context-dependent leaks and failing audits). Manual policy configuration is error-prone, and binary pass/fail decisions lack the nuance needed for operational flexibility.
Cerberus addresses these challenges with a GenAI-powered approach:
- Smart Policy Recommendation: AI analyzes text and automatically suggests healthcare, finance, or general policies—even detecting cross-domain scenarios
- Risk-Based Scoring: Continuous risk assessment (0.0–1.0) with specific risk factors instead of binary leak detection
- Tiered Responses: Configurable thresholds trigger purge/alert/log actions based on risk levels
- Explainable AI: Every decision includes reasoning and confidence scores for audit trails
AI-powered domain detection automatically recommends the right policy context:
```json
{
  "recommended_context": "finance",
  "confidence": 0.88,
  "reasoning": "Mixed healthcare and finance data. Finance has stricter thresholds.",
  "detected_domains": ["healthcare", "finance"],
  "risk_warning": "Cross-domain PII - use strictest policy"
}
```

Move beyond binary pass/fail with continuous risk assessment:
| Risk Level | Score Range | Action |
|---|---|---|
| Low | 0.0–0.3 | Allow (properly redacted) |
| Medium | 0.3–0.5 | Log (contextual clues present) |
| High | 0.5–0.7 | Alert (format preservation detected) |
| Critical | 0.7–1.0 | Purge (direct PII exposure) |
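The table can be read as a threshold lookup from score to action. The sketch below is illustrative, not Cerberus's scorer. The table's ranges share endpoints, so it assumes each tier's lower bound is inclusive: a score of exactly 0.3 maps to Log. The `tiers` parameter stands in for the configurable thresholds; its name and shape are hypothetical.

```python
from enum import Enum
from typing import Sequence, Tuple


class Action(str, Enum):
    ALLOW = "allow"   # Low: properly redacted
    LOG = "log"       # Medium: contextual clues present
    ALERT = "alert"   # High: format preservation detected
    PURGE = "purge"   # Critical: direct PII exposure


# (lower bound, action) pairs taken from the risk table. Treating lower bounds
# as inclusive is an assumption, since the table's ranges share endpoints.
DEFAULT_TIERS: Tuple[Tuple[float, Action], ...] = (
    (0.0, Action.ALLOW),
    (0.3, Action.LOG),
    (0.5, Action.ALERT),
    (0.7, Action.PURGE),
)


def action_for(score: float, tiers: Sequence[Tuple[float, Action]] = DEFAULT_TIERS) -> Action:
    """Return the action of the highest tier whose lower bound the score reaches."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"risk score must be in [0.0, 1.0], got {score}")
    # Check the strictest tier first so the first match is the correct one.
    for lower_bound, action in sorted(tiers, key=lambda t: t[0], reverse=True):
        if score >= lower_bound:
            return action
    raise ValueError(f"no tier covers score {score}")
```

For example, `action_for(0.88)` returns `Action.PURGE`. Passing a custom `tiers` sequence changes the thresholds without changing the lookup logic.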
Pre-configured contexts for major compliance frameworks:
| Policy | Compliance | Key Entities |
|---|---|---|
| Healthcare | HIPAA | PERSON, US_SSN, DATE_TIME, LOCATION, PHONE, EMAIL, IP_ADDRESS |
| Finance | PCI-DSS | CREDIT_CARD, IBAN_CODE, US_BANK_NUMBER, US_SSN, DRIVER_LICENSE |
| General | GDPR | All 13 entity types (broad coverage) |
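The policy recommendation response lists every `detected_domains` entry, and its `risk_warning` advises the strictest policy for cross-domain PII. One conservative reading of "strictest" is to redact the union of the entity sets of all detected policies. The sketch below does this. The helper name is hypothetical, and the entity sets are copied from the policy table. The General (GDPR) policy is omitted because its full 13-entity list is not enumerated in this README.

```python
import json
from typing import Dict, FrozenSet, Set

# Entity sets copied from the policy table. "general" is left out because its
# 13 entity types are not listed individually here.
POLICY_ENTITIES: Dict[str, FrozenSet[str]] = {
    "healthcare": frozenset({
        "PERSON", "US_SSN", "DATE_TIME", "LOCATION", "PHONE", "EMAIL", "IP_ADDRESS",
    }),
    "finance": frozenset({
        "CREDIT_CARD", "IBAN_CODE", "US_BANK_NUMBER", "US_SSN", "DRIVER_LICENSE",
    }),
}


def entities_to_redact(recommendation_json: str) -> FrozenSet[str]:
    """Hypothetical helper: union the entity sets of every detected domain.

    For single-domain text this is just the recommended policy's entities;
    for cross-domain text it covers anything either policy protects.
    """
    rec = json.loads(recommendation_json)
    # Fall back to the recommended context when no domains are listed.
    domains = rec.get("detected_domains") or [rec["recommended_context"]]
    merged: Set[str] = set()
    for domain in domains:
        if domain not in POLICY_ENTITIES:
            raise ValueError(f"no entity set defined for domain {domain!r}")
        merged |= POLICY_ENTITIES[domain]
    return frozenset(merged)
```

For the sample response, which has healthcare and finance together, this yields 11 entity types: 7 from Healthcare plus 5 from Finance, with `US_SSN` shared.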
API key authentication with complete audit logging for compliance (HIPAA, PCI-DSS, GDPR).
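API key authentication with an audit trail typically stores only a digest of each key and records every attempt, successful or not. The sketch below is a minimal in-memory illustration under that assumption. `KeyStore`, its methods, and the audit fields are hypothetical; Cerberus keeps keys and audit logs in PostgreSQL, and its actual schema is not shown here.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


def hash_key(api_key: str) -> str:
    """Return the SHA-256 hex digest of a key; only digests are stored."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


@dataclass
class KeyStore:
    """Hypothetical in-memory stand-in for the API-key and audit-log tables."""
    key_hashes: Dict[str, str] = field(default_factory=dict)  # owner -> digest
    audit_log: List[dict] = field(default_factory=list)

    def issue(self, owner: str) -> str:
        """Create a random key, persist its digest, and return the raw key once."""
        raw_key = secrets.token_urlsafe(32)
        self.key_hashes[owner] = hash_key(raw_key)
        return raw_key

    def authenticate(self, presented_key: str, action: str) -> Optional[str]:
        """Return the owner of a valid key (or None) and audit the attempt."""
        digest = hash_key(presented_key)
        owner = None
        # Compare against every stored digest using a constant-time check.
        for candidate, stored in self.key_hashes.items():
            if hmac.compare_digest(stored, digest):
                owner = candidate
        # Record who did what and whether it was allowed, never the key itself.
        self.audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "owner": owner,
            "action": action,
            "success": owner is not None,
        })
        return owner
```

An unsalted SHA-256 digest is reasonable here only because the keys are long random tokens rather than user-chosen passwords. Storing only the digest also means a lost key cannot be recovered, only reissued.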
Cerberus employs a three-layer security architecture:
```mermaid
flowchart TB
    subgraph Client["Client Layer"]
        API[HTTP Request]
    end
    subgraph Gateway["API Gateway"]
        FastAPI["FastAPI Server<br/>Authentication + Routing"]
    end
    subgraph Security["Three-Layer Security"]
        L1["Layer 1: NLP Detection<br/>(Microsoft Presidio)"]
        L2["Layer 2: Policy Engine<br/>(HIPAA/PCI-DSS/GDPR)"]
        L3["Layer 3: Risk Scorer<br/>(Phi-3 LLM via Ollama)"]
    end
    subgraph Storage["Data Layer"]
        Redis["Redis<br/>Token Storage (24hr TTL)"]
        Postgres["PostgreSQL<br/>API Keys + Audit Logs"]
    end
    subgraph Monitoring["Observability"]
        Prometheus["Prometheus<br/>Metrics Collection"]
        Grafana["Grafana<br/>Dashboards"]
    end
    API --> FastAPI
    FastAPI --> L1
    L1 --> L2
    L2 --> L3
    L1 --> Redis
    L3 --> Redis
    FastAPI --> Postgres
    FastAPI --> Prometheus
    Prometheus --> Grafana
```
- Layer 1 (NLP Detection): Presidio analyzes text for 13 PII entity types, such as EMAIL, US_SSN, and CREDIT_CARD
- Layer 2 (Policy Engine): Filters detected entities by compliance context, confidence thresholds, and restoration controls
- Layer 3 (GenAI Risk Scorer): Assigns risk scores, triggers tiered responses, and explains each decision
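To make the hand-off between layers concrete, the sketch below starts from Layer 1 output (entity type, character span, and confidence). It applies a Layer 2 filter and then swaps each kept span for a token whose original value is held with a 24-hour TTL. It does not use Presidio or Redis. `Detection`, `apply_policy`, `TokenVault`, `redact`, and the `<ENTITY_N>` token format are all hypothetical, and the vault is an in-memory stand-in for Redis expiry semantics.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Detection:
    """A Layer 1 finding: entity type, character span [start, end), confidence."""
    entity_type: str
    start: int
    end: int
    score: float


def apply_policy(detections: Iterable[Detection], allowed: Set[str], min_score: float) -> List[Detection]:
    """Layer 2 sketch: keep in-policy entities that meet the confidence threshold."""
    return [d for d in detections if d.entity_type in allowed and d.score >= min_score]


class TokenVault:
    """In-memory stand-in for Redis token storage with a fixed TTL (default 24h)."""

    def __init__(self, ttl_seconds: float = 24 * 3600, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # token -> (original, expiry)

    def put(self, token: str, original: str) -> None:
        self._entries[token] = (original, self._clock() + self._ttl)

    def get(self, token: str) -> Optional[str]:
        """Return the original value, or None if the token is unknown or expired."""
        entry = self._entries.get(token)
        if entry is None or self._clock() >= entry[1]:
            self._entries.pop(token, None)  # drop expired entries lazily
            return None
        return entry[0]


def redact(text: str, kept: Iterable[Detection], vault: TokenVault) -> str:
    """Replace each span with a numbered token such as <EMAIL_1>.

    Assumes spans do not overlap. Tokens are numbered per call, so a shared
    store would need a request-scoped prefix to avoid collisions.
    """
    counters: Dict[str, int] = {}
    numbered = []
    for d in sorted(kept, key=lambda d: d.start):  # number tokens left to right
        counters[d.entity_type] = counters.get(d.entity_type, 0) + 1
        numbered.append((d, f"<{d.entity_type}_{counters[d.entity_type]}>"))
    for d, token in reversed(numbered):  # splice right to left so offsets stay valid
        vault.put(token, text[d.start:d.end])
        text = text[:d.start] + token + text[d.end:]
    return text
```

Layer 3 would then score the redacted text, for example with a lookup like the risk-tier table, before the response is returned.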
```bash
# Clone the repository
git clone <repository-url> && cd cerberus

# Start all services with Docker Compose (add -d to run detached)
docker compose up --build

# Initialize the database and generate an admin API key
# (run in a second terminal if the stack is running in the foreground)
docker compose exec api uv run python scripts/init_db.py
# Save the displayed API key; it cannot be retrieved later!
```

For detailed setup instructions, including local development without Docker, see QUICKSTART.md.
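Once the stack is running, the admin key is sent with each request. The sketch below only builds a request with the standard library. The port (Uvicorn's default, 8000), the `/redact` path, the `X-API-Key` header, and the `text`/`context` payload fields are assumptions for illustration. Check the API reference in ARCHITECTURE.md for the real endpoint and schema.

```python
import json
import urllib.request

# Hypothetical values: port, path, header name, and payload fields are
# assumptions, not confirmed Cerberus API details.
BASE_URL = "http://localhost:8000"
ENDPOINT = "/redact"


def build_redact_request(api_key: str, text: str, context: str = "general") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the API key and text."""
    body = json.dumps({"text": text, "context": context}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

With the services up, `urllib.request.urlopen(build_redact_request(key, "Contact jane@example.com"), timeout=10)` would send it. The response body's shape depends on the actual API.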
```
cerberus/
├── app/                          # Core application
│   ├── main.py                   # FastAPI endpoints
│   ├── service.py                # Redaction service
│   ├── verification.py           # LLM-based risk scorer
│   ├── policy_recommendation.py  # Smart policy suggester
│   ├── policies.py               # Policy engine
│   ├── database.py               # SQLAlchemy models
│   ├── auth.py                   # API key authentication
│   ├── audit.py                  # Audit logging
│   ├── config.py                 # Centralized configuration
│   └── prompts/                  # LLM prompt engineering
│       ├── verification_prompts.py
│       ├── policy_prompts.py
│       └── few_shot_examples.py
├── tests/                        # Test suite (127+ tests)
│   ├── unit/                     # Unit tests
│   └── integration/              # Integration tests
├── evaluation/                   # Benchmark suite (43 test cases)
├── scripts/                      # Utility scripts
├── docker-compose.yml            # Development stack
├── pyproject.toml                # Python dependencies (uv)
└── .env.example                  # Configuration template
```
| Document | Description |
|---|---|
| QUICKSTART.md | Prerequisites, setup instructions, and troubleshooting |
| ARCHITECTURE.md | Technical design decisions, API reference, and diagrams |
| DEPLOYMENT.md | Production deployment strategies and considerations |
| ROADMAP.md | Project milestones and future plans |
| CHANGELOG.md | Version history and release notes |
| CONTRIBUTING.md | Contribution guidelines |
| Category | Technology |
|---|---|
| Web Framework | FastAPI + Uvicorn |
| NLP Detection | Microsoft Presidio + spaCy (en_core_web_lg) |
| GenAI | Phi-3 LLM via Ollama |
| Token Storage | Redis (24hr TTL) |
| Database | PostgreSQL (async via SQLAlchemy + asyncpg) |
| Monitoring | Prometheus + Grafana |
| Package Management | uv (fast Python resolver) |
| Testing | pytest (62%+ coverage) + fakeredis + respx |
| Deployment | Docker Compose |
This project is licensed under the MIT License. See LICENSE for details.
Intelligent PII redaction with GenAI-powered compliance. Deploy with confidence.