Cerberus - AI-Powered PII Protection Gateway

GenAI-powered PII redaction gateway for enterprise compliance. Cerberus combines NLP detection, policy-based compliance (HIPAA/PCI-DSS/GDPR), and intelligent risk scoring to protect sensitive data.

Overview

Organizations handling sensitive data face a compliance paradox: traditional PII redaction tools are either too rigid (over-redacting and breaking workflows) or too lenient (missing context-dependent leaks and failing audits). Manual policy configuration is error-prone, and binary pass/fail decisions lack the nuance needed for operational flexibility.

Cerberus addresses these challenges with a GenAI-powered approach:

Smart Policy Recommendation: AI analyzes text and automatically suggests a healthcare, finance, or general policy, and flags cross-domain text that mixes several of them
Risk-Based Scoring: Continuous risk assessment (0.0–1.0) with specific risk factors instead of binary leak detection
Tiered Responses: Configurable thresholds trigger purge/alert/log actions based on risk levels
Explainable AI: Every decision includes reasoning and confidence scores for audit trails

Key Features

Intelligent Policy Selection

AI-powered domain detection automatically recommends the right policy context:

{
  "recommended_context": "finance",
  "confidence": 0.88,
  "reasoning": "Mixed healthcare and finance data. Finance has stricter thresholds.",
  "detected_domains": ["healthcare", "finance"],
  "risk_warning": "Cross-domain PII - use strictest policy"
}
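Because this recommendation comes from an LLM, its JSON is worth validating before it is used. The sketch below shows one way to do that, and one way to apply the "use strictest policy" rule to the detected domains. `PolicyRecommendation`, `strictest_context`, and the `STRICTNESS` ranking are hypothetical names. The README only says that finance is stricter than healthcare, so ranking "general" lowest is an assumption.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

# Hypothetical strictness ranking. The README only says finance has stricter
# thresholds than healthcare; placing "general" lowest is an assumption.
STRICTNESS = {"general": 0, "healthcare": 1, "finance": 2}


@dataclass
class PolicyRecommendation:
    recommended_context: str
    confidence: float
    reasoning: str
    detected_domains: list[str] = field(default_factory=list)
    risk_warning: str | None = None

    @classmethod
    def from_json(cls, raw: str) -> PolicyRecommendation:
        """Parse and sanity-check the LLM's JSON before trusting it."""
        rec = cls(**json.loads(raw))  # unexpected keys raise TypeError
        if not 0.0 <= rec.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {rec.confidence}")
        if rec.recommended_context not in STRICTNESS:
            raise ValueError(f"unknown context: {rec.recommended_context!r}")
        return rec


def strictest_context(domains: list[str]) -> str:
    """Pick the strictest known policy among the detected domains."""
    known = [d for d in domains if d in STRICTNESS]
    return max(known, key=STRICTNESS.__getitem__, default="general")
```

For the example response above, `strictest_context(["healthcare", "finance"])` returns `"finance"`, which agrees with the recommended context.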

Risk-Based Decision Making

Move beyond binary pass/fail with continuous risk assessment:

Risk Level	Score Range	Action
Low	0.0–0.3	Allow (properly redacted)
Medium	0.3–0.5	Log (contextual clues present)
High	0.5–0.7	Alert (format preservation detected)
Critical	0.7–1.0	Purge (direct PII exposure)
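The following sketch shows the tier lookup. It assumes each score range includes its lower bound, because the table leaves shared endpoints such as 0.3 ambiguous. `classify_risk`, `Action`, and `TIERS` are hypothetical names. Passing a different `tiers` list is one way to model the configurable thresholds.

```python
from enum import Enum


class Action(str, Enum):
    ALLOW = "allow"
    LOG = "log"
    ALERT = "alert"
    PURGE = "purge"


# (inclusive lower bound, risk level, action), ordered from highest to lowest.
TIERS = [
    (0.7, "Critical", Action.PURGE),
    (0.5, "High", Action.ALERT),
    (0.3, "Medium", Action.LOG),
    (0.0, "Low", Action.ALLOW),
]


def classify_risk(score: float, tiers=TIERS) -> tuple[str, Action]:
    """Map a 0.0-1.0 risk score to its risk level and tiered action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"risk score must be within [0.0, 1.0], got {score}")
    # The first tier whose lower bound the score reaches wins.
    for lower, level, action in tiers:
        if score >= lower:
            return level, action
    raise ValueError("tiers must include a 0.0 lower bound")
```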

Policy-Driven Compliance

Pre-configured contexts for major compliance frameworks:

Policy	Compliance	Key Entities
Healthcare	HIPAA	PERSON, US_SSN, DATE_TIME, LOCATION, PHONE, EMAIL, IP_ADDRESS
Finance	PCI-DSS	CREDIT_CARD, IBAN_CODE, US_BANK_NUMBER, US_SSN, DRIVER_LICENSE
General	GDPR	All 13 entity types (broad coverage)
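The sketch below shows how a policy filter for these contexts might look. It uses the entity labels exactly as the table lists them. Presidio's own names differ slightly: `PHONE_NUMBER`, `EMAIL_ADDRESS`, and `US_DRIVER_LICENSE`. The `Finding` type, `apply_policy`, and the 0.5 default confidence threshold are hypothetical. Because the full list of 13 types is not given here, `None` stands in for the general policy's "all types".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    entity_type: str  # detector label, e.g. "US_SSN"
    start: int        # character offsets into the analyzed text
    end: int
    score: float      # detector confidence in [0.0, 1.0]


# Entity sets copied from the policy table; None means "every detected type".
POLICIES: dict[str, frozenset[str] | None] = {
    "healthcare": frozenset({"PERSON", "US_SSN", "DATE_TIME", "LOCATION",
                             "PHONE", "EMAIL", "IP_ADDRESS"}),
    "finance": frozenset({"CREDIT_CARD", "IBAN_CODE", "US_BANK_NUMBER",
                          "US_SSN", "DRIVER_LICENSE"}),
    "general": None,
}


def apply_policy(findings: list[Finding], context: str,
                 min_score: float = 0.5) -> list[Finding]:
    """Keep findings that the context covers and that meet the confidence floor."""
    if context not in POLICIES:
        raise ValueError(f"unknown policy context: {context!r}")
    allowed = POLICIES[context]
    return [f for f in findings
            if f.score >= min_score
            and (allowed is None or f.entity_type in allowed)]
```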

Secure Restoration with Audit Trail

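Redacted values can be restored from their tokens, which Redis holds for 24 hours, only by callers that present a valid API key. Every request is recorded in a PostgreSQL audit log to support HIPAA, PCI-DSS, and GDPR compliance.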
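The README does not describe how restoration works internally. Below is a minimal in-memory sketch. It assumes three things: tokens map to the original values, they expire after the 24-hour TTL shown in the architecture diagram, and only SHA-256 hashes of API keys are kept. `TokenVault`, `store`, and `restore` are hypothetical names. A Redis-backed version would more likely set the expiry with `SET key value EX 86400` and let Redis evict the key itself.

```python
from __future__ import annotations

import hashlib
import secrets
import time
from dataclasses import dataclass, field
from typing import Callable

TTL_SECONDS = 24 * 60 * 60  # 24hr, matching the Redis TTL in the diagram


@dataclass
class TokenVault:
    """Hypothetical in-memory stand-in for Redis token storage plus an audit log."""
    valid_key_hashes: set[str]                      # SHA-256 hex digests of API keys
    clock: Callable[[], float] = time.time          # injectable for testing
    audit_log: list[dict] = field(default_factory=list)
    _store: dict[str, tuple[str, float]] = field(default_factory=dict)

    def store(self, value: str) -> str:
        """Keep the original value and return an opaque, expiring token."""
        token = f"tok_{secrets.token_hex(8)}"
        self._store[token] = (value, self.clock() + TTL_SECONDS)
        return token

    def restore(self, token: str, api_key: str) -> str | None:
        """Return the original value only for a valid key and an unexpired token."""
        now = self.clock()
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        entry = self._store.get(token)
        if entry is not None and now >= entry[1]:
            del self._store[token]                  # emulate Redis expiry
            entry = None
        ok = entry is not None and key_hash in self.valid_key_hashes
        # Log every attempt, successful or not; record a hash prefix, never the key.
        self.audit_log.append(
            {"token": token, "key_hash": key_hash[:12], "success": ok, "at": now}
        )
        return entry[0] if ok else None
```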

System Architecture

Cerberus employs a three-layer security architecture:

flowchart TB
    subgraph Client["Client Layer"]
        API[HTTP Request]
    end

    subgraph Gateway["API Gateway"]
        FastAPI["FastAPI Server<br/>Authentication + Routing"]
    end

    subgraph Security["Three-Layer Security"]
        L1["Layer 1: NLP Detection<br/>(Microsoft Presidio)"]
        L2["Layer 2: Policy Engine<br/>(HIPAA/PCI-DSS/GDPR)"]
        L3["Layer 3: Risk Scorer<br/>(Phi-3 LLM via Ollama)"]
    end

    subgraph Storage["Data Layer"]
        Redis["Redis<br/>Token Storage (24hr TTL)"]
        Postgres["PostgreSQL<br/>API Keys + Audit Logs"]
    end

    subgraph Monitoring["Observability"]
        Prometheus["Prometheus<br/>Metrics Collection"]
        Grafana["Grafana<br/>Dashboards"]
    end

    API --> FastAPI
    FastAPI --> L1
    L1 --> L2
    L2 --> L3
    L1 --> Redis
    L3 --> Redis
    FastAPI --> Postgres
    FastAPI --> Prometheus
    Prometheus --> Grafana

Layer 1: NLP Detection — Presidio analyzes text for 13 PII entity types (EMAIL, SSN, CREDIT_CARD, etc.)

Layer 2: Policy Engine — Filters entities by compliance context, confidence thresholds, and restoration controls

Layer 3: GenAI Risk Scorer — Assigns risk scores, triggers tiered responses, provides explainability
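The next sketch shows one way the three layers could be chained. Every name in it is hypothetical. `toy_detect` uses two regexes as a stand-in for Presidio. The policy filter and risk scorer are injected callables; in Cerberus, the real Layer 3 calls Phi-3 through Ollama. Redaction uses `<ENTITY_TYPE>` placeholders rather than Cerberus's restorable tokens. The sketch also assumes that detected spans do not overlap and that the scorer inspects the redacted output for residual leaks, as the risk table's "properly redacted" wording suggests.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    entity_type: str
    start: int
    end: int
    score: float


# Stand-in for Layer 1: two toy regexes instead of Presidio's recognizers.
_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def toy_detect(text: str) -> list[Finding]:
    return [Finding(label, m.start(), m.end(), 0.9)
            for label, pat in _PATTERNS.items() for m in pat.finditer(text)]


def run_pipeline(
    text: str,
    detect: Callable[[str], list[Finding]],
    policy_filter: Callable[[list[Finding]], list[Finding]],
    score_risk: Callable[[str], float],
) -> tuple[str, float]:
    """Layer 1 -> Layer 2 -> redaction -> Layer 3, each layer injected as a callable."""
    findings = policy_filter(detect(text))          # Layers 1 and 2
    redacted = text
    # Replace right to left so earlier offsets stay valid after each substitution.
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        redacted = redacted[:f.start] + f"<{f.entity_type}>" + redacted[f.end:]
    return redacted, score_risk(redacted)           # Layer 3
```

Injecting each layer as a callable keeps the pipeline testable without a running Ollama instance, which is one plausible reason to structure it this way.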

Quick Start

# Clone the repository
git clone <repository-url> && cd cerberus

# Start all services with Docker Compose
docker compose up --build

# Initialize database and generate admin API key
docker compose exec api uv run python scripts/init_db.py
# Save the displayed API key - it cannot be retrieved later!

For detailed setup instructions, including local development without Docker, see QUICKSTART.md.
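The warning that the admin key cannot be retrieved later suggests that the database stores only a one-way hash of each key. The sketch below assumes SHA-256 hashing. The `cbr_` prefix, `generate_api_key`, and `verify_api_key` are hypothetical and are not taken from `scripts/init_db.py`.

```python
import hashlib
import hmac
import secrets


def generate_api_key(prefix: str = "cbr") -> tuple[str, str]:
    """Return (plaintext_key, key_hash); only the hash would be persisted."""
    key = f"{prefix}_{secrets.token_urlsafe(32)}"
    return key, hashlib.sha256(key.encode()).hexdigest()


def verify_api_key(presented: str, stored_hash: str) -> bool:
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(candidate, stored_hash)
```

Since the plaintext key exists only at generation time, showing it once and discarding it matches the Quick Start warning.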

Project Structure

cerberus/
├── app/                        # Core application
│   ├── main.py                 # FastAPI endpoints
│   ├── service.py              # Redaction service
│   ├── verification.py         # LLM-based risk scorer
│   ├── policy_recommendation.py # Smart policy suggester
│   ├── policies.py             # Policy engine
│   ├── database.py             # SQLAlchemy models
│   ├── auth.py                 # API key authentication
│   ├── audit.py                # Audit logging
│   ├── config.py               # Centralized configuration
│   └── prompts/                # LLM prompt engineering
│       ├── verification_prompts.py
│       ├── policy_prompts.py
│       └── few_shot_examples.py
├── tests/                      # Test suite (127+ tests)
│   ├── unit/                   # Unit tests
│   └── integration/            # Integration tests
├── evaluation/                 # Benchmark suite (43 test cases)
├── scripts/                    # Utility scripts
├── docker-compose.yml          # Development stack
├── pyproject.toml              # Python dependencies (uv)
└── .env.example                # Configuration template

Documentation

Document	Description
QUICKSTART.md	Prerequisites, setup instructions, and troubleshooting
ARCHITECTURE.md	Technical design decisions, API reference, and diagrams
DEPLOYMENT.md	Production deployment strategies and considerations
ROADMAP.md	Project milestones and future plans
CHANGELOG.md	Version history and release notes
CONTRIBUTING.md	Contribution guidelines

Technical Stack

Category	Technology
Web Framework	FastAPI + Uvicorn
NLP Detection	Microsoft Presidio + spaCy (en_core_web_lg)
GenAI	Phi-3 LLM via Ollama
Token Storage	Redis (24hr TTL)
Database	PostgreSQL (async via SQLAlchemy + asyncpg)
Monitoring	Prometheus + Grafana
Package Management	uv (fast Python package and project manager)
Testing	pytest (62%+ coverage) + fakeredis + respx
Deployment	Docker Compose

License

This project is licensed under the MIT License. See LICENSE for details.

Intelligent PII redaction with GenAI-powered compliance. Deploy with confidence.
