bhattaraiprayag/Cerberus

Cerberus - AI-Powered PII Protection Gateway

Python 3.13+ | FastAPI | Docker | License: MIT

GenAI-powered PII redaction gateway for enterprise compliance. Cerberus combines NLP detection, policy-based compliance (HIPAA/PCI-DSS/GDPR), and intelligent risk scoring to protect sensitive data.


Overview

Organizations handling sensitive data face a compliance paradox: traditional PII redaction tools are either too rigid (over-redacting and breaking workflows) or too lenient (missing context-dependent leaks and failing audits). Manual policy configuration is error-prone, and binary pass/fail decisions lack the nuance needed for operational flexibility.

Cerberus addresses these challenges with a GenAI-powered approach:

  • Smart Policy Recommendation: AI analyzes text and automatically suggests healthcare, finance, or general policies—even detecting cross-domain scenarios
  • Risk-Based Scoring: Continuous risk assessment (0.0–1.0) with specific risk factors instead of binary leak detection
  • Tiered Responses: Configurable thresholds trigger purge/alert/log actions based on risk levels
  • Explainable AI: Every decision includes reasoning and confidence scores for audit trails

Key Features

Intelligent Policy Selection

AI-powered domain detection automatically recommends the right policy context:

{
  "recommended_context": "finance",
  "confidence": 0.88,
  "reasoning": "Mixed healthcare and finance data. Finance has stricter thresholds.",
  "detected_domains": ["healthcare", "finance"],
  "risk_warning": "Cross-domain PII - use strictest policy"
}
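
As a rough illustration of how a client might consume a recommendation like the one above, the sketch below parses the JSON into a typed object and falls back to a broad policy when confidence is low. The field names come from the example payload. The `PolicyRecommendation` class, the `choose_context` helper, the 0.7 confidence cut-off, and the `"general"` fallback are assumptions made for this example, not part of the Cerberus codebase.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PolicyRecommendation:
    """Typed view of the recommendation payload (hypothetical helper)."""

    recommended_context: str
    confidence: float
    reasoning: str
    detected_domains: tuple[str, ...]
    risk_warning: Optional[str] = None

    @property
    def is_cross_domain(self) -> bool:
        # More than one detected domain means the text mixes PII categories.
        return len(self.detected_domains) > 1


def parse_recommendation(payload: str) -> PolicyRecommendation:
    """Parse a JSON string shaped like the example response."""
    data = json.loads(payload)
    return PolicyRecommendation(
        recommended_context=data["recommended_context"],
        confidence=float(data["confidence"]),
        reasoning=data["reasoning"],
        detected_domains=tuple(data["detected_domains"]),
        risk_warning=data.get("risk_warning"),
    )


def choose_context(
    rec: PolicyRecommendation,
    min_confidence: float = 0.7,  # assumed cut-off, not a documented default
    fallback: str = "general",    # assumed fallback: the broadest policy
) -> str:
    """Accept the recommendation only when the model is confident enough."""
    if rec.confidence >= min_confidence:
        return rec.recommended_context
    return fallback


SAMPLE = """
{
  "recommended_context": "finance",
  "confidence": 0.88,
  "reasoning": "Mixed healthcare and finance data. Finance has stricter thresholds.",
  "detected_domains": ["healthcare", "finance"],
  "risk_warning": "Cross-domain PII - use strictest policy"
}
"""

recommendation = parse_recommendation(SAMPLE)
selected_context = choose_context(recommendation)
```

Keeping the reasoning and confidence alongside the selected context makes it straightforward to record why a policy was chosen.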

Risk-Based Decision Making

Move beyond binary pass/fail with continuous risk assessment:

| Risk Level | Score Range | Action |
|------------|-------------|--------|
| Low | 0.0–0.3 | Allow (properly redacted) |
| Medium | 0.3–0.5 | Log (contextual clues present) |
| High | 0.5–0.7 | Alert (format preservation detected) |
| Critical | 0.7–1.0 | Purge (direct PII exposure) |
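
The tiers above can be sketched as a simple lookup from score to action. The ranges in the table share their boundary values, so this sketch assumes each lower bound is inclusive (a score of exactly 0.3 counts as Medium). That choice, along with the `Tier` and `classify_risk` names, is an assumption for illustration and may differ from how Cerberus resolves boundaries.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    level: str
    action: str


# (exclusive upper bound, tier), checked in ascending order.
_TIERS = [
    (0.3, Tier("low", "allow")),
    (0.5, Tier("medium", "log")),
    (0.7, Tier("high", "alert")),
]
_CRITICAL = Tier("critical", "purge")


def classify_risk(score: float) -> Tier:
    """Map a risk score in [0.0, 1.0] to its tier and action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"risk score must be within [0.0, 1.0], got {score!r}")
    for upper_bound, tier in _TIERS:
        if score < upper_bound:
            return tier
    # Everything from 0.7 up to and including 1.0 is critical.
    return _CRITICAL
```

For example, `classify_risk(0.62)` yields the `high` tier with the `alert` action, while `classify_risk(0.7)` falls into `critical` under the inclusive-lower-bound assumption.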

Policy-Driven Compliance

Pre-configured contexts for major compliance frameworks:

| Policy | Compliance | Key Entities |
|--------|------------|--------------|
| Healthcare | HIPAA | PERSON, US_SSN, DATE_TIME, LOCATION, PHONE, EMAIL, IP_ADDRESS |
| Finance | PCI-DSS | CREDIT_CARD, IBAN_CODE, US_BANK_NUMBER, US_SSN, DRIVER_LICENSE |
| General | GDPR | All 13 entity types (broad coverage) |
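
To make the idea of a policy context concrete, here is a minimal sketch of filtering detected entities by policy and confidence. The entity names are copied from the table as written. The `Detection` class, the `filter_detections` function, and the 0.5 default threshold are hypothetical and do not describe the actual policy engine in `app/policies.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Detection:
    """One detected PII span (illustrative shape, not Presidio's result type)."""

    entity_type: str
    start: int
    end: int
    score: float


# Entity sets per policy, taken from the table. None means "all entity types".
POLICY_ENTITIES: dict[str, Optional[frozenset[str]]] = {
    "healthcare": frozenset(
        {"PERSON", "US_SSN", "DATE_TIME", "LOCATION", "PHONE", "EMAIL", "IP_ADDRESS"}
    ),
    "finance": frozenset(
        {"CREDIT_CARD", "IBAN_CODE", "US_BANK_NUMBER", "US_SSN", "DRIVER_LICENSE"}
    ),
    "general": None,
}


def filter_detections(
    detections: list[Detection],
    context: str,
    min_score: float = 0.5,  # assumed threshold for illustration
) -> list[Detection]:
    """Keep detections the policy covers and that meet the confidence threshold."""
    if context not in POLICY_ENTITIES:
        raise KeyError(f"unknown policy context: {context!r}")
    allowed = POLICY_ENTITIES[context]
    return [
        d
        for d in detections
        if d.score >= min_score and (allowed is None or d.entity_type in allowed)
    ]


found = [
    Detection("PERSON", 0, 8, 0.85),
    Detection("CREDIT_CARD", 20, 39, 0.99),
    Detection("US_SSN", 45, 56, 0.30),
]
finance_hits = filter_detections(found, "finance")
```

Under the finance policy in this sketch, only the credit card detection survives: `PERSON` is outside the policy's entity set, and the SSN match falls below the threshold.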

Secure Restoration with Audit Trail

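Restoring original values from redaction tokens requires API key authentication, and restoration activity is recorded in a complete audit log to support HIPAA, PCI-DSS, and GDPR compliance.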
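
The following sketch illustrates the general pattern of key-gated restoration with an audit trail, using in-memory stand-ins for the key store, token vault, and audit log. All names here (`RestorationService`, `AuditEntry`, the outcome strings) are hypothetical, and the unsalted SHA-256 key hash is a simplification for the example rather than a claim about how Cerberus stores keys.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    timestamp: float
    key_id: str
    token: str
    outcome: str  # "restored", "denied", or "not_found"


@dataclass
class RestorationService:
    """In-memory stand-in for authenticated restoration (illustrative only)."""

    key_hashes: dict[str, str]  # key_id -> SHA-256 hex digest of the API key
    vault: dict[str, str]       # token -> original value
    audit_log: list[AuditEntry] = field(default_factory=list)

    def restore(self, key_id: str, api_key: str, token: str) -> Optional[str]:
        expected = self.key_hashes.get(key_id, "")
        supplied = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
        value: Optional[str] = None
        # Constant-time comparison avoids leaking digest contents via timing.
        if not hmac.compare_digest(expected, supplied):
            outcome = "denied"
        elif token not in self.vault:
            outcome = "not_found"
        else:
            outcome = "restored"
            value = self.vault[token]
        # Every attempt is logged, including failures.
        self.audit_log.append(AuditEntry(time.time(), key_id, token, outcome))
        return value


service = RestorationService(
    key_hashes={"admin": hashlib.sha256(b"example-key").hexdigest()},
    vault={"<EMAIL_1>": "jane@example.com"},
)
restored = service.restore("admin", "example-key", "<EMAIL_1>")
rejected = service.restore("admin", "wrong-key", "<EMAIL_1>")
```

Logging denied and failed attempts as well as successful ones is what makes the log useful as an audit trail rather than just an access record.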


System Architecture

Cerberus employs a three-layer security architecture:

flowchart TB
    subgraph Client["Client Layer"]
        API[HTTP Request]
    end

    subgraph Gateway["API Gateway"]
        FastAPI["FastAPI Server<br/>Authentication + Routing"]
    end

    subgraph Security["Three-Layer Security"]
        L1["Layer 1: NLP Detection<br/>(Microsoft Presidio)"]
        L2["Layer 2: Policy Engine<br/>(HIPAA/PCI-DSS/GDPR)"]
        L3["Layer 3: Risk Scorer<br/>(Phi-3 LLM via Ollama)"]
    end

    subgraph Storage["Data Layer"]
        Redis["Redis<br/>Token Storage (24hr TTL)"]
        Postgres["PostgreSQL<br/>API Keys + Audit Logs"]
    end

    subgraph Monitoring["Observability"]
        Prometheus["Prometheus<br/>Metrics Collection"]
        Grafana["Grafana<br/>Dashboards"]
    end

    API --> FastAPI
    FastAPI --> L1
    L1 --> L2
    L2 --> L3
    L1 --> Redis
    L3 --> Redis
    FastAPI --> Postgres
    FastAPI --> Prometheus
    Prometheus --> Grafana

Layer 1: NLP Detection — Presidio analyzes text for 13 PII entity types (EMAIL, SSN, CREDIT_CARD, etc.)

Layer 2: Policy Engine — Filters entities by compliance context, confidence thresholds, and restoration controls

Layer 3: GenAI Risk Scorer — Assigns risk scores, triggers tiered responses, provides explainability
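
The detection and token-storage path described above can be pictured with a toy example. This sketch swaps in two regular expressions for Presidio and a plain dictionary for Redis, so it only illustrates the flow: detect, replace with a token, store the original with a 24-hour expiry. The patterns, the token format, and the `redact` and `lookup` functions are assumptions for illustration.

```python
import re
import time
from typing import Optional

# Regex stand-ins for NLP detection; real detection covers far more cases.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
TOKEN_TTL_SECONDS = 24 * 60 * 60  # mirrors the 24hr TTL in the diagram

# token -> (original value, expiry timestamp)
TokenStore = dict[str, tuple[str, float]]


def redact(text: str, store: TokenStore, now: Optional[float] = None) -> str:
    """Replace matches with numbered tokens and remember the originals."""
    now = time.time() if now is None else now
    counters: dict[str, int] = {}
    for entity, pattern in PATTERNS.items():

        def _tokenize(match: re.Match, entity: str = entity) -> str:
            counters[entity] = counters.get(entity, 0) + 1
            # Toy token format; a shared store would need unique tokens.
            token = f"<{entity}_{counters[entity]}>"
            store[token] = (match.group(0), now + TOKEN_TTL_SECONDS)
            return token

        text = pattern.sub(_tokenize, text)
    return text


def lookup(token: str, store: TokenStore, now: Optional[float] = None) -> Optional[str]:
    """Return the original value unless the token is unknown or expired."""
    now = time.time() if now is None else now
    entry = store.get(token)
    if entry is None or entry[1] <= now:
        store.pop(token, None)  # drop expired entries lazily
        return None
    return entry[0]


store: TokenStore = {}
redacted = redact("Contact jane@example.com, SSN 123-45-6789.", store, now=0.0)
```

With Redis, the expiry would typically be set on the key itself so that expired tokens disappear without a lazy cleanup step like the one in `lookup`.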


Quick Start

# Clone the repository
git clone <repository-url> && cd cerberus

# Start all services with Docker Compose
docker compose up --build

# Initialize database and generate admin API key
docker compose exec api uv run python scripts/init_db.py
# Save the displayed API key - it cannot be retrieved later!

For detailed setup instructions, including local development without Docker, see QUICKSTART.md.


Project Structure

cerberus/
├── app/                        # Core application
│   ├── main.py                 # FastAPI endpoints
│   ├── service.py              # Redaction service
│   ├── verification.py         # LLM-based risk scorer
│   ├── policy_recommendation.py # Smart policy suggester
│   ├── policies.py             # Policy engine
│   ├── database.py             # SQLAlchemy models
│   ├── auth.py                 # API key authentication
│   ├── audit.py                # Audit logging
│   ├── config.py               # Centralized configuration
│   └── prompts/                # LLM prompt engineering
│       ├── verification_prompts.py
│       ├── policy_prompts.py
│       └── few_shot_examples.py
├── tests/                      # Test suite (127+ tests)
│   ├── unit/                   # Unit tests
│   └── integration/            # Integration tests
├── evaluation/                 # Benchmark suite (43 test cases)
├── scripts/                    # Utility scripts
├── docker-compose.yml          # Development stack
├── pyproject.toml              # Python dependencies (uv)
└── .env.example                # Configuration template

Documentation

| Document | Description |
|----------|-------------|
| QUICKSTART.md | Prerequisites, setup instructions, and troubleshooting |
| ARCHITECTURE.md | Technical design decisions, API reference, and diagrams |
| DEPLOYMENT.md | Production deployment strategies and considerations |
| ROADMAP.md | Project milestones and future plans |
| CHANGELOG.md | Version history and release notes |
| CONTRIBUTING.md | Contribution guidelines |

Technical Stack

| Category | Technology |
|----------|------------|
| Web Framework | FastAPI + Uvicorn |
| NLP Detection | Microsoft Presidio + spaCy (en_core_web_lg) |
| GenAI | Phi-3 LLM via Ollama |
| Token Storage | Redis (24hr TTL) |
| Database | PostgreSQL (async via SQLAlchemy + asyncpg) |
| Monitoring | Prometheus + Grafana |
| Package Management | uv (fast Python resolver) |
| Testing | pytest (62%+ coverage) + fakeredis + respx |
| Deployment | Docker Compose |

License

This project is licensed under the MIT License. See LICENSE for details.


Intelligent PII redaction with GenAI-powered compliance. Deploy with confidence.
