Synthetic Identity Fraud Detection System

A specialized fraud detection platform focused on detecting synthetic identities—fabricated personas combining real and fake PII elements that are built up over time before "busting out."

Features

Graph-based Identity Resolution: Links SSN, address, phone, and email clusters using Neo4j
PII Velocity Analysis: Tracks how fast identity elements spread across applications
Synthetic Identity Scoring: ML-based scoring using cross-institution data patterns
Bust-out Prediction: Predicts imminent bust-out before it happens
Real-time Detection: Kafka-based streaming for real-time application scoring
Investigation Tools: Case management, graph visualization, and SAR generation

Quick Start

Prerequisites

Python 3.11+
Docker and Docker Compose
Neo4j 5.x
Apache Kafka

Installation

# Clone the repository
git clone https://github.com/your-org/fraud-detection-system.git
cd fraud-detection-system

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy environment file
cp .env.example .env
# Edit .env with your configuration

Start Infrastructure

# Start all services with Docker Compose
docker-compose -f docker/docker-compose.yml up -d

# Or start individual services
docker-compose -f docker/docker-compose.yml up -d neo4j kafka postgres redis

Run the API

uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload

API Usage

# Score an identity for synthetic risk
curl -X POST http://localhost:8000/api/v1/score/synthetic \
  -H "Content-Type: application/json" \
  -d '{
    "ssn_last4": "1234",
    "ssn_first5": "12345",
    "dob": "1985-03-15",
    "first_name": "John",
    "last_name": "Smith",
    "address": {"street": "123 Main St", "city": "Anytown", "state": "CA", "zip": "90210"},
    "phone": "555-123-4567",
    "email": "john.smith@email.com",
    "application_date": "2024-01-15"
  }'

Project Structure

fraud-detection-system/
├── src/
│   ├── ingestion/          # Data ingestion (Kafka, bureau, consortium)
│   ├── identity_elements/  # PII analysis (SSN, address, phone, email, device)
│   ├── graph/              # Neo4j identity graph and GNN models
│   ├── detection/          # Detection models (synthetic, bust-out, ensemble)
│   ├── signals/            # Signal detectors (SSN-DOB mismatch, velocity, etc.)
│   ├── investigation/      # Case management and SAR generation
│   ├── api/                # FastAPI application
│   └── monitoring/         # Metrics and model monitoring
├── models/                 # Trained model files
├── data/                   # Reference data
├── scripts/                # Training and utility scripts
├── tests/                  # Test suite
└── docker/                 # Docker configuration

Training Models

# Generate synthetic training data
python scripts/generate_synthetic_data.py --output data/training --scenarios identities bust_out

# Train synthetic scorer
python scripts/train_synthetic_scorer.py \
  --data data/training/identities.parquet \
  --output models/synthetic_scorer/v1/model.joblib

# Train bust-out predictor
python scripts/train_bust_out_model.py \
  --data data/training/credit_sequences.parquet \
  --output models/bust_out/v1/model.joblib

Testing

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_synthetic_scorer.py -v

# Run with coverage
pytest tests/ --cov=src --cov-report=html

Documentation

See CLAUDE.md for detailed architecture documentation, API reference, and configuration options.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Synthetic Identity Fraud Detection System

Features

Quick Start

Prerequisites

Installation

Start Infrastructure

Run the API

API Usage

Project Structure

Training Models

Testing

Documentation

License

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
data		data
docker		docker
models		models
notebooks		notebooks
scripts		scripts
src		src
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

DrewWasem/fraud-detection-system

Folders and files

Latest commit

History

Repository files navigation

Synthetic Identity Fraud Detection System

Features

Quick Start

Prerequisites

Installation

Start Infrastructure

Run the API

API Usage

Project Structure

Training Models

Testing

Documentation

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages