End-to-end ML pipeline for credit card fraud detection — from data ingestion to real-time API serving with explainability, A/B testing, and monitoring dashboard.
End-to-end machine-learning system for detecting fraudulent credit card transactions. The project covers the full ML lifecycle: data ingestion, feature engineering, model training with hyperparameter tuning, real-time inference via a REST API, A/B testing of model variants, a live monitoring dashboard, and containerised deployment.
- Data pipeline — automated download, validation, and feature engineering for the Kaggle Credit Card Fraud dataset
- Model training — Logistic Regression, Random Forest, and XGBoost with cross-validation, class-imbalance handling, and Optuna hyperparameter tuning
- Model registry — versioned model storage with metadata (joblib + JSON)
- Explainability — SHAP-based per-prediction and global feature importance
- REST API — FastAPI endpoints for single and batch prediction with confidence scores
- Streaming simulator — async transaction stream with configurable fraud injection rate, launchable directly from the dashboard sidebar
- A/B testing — deterministic traffic splitting with chi-squared significance testing
- Monitoring dashboard — Streamlit UI with real-time feed, performance charts, alerts, and A/B comparison
- Docker Compose — multi-service deployment (API, dashboard, simulator) with health checks
- CI/CD — GitHub Actions pipeline with linting, testing, coverage, and Docker build verification
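The A/B testing bullet above combines deterministic routing with a chi-squared significance check. A minimal stdlib-only sketch of both ideas (the function names and hashing scheme here are illustrative assumptions, not the project's actual code in `src/streaming/ab_router.py`):

```python
import hashlib

def assign_variant(transaction_id: str, split: float = 0.5) -> str:
    # Hash the ID so the same transaction always routes to the same variant,
    # stable across process restarts (unlike Python's built-in hash()).
    bucket = int(hashlib.sha256(transaction_id.encode()).hexdigest(), 16) % 1000
    return "model_a" if bucket < split * 1000 else "model_b"

def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    # Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]],
    # e.g. fraud/legit counts for variant A (a, b) and variant B (c, d).
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Same transaction ID always lands on the same variant.
assert assign_variant("txn_001") == assign_variant("txn_001")

# 40/1000 vs 60/1000 fraud: the statistic exceeds 3.841,
# the 5% critical value at 1 degree of freedom.
stat = chi2_2x2(40, 960, 60, 940)
```

In the real system the split ratio comes from `configs/ab_test.yaml`.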
```mermaid
graph LR
    subgraph Data Pipeline
        A[Kaggle Dataset] --> B[Downloader]
        B --> C[Preprocessor]
        C --> D[Feature Store]
    end
    subgraph Model Layer
        D --> E[Trainer]
        E --> F[Evaluator]
        F --> G[Registry]
        E --> H[Explainer]
    end
    subgraph Inference
        G --> I[FastAPI]
        I --> J["predict"]
        I --> K["predict/batch"]
        I --> L["ab-test/results"]
        I --> M["health"]
    end
    subgraph Streaming
        N[Simulator] -->|queue| O[Consumer]
        O --> P[A/B Router]
        P --> I
    end
    subgraph Monitoring
        Q[Streamlit Dashboard]
        Q --> R[(SQLite)]
        Q -->|start/stop| N
        I --> R
    end
```
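The Trainer in the Model Layer above must cope with the dataset's extreme class imbalance. One common scikit-learn approach (a hedged sketch on synthetic data, not the project's actual training code in `src/models/trainer.py`) is class weighting:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Tiny synthetic stand-in for the highly imbalanced fraud data:
# roughly 1-2% positives, two informative features.
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + X[:, 1] > 3.0).astype(int)  # rare positive class

# class_weight="balanced" re-weights samples inversely to class frequency,
# so the rare fraud class is not drowned out by the majority class.
clf = RandomForestClassifier(n_estimators=50, class_weight="balanced",
                             random_state=0)
clf.fit(X, y)

proba = clf.predict_proba(X)[:, 1]  # fraud probability per transaction
```

Resampling strategies from imbalanced-learn (also in the stack) are an alternative to class weighting.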
```bash
git clone git@github.com:KarasiewiczStephane/fraud-detection-system.git
cd fraud-detection-system
python -m venv .venv && source .venv/bin/activate
make install   # pip install -r requirements.txt
```

Requires Python 3.11+. A 1 000-row sample dataset ships in `data/sample/sample_transactions.csv`, so no separate download step is needed.

Full Kaggle dataset (optional): to download the complete 284 807-row dataset, use the `DatasetDownloader` class in `src/data/downloader.py`. It tries `kagglehub` first, then falls back to a direct URL:

```python
from src.data.downloader import DatasetDownloader

DatasetDownloader().download("data/raw")
```
```bash
make run   # uvicorn src.api.app:app --host 0.0.0.0 --port 8000
```

The API starts at http://localhost:8000. On first launch it auto-trains a Random Forest on the sample dataset — no manual training step required. It also initialises the SQLite database, sets up the SHAP explainer, and enables A/B testing.
In a second terminal (with the venv activated):
```bash
make dashboard   # streamlit run src/dashboard/app.py
```

Opens at http://localhost:8501. Use the Simulator controls in the sidebar to start/stop the transaction stream directly from the dashboard — no extra terminal needed. Adjust the transactions-per-second rate with the slider before starting.
Alternatively, run the simulator manually in a separate terminal:
```bash
python -m src.streaming.run_simulator
```

Simulator environment variables:

| Variable | Default | Description |
|---|---|---|
| `API_URL` | `http://localhost:8000` | API base URL |
| `STREAM_RATE` | `10` | Transactions per second |
| `DATA_PATH` | `data/sample/sample_transactions.csv` | Source data |
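Conceptually, the simulator is an async loop paced by `STREAM_RATE`. A simplified stdlib-only sketch of the idea (names and structure are illustrative, not the actual `src/streaming/simulator.py`):

```python
import asyncio
import random

async def stream(rate: float, fraud_rate: float, n: int) -> list[dict]:
    # Emit n synthetic transactions at `rate` per second,
    # injecting fraud labels at the configured probability.
    out = []
    for i in range(n):
        out.append({
            "transaction_id": f"txn_{i:05d}",
            "Amount": round(random.uniform(1, 500), 2),
            "is_fraud": random.random() < fraud_rate,
        })
        await asyncio.sleep(1 / rate)  # pace the stream
    return out

txns = asyncio.run(stream(rate=1000, fraud_rate=0.05, n=10))
```

In the real pipeline each generated transaction goes onto a queue for the consumer, which routes it through the A/B router to the API.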
The dashboard has six pages:
| Page | What it shows |
|---|---|
| Overview | Transaction counts (1h/24h/7d), fraud count, fraud rate |
| Real-time Feed | Latest 50 predictions with colour-coded fraud probability |
| Model Performance | Fraud rate over time, prediction distribution, date filter |
| A/B Test Results | Side-by-side model comparison, significance indicator |
| Feature Importance | SHAP summary and per-prediction feature attributions |
| Alert Log | High-confidence fraud alerts with CSV download |
```bash
# Build and start all three services
make docker-up
# API:       http://localhost:8000
# Dashboard: http://localhost:8501

# View logs
make docker-logs

# Stop
make docker-down
```

```bash
make test            # Run tests with coverage
make lint            # Ruff check + format
pre-commit run -a    # Lint + format + tests
```

```bash
curl http://localhost:8000/health
```

```json
{
  "status": "healthy",
  "model_version": "rf_default"
}
```

```bash
curl -X POST http://localhost:8000/api/v1/predict \
  -H "Content-Type: application/json" \
  -d '{
    "transaction_id": "txn_001",
    "Time": 0.0,
    "V1": -1.36, "V2": -0.07, "V3": 2.54, "V4": 1.38,
    "V5": -0.34, "V6": -0.47, "V7": 0.24, "V8": 0.10,
    "V9": 0.36, "V10": 0.09, "V11": -0.55, "V12": -0.62,
    "V13": -0.99, "V14": -0.31, "V15": 1.47, "V16": -0.47,
    "V17": 0.21, "V18": 0.03, "V19": 0.40, "V20": 0.25,
    "V21": -0.02, "V22": -0.39, "V23": -0.11, "V24": -0.22,
    "V25": -0.64, "V26": 0.72, "V27": -0.22, "V28": 0.03,
    "Amount": 149.62
  }'
```

```json
{
  "transaction_id": "txn_001",
  "fraud_probability": 0.023,
  "is_fraud": false,
  "confidence": 0.977,
  "model_version": "rf_default",
  "explanation": null
}
```

Add `?include_explanation=true` to get per-feature SHAP attributions:
```bash
curl -X POST "http://localhost:8000/api/v1/predict?include_explanation=true" \
  -H "Content-Type: application/json" \
  -d '{
    "transaction_id": "txn_002",
    "Time": 0.0,
    "V1": -1.36, "V2": -0.07, "V3": 2.54, "V4": 1.38,
    "V5": -0.34, "V6": -0.47, "V7": 0.24, "V8": 0.10,
    "V9": 0.36, "V10": 0.09, "V11": -0.55, "V12": -0.62,
    "V13": -0.99, "V14": -0.31, "V15": 1.47, "V16": -0.47,
    "V17": 0.21, "V18": 0.03, "V19": 0.40, "V20": 0.25,
    "V21": -0.02, "V22": -0.39, "V23": -0.11, "V24": -0.22,
    "V25": -0.64, "V26": 0.72, "V27": -0.22, "V28": 0.03,
    "Amount": 149.62
  }'
```

The `explanation` field will contain `base_value`, `prediction`, and ranked `contributions` showing each feature's impact on the fraud score.
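Clients can re-rank the returned contributions by absolute impact. A small illustrative snippet (the payload below is a made-up example shaped like the response description, not real API output):

```python
# Hypothetical explanation payload, shaped like the API's description above.
explanation = {
    "base_value": 0.01,
    "prediction": 0.023,
    "contributions": {"V14": -0.31, "Amount": 0.12, "V3": 0.05},
}

# Rank features by absolute SHAP contribution, largest impact first;
# negative values push the score down, positive values push it up.
ranked = sorted(
    explanation["contributions"].items(),
    key=lambda kv: abs(kv[1]),
    reverse=True,
)
```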
```bash
curl -X POST http://localhost:8000/api/v1/predict/batch \
  -H "Content-Type: application/json" \
  -d '{"transactions": [{"transaction_id": "txn_001", "Time": 0, "V1": 0, "V2": 0, "V3": 0, "V4": 0, "V5": 0, "V6": 0, "V7": 0, "V8": 0, "V9": 0, "V10": 0, "V11": 0, "V12": 0, "V13": 0, "V14": 0, "V15": 0, "V16": 0, "V17": 0, "V18": 0, "V19": 0, "V20": 0, "V21": 0, "V22": 0, "V23": 0, "V24": 0, "V25": 0, "V26": 0, "V27": 0, "V28": 0, "Amount": 50.0}]}'
```

```bash
curl http://localhost:8000/api/v1/ab-test/results
```

FastAPI serves auto-generated docs at http://localhost:8000/docs (Swagger UI) and http://localhost:8000/redoc (ReDoc).
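Writing the 30-field transaction objects by hand is tedious; a helper like this can build a batch payload programmatically (the helper itself is illustrative, not part of the project):

```python
def make_transaction(txn_id: str, amount: float) -> dict:
    # The model expects Time, the 28 PCA features V1..V28, and Amount.
    txn = {"transaction_id": txn_id, "Time": 0.0, "Amount": amount}
    txn.update({f"V{i}": 0.0 for i in range(1, 29)})
    return txn

payload = {
    "transactions": [make_transaction(f"txn_{i:03d}", 50.0) for i in range(3)]
}
```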
All config files live in `configs/`:

| File | Purpose |
|---|---|
| `config.yaml` | Data paths, API host/port, database path, streaming rate |
| `model_params.yaml` | Hyperparameters for each model type |
| `ab_test.yaml` | A/B test toggle, model variants, traffic split ratio |

Environment variables override YAML values using dot-path notation (e.g. `MODEL__DEFAULT_MODEL=random_forest`).
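The dot-path override can be pictured as splitting the variable name on the double underscore and walking the nested config dict. An illustrative re-implementation (the real logic lives in `src/utils/config.py` and may differ):

```python
def apply_env_overrides(config: dict, environ: dict) -> dict:
    # MODEL__DEFAULT_MODEL=random_forest -> config["model"]["default_model"]
    for key, value in environ.items():
        if "__" not in key:
            continue  # not a dot-path override
        node = config
        *parents, leaf = [part.lower() for part in key.split("__")]
        for part in parents:
            node = node.setdefault(part, {})  # create missing levels
        node[leaf] = value
    return config

cfg = apply_env_overrides(
    {"model": {"default_model": "xgboost"}},
    {"MODEL__DEFAULT_MODEL": "random_forest"},
)
```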
```text
fraud-detection-system/
├── src/
│   ├── data/                 # Data ingestion and feature engineering
│   │   ├── downloader.py     # Dataset download and validation
│   │   ├── preprocessor.py   # Feature engineering pipeline
│   │   └── feature_store.py  # Versioned Parquet feature storage
│   ├── models/               # Training, evaluation, and serving
│   │   ├── trainer.py        # Model training with cross-validation
│   │   ├── evaluator.py      # Evaluation and comparison reports
│   │   ├── registry.py       # Versioned model registry
│   │   └── explainer.py      # SHAP-based explainability
│   ├── api/                  # FastAPI inference service
│   │   ├── app.py            # Application entry point
│   │   ├── schemas.py        # Pydantic request/response models
│   │   └── routes/           # Endpoint handlers
│   ├── streaming/            # Real-time transaction processing
│   │   ├── simulator.py      # Async transaction generator
│   │   ├── consumer.py       # Stream consumer
│   │   ├── ab_router.py      # A/B traffic routing
│   │   └── run_simulator.py  # Container entry point
│   ├── dashboard/            # Streamlit monitoring UI
│   │   ├── app.py            # Dashboard entry point + simulator controls
│   │   ├── data.py           # Sync SQLite data access
│   │   └── _pages/           # Dashboard pages (6 views)
│   └── utils/                # Shared utilities
│       ├── config.py         # YAML config with env overrides
│       ├── logger.py         # JSON structured logging
│       └── database.py       # Async SQLite manager
├── tests/                    # pytest test suite (600+ tests)
├── configs/                  # YAML configuration files
├── data/sample/              # 1000-row sample dataset for CI
├── Dockerfile                # API container (multi-stage)
├── Dockerfile.dashboard      # Dashboard container
├── Dockerfile.simulator      # Simulator container
├── docker-compose.yml        # Multi-service orchestration
├── Makefile                  # Build/run/dashboard shortcuts
├── requirements.txt          # Python dependencies
└── .github/workflows/ci.yml  # CI/CD pipeline
```
| Category | Technology |
|---|---|
| ML | scikit-learn, XGBoost, imbalanced-learn, Optuna |
| Explainability | SHAP |
| API | FastAPI, Uvicorn, Pydantic v2 |
| Dashboard | Streamlit, Plotly |
| Data | pandas, NumPy, PyArrow |
| Database | SQLite (aiosqlite for API, sqlite3 for dashboard) |
| Containerisation | Docker, Docker Compose |
| CI/CD | GitHub Actions, Ruff, pytest, Codecov |
Stéphane Karasiewicz — skarazdata.com | LinkedIn
MIT