Skip to content

nebiyuu/fraud-detection-ml-engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🛡️ Fraud Detection System

🔍 Overview

This project focuses on developing reliable fraud detection models using transactional data. It handles real-world challenges like class imbalance, adaptive fraud tactics, and the need for explainable and actionable insights. It integrates geolocation analysis and user behavior tracking to improve detection accuracy and reduce false alarms.

🎯 Motivation

Fraud is costly — financially and reputationally. Whether it’s credit card abuse or fake online transactions, the damage adds up fast. A good fraud detection system must detect threats early, minimize false positives, and adapt to new attack methods — all without annoying real users. This project builds toward that.


🚀 Project Goals

  • ✅ Clean and analyze multi-source transaction data
  • ✅ Engineer smart features from geolocation, timestamps, and user behavior
  • ✅ Train and compare models like Logistic Regression, Random Forests, and XGBoost
  • ✅ Evaluate based on precision, recall, ROC-AUC, and PR-AUC (key for imbalance)
  • ✅ Apply SHAP/LIME for model interpretability
  • ✅ Outline real-time deployment and monitoring plan

🧠 Methodology

  1. Data Understanding & Preprocessing

    • Visualize fraud distribution
    • Handle outliers, nulls, and class imbalance using SMOTE/undersampling
    • Normalize and encode as needed
  2. Feature Engineering

    • Time-based features (transaction hour, frequency gaps)
    • Geolocation mismatches (IP vs billing location)
    • Device/user behavior anomalies
  3. Modeling & Training

    • Tried: Logistic Regression, Random Forest, Gradient Boosting
    • Tuned with cross-validation and stratified splits
  4. Evaluation

    • Confusion matrix, ROC-AUC, PR-AUC
    • Class-wise precision/recall trade-offs
    • Monitored overfitting using separate validation data
  5. Explainability

    • Used SHAP values to interpret feature impact on predictions
    • Helped assess trustworthiness of automated decisions
  6. Deployment Strategy (Outline)

    • Real-time prediction API with FastAPI or Flask
    • Logging and alert system for high-risk transactions
    • Dashboard for fraud analysts (optional future work)

🧩 Key Challenges Tackled

  • 🧨 Imbalanced Classes: Addressed with resampling + PR-AUC focus
  • 🥷 Evolving Fraud Patterns: Designed models to generalize well
  • 🎯 Evaluation Bias: Tested on non-resampled test set
  • 🗺️ Geolocation Fraud: Custom logic to spot IP-region mismatches
  • ⚠️ Explainability: No black-box — we know why a transaction is flagged

📁 Folder Structure

.
├── data/
│   ├── raw/            <- Original CSV files
│   ├── processed/      <- Cleaned and transformed data
├── notebooks/          <- EDA and analysis
├── models/             <- Trained model binaries
├── scripts/            <- Preprocessing, training, evaluation scripts
├── requirements.txt    <- Python dependencies
├── README.md

🔧 Getting Started

  1. Clone the repo

    git clone https://github.com/yourusername/fraud-detection.git
    cd fraud-detection
  2. Install requirements

    pip install -r requirements.txt
  3. Place your transaction CSV files in data/raw/

  4. Run the EDA notebooks

    jupyter notebook notebooks/EDA.ipynb
  5. Run preprocessing and model training from scripts

    python scripts/preprocess.py
    python scripts/train_models.py

⚙️ Tools & Libraries

  • Python (Pandas, NumPy, Scikit-learn)
  • XGBoost, imbalanced-learn
  • SHAP, Matplotlib, Seaborn
  • Jupyter

About

fraud detection models using transactional data for a credit

Resources

Stars

Watchers

Forks

Contributors