Skip to content

Real-time fraud detection engine with velocity checks, geolocation analysis, and device fingerprinting. Stateful detection using SQLite with explainable risk assessments.

Notifications You must be signed in to change notification settings

pelzade127/fraud-detector

Repository files navigation

Fraud Detection Engine

Real-time transaction fraud scoring with explainable risk assessment. Built for fintech applications requiring stateful fraud detection with velocity checks, geolocation analysis, and behavioral profiling.

🎯 Overview

This fraud detection engine analyzes financial transactions in real-time using rule-based pattern matching and behavioral analytics. Unlike simple threshold-based systems, it maintains user profiles, tracks device patterns, calculates geographic impossibilities using the Haversine formula, and provides transparent explainability through reason codes.

Key Features

βœ… Real-time scoring - 0-100 fraud score with 4-tier risk levels
βœ… Stateful detection - SQLite-backed transaction history and user profiling
βœ… 8 fraud rules - Velocity, amount anomalies, location patterns, device fingerprinting
βœ… Geospatial analysis - Impossible travel detection using Haversine distance
βœ… Explainability - Reason codes and severity breakdown for every assessment
βœ… REST API - Flask endpoints with webhook support for alerts
βœ… Synthetic testing - Built-in fraud scenario generator
βœ… Production-ready - Comprehensive test suite, JSON logging, input validation

πŸš€ Quick Start (macOS)

Prerequisites

# Python 3.8+
pip3 install -r requirements.txt

Run Examples

# See fraud detection in action
python3 fraud_detector.py

Output shows 4 scenarios:

  • βœ… Legitimate transaction (score: 0)
  • ⚠️ Velocity attack (score: 40)
  • ⚠️ Large first transaction (score: 50)
  • πŸ”Ά Impossible travel (score: 60)

Run Tests

python3 test_fraud_detector.py

Runs 18 comprehensive tests covering rules, scenarios, and edge cases.

Start API Server

python3 api.py

Server runs on http://localhost:5000

Generate Test Data

python3 generate_test_data.py

Creates test_scenarios.json with 6 fraud scenarios.

πŸ“Š How It Works

Architecture

Transaction Input
       ↓
[Transaction Store] ← SQLite in-memory database
       ↓
[Rules Engine] ← 8 modular fraud rules
       ↓
[Fraud Detector] ← Aggregates scores + generates assessment
       ↓
Fraud Assessment Output (score, risk level, reasons, recommendations)

Fraud Rules

Rule Description Severity Score Impact
Velocity (10min) 3+ transactions in 10 minutes High +40 pts
Velocity (60min) 10+ transactions in 1 hour Medium +25 pts
Large Transaction >3x user's average amount Medium +25 pts
New Device Transaction from unknown device Medium +20 pts
Device Velocity Device used by 5+ accounts in 1hr High +35 pts
Impossible Travel Requires >900 km/h travel speed Critical +60 pts
Round Dollar Exact $500, $1000, etc (card testing) Low +10 pts
High-Risk Category Gift cards, wire transfers, crypto Medium +15 pts

Risk Levels

  • LOW (0-25): Approve - Process normally
  • MEDIUM (26-50): Challenge - Require 2FA
  • HIGH (51-75): Review - Manual review required
  • CRITICAL (76-100): Block - Automatically decline

Feature Engineering

Transaction Store extracts:

  • User profile (lifetime spend, avg amount, known devices/locations)
  • Transaction velocity (count in time windows)
  • Device fingerprints
  • Location history

Rules Engine calculates:

  • Coefficient of variation (income stability)
  • Haversine distance (geographic movement)
  • Time-series patterns (rapid-fire detection)
  • Behavioral anomalies (deviation from norms)

πŸ› οΈ API Usage

Assess Single Transaction

curl -X POST http://localhost:5000/api/assess \
  -H "Content-Type: application/json" \
  -d '{
    "transaction_id": "txn_001",
    "user_id": "user_alice",
    "amount": 1500.00,
    "merchant": "Apple Store",
    "category": "Electronics",
    "timestamp": "2026-01-23T10:30:00Z",
    "location": {"lat": 37.7749, "lon": -122.4194},
    "device_id": "device_new_abc123",
    "ip_address": "192.168.1.100"
  }'

Response:

{
  "transaction_id": "txn_001",
  "fraud_score": 50,
  "risk_level": "medium",
  "triggered_rules": [
    "First large transaction",
    "Round dollar amount",
    "High-risk category"
  ],
  "reason_codes": [
    "First large transaction: $1500.00 transaction on new account",
    "Round dollar amount: Exact $1500.00 (common in card testing)",
    "High-risk category: Category 'Electronics' is high-risk"
  ],
  "recommended_action": "CHALLENGE: Require additional authentication (2FA, security questions).",
  "details": {
    "rules_evaluated": 8,
    "rules_triggered": 3,
    "severity_breakdown": {"low": 1, "medium": 2}
  }
}

Batch Assessment

curl -X POST http://localhost:5000/api/assess/batch \
  -H "Content-Type: application/json" \
  -d '{
    "transactions": [
      {transaction_1},
      {transaction_2},
      ...
    ]
  }'

Response includes summary:

{
  "results": [...],
  "summary": {
    "total": 100,
    "low_risk": 85,
    "medium_risk": 10,
    "high_risk": 4,
    "critical_risk": 1
  }
}

Register Webhook

curl -X POST http://localhost:5000/api/webhooks \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/fraud-alert",
    "events": ["high", "critical"]
  }'

Webhooks are triggered automatically for high-risk transactions.

Get User History

curl "http://localhost:5000/api/user/history?user_id=user_alice&time_window_minutes=60"

Returns transaction history and user profile.

πŸ“ˆ Example Scenarios

Scenario 1: Legitimate Transaction

Input:

{
  "amount": 45.50,
  "merchant": "Starbucks",
  "category": "Food & Dining"
}

Output:

  • Score: 0/100 (LOW)
  • Triggered Rules: None
  • Action: APPROVE

Scenario 2: Velocity Attack

Pattern: 5 transactions in 3 minutes

Output:

  • Score: 40/100 (MEDIUM)
  • Triggered: "Velocity: 3+ txns in 10min"
  • Action: CHALLENGE - Require 2FA

Scenario 3: Impossible Travel

Pattern: SF β†’ NYC in 30 minutes (requires 8,258 km/h)

Output:

  • Score: 60/100 (HIGH)
  • Triggered: "Impossible travel: 4129km in 0.5h"
  • Action: REVIEW - Hold for manual verification

Scenario 4: Card Testing

Pattern: 10 small transactions ($1, $5, $10) in rapid succession

Output:

  • Score: 40-50/100 (MEDIUM)
  • Triggered: Velocity + Round amounts
  • Action: CHALLENGE

Scenario 5: Device Sharing Fraud

Pattern: 7 different accounts using same device in 1 hour

Output:

  • Score: 35+/100 (MEDIUM-HIGH)
  • Triggered: "Device velocity: 5+ accounts in 60min"
  • Action: REVIEW

πŸ§ͺ Testing

Test Suite Coverage

18 comprehensive tests across 5 categories:

1. Transaction Store Tests (4 tests)

  • Adding transactions
  • Duplicate rejection
  • User profile creation
  • Time window queries

2. Fraud Rules Tests (5 tests)

  • Velocity detection
  • Large transaction detection
  • New device detection
  • Impossible travel calculation
  • Round dollar detection

3. Fraud Detector Tests (3 tests)

  • Legitimate transactions
  • High fraud scores
  • Risk level boundaries

4. Synthetic Scenarios Tests (3 tests)

  • Velocity attack detection
  • Impossible travel detection
  • Card testing patterns

5. Input Validation Tests (3 tests)

  • Invalid coordinates
  • Negative amounts
  • Missing required fields

Run Tests

python3 test_fraud_detector.py

Expected output:

Ran 18 tests in 0.013s
OK

βœ… All tests passed!

πŸ”§ Technical Implementation

New Technologies & Patterns

This project demonstrates technologies and patterns different from previous projects:

  1. SQLite with indexes - Stateful in-memory database (vs PostgreSQL in Budget Buddy/Stress Simulator)
  2. Geospatial calculations - Haversine formula for impossible travel detection
  3. Dataclasses with validation - Python type-safe models (different approach than Pydantic)
  4. JSON structured logging - Production event logging for fraud detection
  5. Time-series velocity detection - Real-time pattern matching algorithms
  6. Device fingerprinting - Security-focused identity tracking
  7. Webhook notification system - Event-driven alerting architecture
  8. Synthetic fraud scenarios - Automated test data generation for fraud patterns

Domain expertise: Real-time fraud detection and transaction security (vs consumer budgeting/planning tools)

Code Structure

fraud-detection-engine/
β”œβ”€β”€ fraud_detector.py           # Main detection engine
β”œβ”€β”€ transaction_store.py        # SQLite storage + user profiling
β”œβ”€β”€ rules_engine.py             # Modular fraud rules
β”œβ”€β”€ api.py                      # Flask REST API
β”œβ”€β”€ test_fraud_detector.py      # Comprehensive test suite
β”œβ”€β”€ generate_test_data.py       # Synthetic scenario generator
β”œβ”€β”€ test_scenarios.json         # Pre-generated test data
β”œβ”€β”€ requirements.txt            # Dependencies
β”œβ”€β”€ .gitignore                  # Git ignore file
└── README.md                   # This file

Performance Characteristics

  • Scoring Speed: <2ms per transaction
  • API Latency: ~20ms (including network)
  • Batch Processing: 500+ transactions/second
  • Memory Footprint: ~30MB (in-memory DB)
  • Database: SQLite (in-memory for speed, persistent option available)

πŸ’‘ Production Considerations

Scaling

Current (Demo):

  • In-memory SQLite
  • Synchronous processing
  • Single-threaded

Production Recommendations:

  • PostgreSQL/MySQL for persistence
  • Redis for caching + rate limiting
  • Async task queue (Celery/RabbitMQ)
  • Horizontal scaling with load balancer
  • Webhook retries with exponential backoff

Security

  • Rate limit API endpoints (100 req/min per IP)
  • Encrypt PII fields (IP addresses, device IDs)
  • Audit logs for all assessments
  • HTTPS only in production
  • API authentication (OAuth2/JWT)

Monitoring

  • Track fraud score distribution
  • Monitor false positive/negative rates
  • Alert on rule effectiveness degradation
  • Dashboard for real-time fraud activity
  • A/B test rule threshold adjustments

Compliance

  • PCI DSS: Never store card numbers
  • GDPR: Right to deletion, data minimization
  • Fair Lending: Avoid discriminatory patterns
  • Audit Trail: Log all decisions for review

🎨 Future Enhancements

  • Machine learning model (compare to rule-based)
  • Graph analysis for fraud rings
  • IP geolocation enrichment
  • Merchant category code (MCC) risk scoring
  • Time-of-day risk patterns
  • Amount clustering for anomaly detection
  • Network analysis (user connections)
  • False positive feedback loop
  • Dashboard UI for analysts
  • Prometheus metrics export

πŸ“š Use Cases

Fintech Applications

  1. Payment Processors (Stripe, Square)

    • Real-time transaction screening
    • Chargeback prevention
    • Merchant risk scoring
  2. Neobanks (Chime, Current)

    • Account takeover detection
    • P2P fraud prevention
    • New account monitoring
  3. Buy Now Pay Later (Affirm, Klarna)

    • First-party fraud detection
    • Synthetic identity detection
    • Checkout abuse prevention
  4. Crypto Exchanges (Coinbase, Kraken)

    • Withdrawal fraud prevention
    • Account verification
    • AML transaction monitoring
  5. Marketplaces (eBay, Etsy)

    • Seller fraud detection
    • Buyer protection
    • Dispute resolution

πŸ” How This Differs from Creditworthiness Scorer

Both projects show fintech risk assessment, but focus on different domains:

Feature Creditworthiness Scorer Fraud Detector
Purpose Lending decisioning Transaction security
Timing One-time (application) Real-time (every txn)
Data Historical cash flow Current + historical txns
Storage Stateless Stateful (SQLite)
Features DTI, income CV, buffer Velocity, location, device
Output Loan approval/denial Approve/challenge/block
Domain Underwriting Fraud prevention

πŸ“ License

MIT License - Free for commercial and personal use.

πŸ™ Acknowledgments

Built by Pelz as part of a fintech portfolio demonstrating:

  • Real-time risk assessment
  • Stateful pattern detection
  • Geospatial analytics
  • Production-quality code
  • Comprehensive testing

Note: This is a demonstration project for educational/portfolio purposes. For production fraud detection, consider:

  • Professional fraud services (Sift, Riskified, Stripe Radar)
  • Machine learning models trained on your data
  • Legal review for compliance
  • Insurance for fraud losses

About

Real-time fraud detection engine with velocity checks, geolocation analysis, and device fingerprinting. Stateful detection using SQLite with explainable risk assessments.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages