This project analyzes synthetic banking transactions to detect patterns associated with money laundering. Using a Kaggle dataset, we perform exploratory data analysis and lay the groundwork for classification modeling.
- Source: Synthetic Transaction Monitoring Dataset (Kaggle)
- File: `SAML-D.csv` (~996 MB)
- Key columns:
  - `Time`, `Date`: transaction timestamp
  - `Sender_account`, `Receiver_account`: pseudonymous account IDs
  - `Amount`: monetary value of the transaction
  - `Payment_type`: type of transaction (e.g., CASH_OUT, TRANSFER)
  - `Is_laundering`: target flag (1 = laundering, 0 = legitimate)
- Python 3
- Google Colab
- Pandas, NumPy
- Seaborn, Matplotlib
- Scikit-learn (for metrics and modeling)
We analyze trends in:
- Frequency of laundering by `Payment_type`
- Amount patterns in laundering vs. normal cases
- Group-wise statistical summaries
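Here is a minimal pandas sketch of the first two analyses. It assumes the data has already been loaded into a DataFrame whose column names match the key columns listed above. The helper names and the four-row `toy` frame are made up for illustration and are not results from SAML-D.

```python
import pandas as pd


def laundering_rate_by_type(df: pd.DataFrame) -> pd.Series:
    """Share of laundering transactions within each Payment_type, highest first."""
    # The mean of a 0/1 flag is the fraction of 1s in each group.
    # observed=True avoids empty groups (and a warning) if Payment_type is categorical.
    return (
        df.groupby("Payment_type", observed=True)["Is_laundering"]
        .mean()
        .sort_values(ascending=False)
    )


def amount_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, median and max of Amount for legitimate (0) vs laundering (1)."""
    return df.groupby("Is_laundering")["Amount"].agg(["count", "mean", "median", "max"])


# Hypothetical toy rows, only to show the shape of the output
toy = pd.DataFrame(
    {
        "Payment_type": ["TRANSFER", "TRANSFER", "CASH_OUT", "CASH_OUT"],
        "Amount": [100.0, 900.0, 50.0, 70.0],
        "Is_laundering": [0, 1, 0, 0],
    }
)

print(laundering_rate_by_type(toy))
print(amount_summary(toy))
```

Grouping on the 0/1 target keeps both analyses as plain `groupby` calls. On the full file, the same functions run unchanged against the loaded DataFrame.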
- Certain payment types (e.g., TRANSFER) show higher laundering rates than others.
- Laundering transactions often involve higher maximum and average amounts than legitimate ones.
- Open `Anti_Money_Laundering.ipynb` in Google Colab.
- Upload your `kaggle.json` API key.
- Run the cell to authenticate and download the dataset.
- Execute the EDA blocks and modify or extend as needed.
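In Colab, the upload and authentication steps above usually come down to two actions. First, the key is placed where the Kaggle CLI looks for it (`~/.kaggle/kaggle.json`). Second, its permissions are restricted so the CLI does not warn that it is readable by other users. The sketch below assumes the key was uploaded to the current working directory. The helper `install_kaggle_key` is a hypothetical name and does not come from the notebook.

```python
import os
import shutil
import stat
from pathlib import Path
from typing import Optional


def install_kaggle_key(src: str, home: Optional[str] = None) -> Path:
    """Copy an uploaded kaggle.json into <home>/.kaggle/ with owner-only access."""
    home_dir = Path(home) if home else Path.home()
    kaggle_dir = home_dir / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)

    dest = kaggle_dir / "kaggle.json"
    shutil.copy(src, dest)

    # Equivalent to `chmod 600`: read/write for the owner only
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)
    return dest
```

In a notebook cell, you could upload the file with `google.colab.files.upload()`, call `install_kaggle_key("kaggle.json")`, and then download with the Kaggle CLI, e.g. `!kaggle datasets download -d <owner>/<dataset-slug> --unzip`. Here `<owner>/<dataset-slug>` is a placeholder for the dataset's Kaggle identifier, which this README does not give.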
- Train machine learning models (Logistic Regression, XGBoost)
- Evaluate models with ROC-AUC and confusion matrices
- Integrate anomaly detection
- Convert the pipeline into a real-time dashboard
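The first two planned steps could start from a baseline like the one below. It is a sketch on stand-in data. `make_classification` generates an imbalanced toy problem with about 1% positives; that ratio is an assumption and was not measured from SAML-D. The real feature set has not been engineered yet, which is why stand-in data is used. Class weighting is one common way to handle how rare laundering is. It is a choice made here for illustration, not something the project specifies.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (NOT SAML-D): ~1% positives to mimic rare laundering cases
X, y = make_classification(
    n_samples=20_000,
    n_features=8,
    n_informative=4,
    weights=[0.99, 0.01],
    random_state=42,
)

# Stratify so the rare class appears in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# class_weight="balanced" up-weights the minority (laundering) class
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

# ROC-AUC is computed from the predicted probability of the positive class
scores = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)

# Confusion matrix at the default 0.5 threshold, laid out as [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_test, model.predict(X_test))

print(f"ROC-AUC: {auc:.3f}")
print(cm)
```

An XGBoost `XGBClassifier` also exposes `predict_proba`, so the same evaluation lines should apply once that model is trained.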
This is a synthetic dataset, but it reflects realistic fraud detection challenges. The aim is to develop interpretable workflows for financial surveillance.
Siddharthan P S
Email: sp8004@nyu.edu
LinkedIn: Siddharthan P S
Medium Article: Detecting Money Laundering with Python and AML Dataset
Portfolio: Siddharthan P S
To reduce RAM usage in Colab:
- Use `usecols` to load only the columns you need.
- Downcast numeric types with `pd.to_numeric(...)` and its `downcast=` argument.
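Here is a minimal sketch of both tips. It assumes the CSV headers match the key columns listed above. `USECOLS` and `load_compact` are hypothetical names, and the column subset is only an example. The sketch also stores `Payment_type` as a `category` dtype. That is an extra memory saving beyond the two tips above.

```python
import io

import pandas as pd

# Example subset of the key columns; header names are assumed to match the CSV
USECOLS = ["Amount", "Payment_type", "Is_laundering"]


def load_compact(source) -> pd.DataFrame:
    """Load only USECOLS and shrink dtypes to reduce memory use."""
    df = pd.read_csv(
        source,
        usecols=USECOLS,
        # Repeated strings are stored once per category instead of once per row
        dtype={"Payment_type": "category"},
    )
    # Tries float32; pandas keeps float64 if the values would lose too much precision
    df["Amount"] = pd.to_numeric(df["Amount"], downcast="float")
    # A 0/1 flag fits in int8 instead of int64
    df["Is_laundering"] = pd.to_numeric(df["Is_laundering"], downcast="integer")
    return df


# Tiny in-memory CSV with made-up rows, standing in for SAML-D.csv
sample = io.StringIO(
    "Time,Date,Sender_account,Receiver_account,Amount,Payment_type,Is_laundering\n"
    "10:00:00,2022-10-07,1,2,10.5,TRANSFER,0\n"
    "10:05:00,2022-10-07,3,4,2500.0,CASH_OUT,1\n"
)
compact = load_compact(sample)
print(compact.dtypes)
```

On the real file, `load_compact("SAML-D.csv")` followed by `.memory_usage(deep=True)` gives a quick way to compare against a plain `pd.read_csv` load.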