ML Mini Project

Clustering Drivers and Vehicles Based on Accident Risk Patterns

Overview

This project applies unsupervised machine learning (K-Means clustering) to a Road Traffic Accident dataset to identify distinct patterns in the data. The primary goal is to segment accidents into meaningful clusters and then test statistically whether cluster membership is associated with Accident Severity (Slight, Serious, or Fatal Injury). Clusters with a higher share of severe outcomes point to high-risk accident scenarios that can be studied and mitigated.

Objective

Cluster drivers and vehicles to discover patterns of high accident risk, helping in road safety analysis and preventive policy design.

Methodology

Data Preprocessing
- Target Mapping: The Accident_severity feature was mapped to an ordinal scale from 1 to 3 (1 = Slight, 2 = Serious, 3 = Fatal Injury) for validation purposes.
- Cyclical Encoding: Features like Time and Day_of_week were transformed using Sine/Cosine encoding so that values on either side of the wrap-around (for example, 23:59 and 00:00, or Sunday and Monday) stay close together in feature space.
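
The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the sample rows, the exact severity label strings, the derived column names (severity_ord, time_sin, and so on), and the helper `cyclical` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real dataset's values and label spellings may differ.
df = pd.DataFrame({
    "Time": ["17:02:00", "1:06:00", "23:30:00"],
    "Day_of_week": ["Monday", "Sunday", "Friday"],
    "Accident_severity": ["Slight Injury", "Serious Injury", "Fatal injury"],
})

# Target mapping: severity labels to an ordinal scale (1 to 3, where 3 is fatal).
severity_map = {"slight injury": 1, "serious injury": 2, "fatal injury": 3}
df["severity_ord"] = (
    df["Accident_severity"].str.strip().str.lower().map(severity_map)
)

def cyclical(values, period):
    """Return (sin, cos) of values placed on a circle with the given period."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)

# Time of day as a fractional hour in [0, 24), then onto the circle.
parts = df["Time"].str.split(":", expand=True).astype(float)
hour = parts[0] + parts[1] / 60.0
df["time_sin"], df["time_cos"] = cyclical(hour, 24)

# Day of week as an index 0..6, then onto the circle.
days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
day_idx = df["Day_of_week"].map({d: i for i, d in enumerate(days)})
df["day_sin"], df["day_cos"] = cyclical(day_idx, 7)

print(df[["severity_ord", "time_sin", "time_cos", "day_sin", "day_cos"]])
```

Each cyclical feature becomes a pair of columns, so Euclidean distance (which K-Means relies on) treats late-night and early-morning accidents as neighbours instead of as opposite ends of a 0 to 24 scale.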

Preprocessing Pipeline
- Standard Scaling was applied to numerical features.
- One-Hot Encoding was applied to categorical features.
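
One common way to combine these two steps in scikit-learn is a `ColumnTransformer`. The sketch below assumes that approach; the README does not say how the notebook wires the steps together, and the column names and values here are illustrative only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature frame; column names and values are assumptions.
df = pd.DataFrame({
    "Number_of_vehicles_involved": [2, 1, 3, 2],
    "time_sin": [0.0, 0.5, -0.5, 1.0],
    "Weather_conditions": ["Normal", "Raining", "Normal", "Other"],
})

numeric_cols = ["Number_of_vehicles_involved", "time_sin"]
categorical_cols = ["Weather_conditions"]

preprocessor = ColumnTransformer(
    transformers=[
        # Zero mean, unit variance for each numerical column.
        ("num", StandardScaler(), numeric_cols),
        # One binary column per category; unseen categories encode as all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X = preprocessor.fit_transform(df)
print(X.shape)
```

Scaling matters here because K-Means is distance-based: without it, a numerical feature with a large range would dominate the one-hot columns.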

Clustering Model
- Model: K-Means Clustering
- Parameter: The number of clusters (k) was set to 2.
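
A minimal sketch of fitting K-Means with k = 2 is shown below. It runs on synthetic data standing in for the preprocessed feature matrix; the `n_init` and `random_state` values are assumptions chosen for reproducibility, not settings taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the preprocessed features: two well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 3)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 3)),
])

# k = 2 as stated in the README; n_init and random_state are illustrative.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(np.bincount(labels))             # number of samples in each cluster
print(kmeans.cluster_centers_.shape)   # (2, n_features)
```

The resulting `labels` array assigns each row to cluster 0 or 1 and is what the later validation step compares against accident severity.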

Setup & Running the Project

To set up and run this project, you will need a Python environment with the necessary libraries.

1. Prerequisites

Ensure you have Python 3.8+ installed.
Install the required libraries using:

pip install pandas numpy scikit-learn matplotlib

2. Running the Project

Place your dataset as RTA Dataset.csv in the same directory as the notebook.
The entire project workflow is contained within a single Notebook:
1. Open the notebook: CS399_CS413_ML_MiniProject.ipynb.
2. Ensure the data file path (as mentioned above) is correct.
3. Run all cells sequentially. The notebook will perform data cleaning, preprocessing, K-Means clustering, statistical validation (Chi-Squared Test), and generate visualizations (PCA plot and Average Severity Bar Chart).
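
The statistical validation step can be sketched as a Chi-Squared test of independence between cluster label and severity. The example below assumes SciPy's `chi2_contingency` (SciPy is installed as a dependency of scikit-learn); the data and the column names cluster and severity_ord are made up for illustration and are not results from the project.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical cluster assignments and ordinal severities (1 to 3).
results = pd.DataFrame({
    "cluster":      [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "severity_ord": [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
})

# Contingency table: rows are clusters, columns are severity levels.
table = pd.crosstab(results["cluster"], results["severity_ord"])

# Null hypothesis: cluster membership and severity are independent.
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

A small p-value would suggest that the severity distribution differs between clusters. With counts as small as in this toy table the test is not reliable; the real dataset should give much larger cell counts.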

3. Output

- CS399_CS413_average_severity_per_cluster.png → Average Accident Severity per Cluster
- PCA scatter plot and bar chart of average accident severity per cluster
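
As a rough sketch of how such outputs could be produced, the example below projects synthetic features to two dimensions with PCA and plots the mean ordinal severity per cluster. The data, variable names, and output file name are assumptions for illustration; the notebook's own plotting code may differ.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-ins for preprocessed features, cluster labels, and severity.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(4, 1, (30, 4))])
clusters = np.array([0] * 30 + [1] * 30)
severity = rng.integers(1, 4, size=60)  # ordinal severity in {1, 2, 3}

# Mean ordinal severity within each cluster.
avg_severity = pd.Series(severity).groupby(clusters).mean()

# Project the features onto the first two principal components for plotting.
coords = PCA(n_components=2).fit_transform(X)

fig, (ax_pca, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))
ax_pca.scatter(coords[:, 0], coords[:, 1], c=clusters)
ax_pca.set_xlabel("PC 1")
ax_pca.set_ylabel("PC 2")
ax_pca.set_title("Clusters in PCA space")

ax_bar.bar(avg_severity.index.astype(str), avg_severity.values)
ax_bar.set_xlabel("Cluster")
ax_bar.set_ylabel("Average severity (1 to 3)")
ax_bar.set_title("Average Accident Severity per Cluster")

fig.tight_layout()
# Hypothetical output location; the project saves its own PNG under another name.
out_path = os.path.join(tempfile.gettempdir(), "average_severity_sketch.png")
fig.savefig(out_path)
```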
