Nithya0710/ML-Mini_Project

ML Mini Project

Clustering Drivers and Vehicles Based on Accident Risk Patterns

Overview

This project applies unsupervised machine learning (K-Means clustering) to a Road Traffic Accident dataset to identify distinct patterns in the data. The goal is to segment accidents into meaningful clusters and to test statistically whether cluster membership is associated with Accident Severity (Slight, Serious, or Fatal Injury). Clusters that skew towards severe outcomes point to high-risk accident scenarios worth targeting for mitigation.


Objective

Cluster drivers and vehicles to discover patterns of high accident risk, in support of road safety analysis and preventive policy design.


Methodology

  1. Data Preprocessing

    • Target Mapping: The Accident_severity labels were mapped to an ordinal scale (1 = Slight Injury, 2 = Serious Injury, 3 = Fatal Injury) for validation purposes.
    • Cyclical Encoding: Features like Time and Day_of_week were transformed using sine/cosine encoding so that the clustering algorithm treats them as cyclical: 23:59 ends up next to 00:00, and the last day of the week next to the first.
  2. Preprocessing Pipeline

    • Standard Scaling was applied to numerical features.
    • One-Hot Encoding was applied to categorical features.
  3. Clustering Model

    • Model: K-Means Clustering
    • Parameter: The number of clusters (k) was set to 2.
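
The methodology above can be sketched end to end as follows. This is a minimal illustration, not the notebook's actual code: the toy rows, the severity label spellings, the weekday order, and the column names Number_of_vehicles_involved and Weather_conditions are assumptions made for the example. It also assumes the severity column is held out of the clustering features so it can be used for validation afterwards.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows standing in for "RTA Dataset.csv".
df = pd.DataFrame({
    "Time": ["08:15:00", "17:40:00", "23:05:00", "02:30:00", "12:00:00", "19:20:00"],
    "Day_of_week": ["Monday", "Friday", "Sunday", "Saturday", "Wednesday", "Friday"],
    "Number_of_vehicles_involved": [2, 1, 3, 2, 2, 1],  # assumed column name
    "Weather_conditions": ["Normal", "Raining", "Normal",
                           "Normal", "Raining", "Normal"],  # assumed column name
    "Accident_severity": ["Slight Injury", "Serious Injury", "Fatal Injury",
                          "Slight Injury", "Slight Injury", "Serious Injury"],
})

# Target mapping: text labels to an ordinal scale (label spellings assumed).
severity_map = {"Slight Injury": 1, "Serious Injury": 2, "Fatal Injury": 3}
df["severity_ord"] = df["Accident_severity"].map(severity_map)

def add_cyclical(frame, values, period, name):
    """Add sine/cosine columns so the end of a cycle lands next to its start."""
    angle = 2 * np.pi * values / period
    frame[f"{name}_sin"] = np.sin(angle)
    frame[f"{name}_cos"] = np.cos(angle)

# Time of day as fractional hours on a 24-hour cycle.
parsed = pd.to_datetime(df["Time"], format="%H:%M:%S")
add_cyclical(df, parsed.dt.hour + parsed.dt.minute / 60, 24, "Time")

# Day of week as an index on a 7-day cycle (weekday order assumed).
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
day_index = df["Day_of_week"].map({d: i for i, d in enumerate(days)})
add_cyclical(df, day_index, 7, "Day")

numeric_cols = ["Number_of_vehicles_involved",
                "Time_sin", "Time_cos", "Day_sin", "Day_cos"]
categorical_cols = ["Weather_conditions"]

# Standard scaling for numeric features, one-hot encoding for categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# K-Means with k = 2; severity is deliberately left out of the features.
model = Pipeline([
    ("prep", preprocess),
    ("kmeans", KMeans(n_clusters=2, n_init=10, random_state=0)),
])
df["cluster"] = model.fit_predict(df[numeric_cols + categorical_cols])

print(df[["cluster", "severity_ord"]])
```

Wrapping the preprocessing and the model in a single Pipeline keeps the scaling and encoding steps tied to the clustering step, so the same transformations are applied whenever the model is refit.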

Setup & Running the Project

To set up and run this project, you will need a Python environment with the necessary libraries.

1. Prerequisites

  • Ensure you have Python 3.8+ installed.
  • Install the required libraries using:
    pip install pandas numpy scikit-learn matplotlib

2. Running the Project

  • Place your dataset as RTA Dataset.csv in the same directory as the notebook.
  • The entire project workflow is contained within a single Notebook:
    1. Open the notebook: ML_Mini_Proj_CS413_CS399.ipynb.
    2. Ensure the data file path (as mentioned above) is correct.
    3. Run all cells sequentially. The notebook will perform data cleaning, preprocessing, K-Means clustering, statistical validation (Chi-Squared Test), and generate visualizations (PCA plot and Average Severity Bar Chart).
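
For readers who want to see what the statistical validation step might look like in isolation, here is a small sketch. The cluster labels and severities are made-up toy values, and using scipy.stats.chi2_contingency is an assumption about how the Chi-Squared Test is run (SciPy is installed as a dependency of scikit-learn). With so few rows the test itself is not reliable; the point is only the mechanics.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical cluster assignments and ordinal severities (1 to 3).
result = pd.DataFrame({
    "cluster":      [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "severity_ord": [1, 1, 1, 1, 2, 1, 2, 3, 2, 1, 3, 2],
})

# Contingency table: rows are clusters, columns are severity levels.
contingency = pd.crosstab(result["cluster"], result["severity_ord"])

# Chi-squared test of independence between cluster and severity.
chi2, p_value, dof, expected = chi2_contingency(contingency)

# Average ordinal severity per cluster, the quantity shown in the bar chart.
avg_severity = result.groupby("cluster")["severity_ord"].mean()

print(contingency)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
print(avg_severity)
```

A small p-value would suggest that severity is not distributed independently of cluster membership; the per-cluster averages then indicate which cluster leans towards more severe outcomes.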

3. Output

  • CS399_CS413_average_severity_per_cluster.png → Average Accident Severity per Cluster
  • PCA scatter plot of the clusters and a bar chart of average accident severity per cluster
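
As a rough idea of how a PCA scatter plot of the clusters can be produced, the sketch below projects a feature matrix onto two principal components and colours the points by cluster. The synthetic data, the helper name save_pca_plot, and the output file name pca_clusters.png are all hypothetical; the notebook's own plotting code may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic stand-in for the preprocessed feature matrix: two blobs in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 5)),
    rng.normal(loc=4.0, scale=1.0, size=(50, 5)),
])

# Cluster in the full feature space, then project to 2D for display only.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
coords = PCA(n_components=2).fit_transform(X)

def save_pca_plot(coords, labels, path="pca_clusters.png"):
    """Save a 2D scatter of the PCA coordinates, coloured by cluster label."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, writes straight to file
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="coolwarm", s=15)
    ax.set_xlabel("Principal component 1")
    ax.set_ylabel("Principal component 2")
    ax.set_title("K-Means clusters in PCA space")
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

Calling save_pca_plot(coords, labels) writes the figure to disk. PCA is used here only for visualization: the clusters are fitted on all features, and the two components shown capture only part of the variance.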
