nick-kann/TrafficCollisionML

🚘 Traffic Collision Severity Prediction 🚘

🌍 Overview

Traffic collisions are becoming more common and are already a leading cause of death. Most existing studies on this topic rely on post-collision factors, so proactive prediction of collision severity remains relatively unexplored. This project aims to fill that gap by developing a machine learning model that predicts the severity of a traffic collision using only pre-collision variables.

I also wrote a research paper that goes deeper into how I built the model and into its results and implications.
The paper is published on Medium.

📚 Data

The data is from the California Highway Patrol's Statewide Integrated Traffic Records System (CHP SWITRS), through Alex Gude's Kaggle repository.

The chosen features were:

  • longitude & latitude
  • intersection (binary)
  • road surface
  • lighting
  • state highway (binary)
  • weather
  • collision time & date

🛠️ Feature Transformation

  • Categorical variables, namely 'weather_1', 'weather_2', 'lighting', and 'road_surface', were one-hot encoded.

  • The 'collision_severity' variable was binary-encoded, with values of 'property damage only', 'pain', and 'other injury' assigned as 0, while 'severe injury' and 'fatal' were designated as 1.

  • A binary 'weekend' feature was derived from the 'collision_date' attribute.

  • Time-based features such as 'collision_time' and 'collision_date' are cyclic, so they needed a different approach. Instead of one-hot encoding them, I mapped each value to a sine and cosine pair. This places timepoints and dates evenly around a circle, so values that are adjacent across a wrap-around (for example, 23:59 and 00:00) stay close together in the transformed feature space. This cyclical encoding approach was taken from Satyam Kumar.
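
The time, weekend, and severity transformations above can be sketched roughly as follows. This is a minimal standard-library illustration, not the notebook's actual code: the helper names `encode_cyclic` and `transform_row` are hypothetical, and encoding the date by day of year with a 365-day period is an assumption (the original may use a different unit, such as month or day of week).

```python
import math
from datetime import date, time

# Severity labels mapped to the positive class, as listed in the bullets above
SEVERE_LABELS = {"severe injury", "fatal"}


def encode_cyclic(value, period):
    """Map a value in [0, period) to a (sin, cos) point on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def transform_row(collision_date, collision_time, collision_severity):
    """Hypothetical helper: build the engineered features for one record."""
    minutes = collision_time.hour * 60 + collision_time.minute
    time_sin, time_cos = encode_cyclic(minutes, 24 * 60)

    # Assumption: the date is encoded by zero-based day of year with a
    # 365-day period (Dec 31 of a leap year wraps onto Jan 1).
    day_of_year = collision_date.timetuple().tm_yday - 1
    date_sin, date_cos = encode_cyclic(day_of_year, 365)

    return {
        "time_sin": time_sin,
        "time_cos": time_cos,
        "date_sin": date_sin,
        "date_cos": date_cos,
        "weekend": int(collision_date.weekday() >= 5),  # Saturday=5, Sunday=6
        "severe": int(collision_severity in SEVERE_LABELS),
    }


# 2020-01-04 was a Saturday, so this record is a weekend, severe collision
row = transform_row(date(2020, 1, 4), time(23, 59), "fatal")
print(row["weekend"], row["severe"])

# 23:59 and 00:00 are 1439 minutes apart as raw numbers,
# but they are neighbors on the circle
gap = math.dist(encode_cyclic(23 * 60 + 59, 1440), encode_cyclic(0, 1440))
print(gap < 0.01)
```

Using both sine and cosine matters: either one alone maps two different times of day to the same value, while the pair identifies each time uniquely.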

📊 Results

The table below displays the performance metrics of the various machine learning models used in predicting traffic collision severity.

| Model | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| Decision Tree | 0.6217 | 0.0480 | 0.5179 | 0.0879 |
| Random Forest | 0.6989 | 0.0619 | 0.5339 | 0.1109 |
| Naive Bayes | 0.8539 | 0.0636 | 0.2297 | 0.0997 |
| K-Nearest Neighbors | 0.6831 | 0.0556 | 0.5007 | 0.1001 |
| Gradient Boosting | 0.7406 | 0.0681 | 0.5020 | 0.1199 |
| XGBoost | 0.7209 | 0.0697 | 0.5618 | 0.1241 |
| LightGBM | 0.6199 | 0.0608 | 0.6786 | 0.1116 |

Although Naive Bayes had the highest accuracy by a wide margin, recall matters more in this case. Recall is the fraction of actual severe collisions that the model correctly flags, and over-preparing for a collision that turns out to be non-severe is preferable to missing a severe one. By that measure, LightGBM performed best, with a recall of 67.86%.
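
As a quick sanity check on this comparison, the sketch below ranks the models by accuracy and by recall and recomputes F1 as the harmonic mean of precision and recall. The numbers are copied from the results table above; the names `RESULTS` and `f1_score` are illustrative and not taken from the notebook.

```python
# (accuracy, precision, recall) copied from the results table above
RESULTS = {
    "Decision Tree": (0.6217, 0.0480, 0.5179),
    "Random Forest": (0.6989, 0.0619, 0.5339),
    "Naive Bayes": (0.8539, 0.0636, 0.2297),
    "K-Nearest Neighbors": (0.6831, 0.0556, 0.5007),
    "Gradient Boosting": (0.7406, 0.0681, 0.5020),
    "XGBoost": (0.7209, 0.0697, 0.5618),
    "LightGBM": (0.6199, 0.0608, 0.6786),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


best_by_accuracy = max(RESULTS, key=lambda model: RESULTS[model][0])
best_by_recall = max(RESULTS, key=lambda model: RESULTS[model][2])
print("Best accuracy:", best_by_accuracy)
print("Best recall:  ", best_by_recall)

# Recomputed F1 agrees with the table up to rounding of the inputs
for model, (_, precision, recall) in RESULTS.items():
    print(f"{model:<20} F1 = {f1_score(precision, recall):.4f}")
```

Because precision is low for every model, F1 stays low even where recall is high, which is why the comparison above leans on recall directly.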

The table below displays the classification report for the LightGBM model.
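
|  | Precision | Recall | F1-Score | Support |
| --- | --- | --- | --- | --- |
| Non-Severe | 0.98 | 0.62 | 0.76 | 20647 |
| Severe | 0.06 | 0.68 | 0.11 | 753 |
| Accuracy |  |  | 0.62 | 21400 |
| Macro Avg | 0.52 | 0.65 | 0.43 | 21400 |
| Weighted Avg | 0.95 | 0.62 | 0.74 | 21400 |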
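
The support counts in this report also show why accuracy alone can mislead on this data. As a rough illustration using only those counts, consider a hypothetical baseline (not one of the models evaluated here) that always predicts "non-severe":

```python
# Support counts from the classification report above
non_severe = 20647
severe = 753
total = non_severe + severe

# Share of test records that are severe
severe_share = severe / total

# Hypothetical baseline that labels every collision as non-severe:
# it gets every non-severe case right and every severe case wrong.
baseline_accuracy = non_severe / total
baseline_recall = 0 / severe

print(f"Severe share:      {severe_share:.4f}")       # 0.0352
print(f"Baseline accuracy: {baseline_accuracy:.4f}")  # 0.9648
print(f"Baseline recall:   {baseline_recall:.4f}")    # 0.0000
```

Such a baseline would score higher on accuracy than any model in the results table while detecting no severe collisions at all, which supports judging the models on recall for the severe class.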

🚀 Usage

To use this model, open the Jupyter Notebook at the following link:

Link to colab

Feel free to explore and experiment with different algorithms, architectures, hyperparameters, and techniques to enhance the model's performance.

💡 Feedback

Although the model's performance is far from perfect and it would not be a reliable system for making critical decisions, it serves as a useful starting point for understanding patterns and trends in the data. This analysis opens the door to iterative improvements, and your input can contribute significantly to refining the model.

For any suggestions or questions, please email me at nicholas.kann@gmail.com. Your feedback is greatly appreciated!

🧑‍💻 Author

Nicholas Kann / @butter-my-toast

About

This project delves into traffic collisions in California, employing several different machine learning algorithms to try to predict the severity of a traffic collision using pre-collision variables.
