
nathanslee/Stroke-Predictor


🧠 Stroke Predictor - Machine Learning Health Analytics

A predictive modeling project that analyzes health-related datasets and identifies stroke risk factors using Python and statistical learning techniques.
Developed as part of PolyX University Research (December 2023 to April 2024) under the guidance of faculty mentors and presented at the University Research Fair.



🎓 Research Context

Lead Researcher: PolyX University Research Initiative
Duration: December 2023 – April 2024

Led a collaborative research project analyzing health-related datasets to identify stroke risk factors.
Developed a predictive model using Python, Pandas, and NumPy; presented findings through Matplotlib visualizations and comprehensive statistical reports.
The project was showcased at the University Project Fair, earning faculty commendations for analytical depth and presentation clarity.


⚙️ Features

🧩 Data Analytics & Modeling

  • Data Cleaning & Preprocessing: Removal of outliers, normalization, and categorical encoding
  • Feature Engineering: Extraction of relevant predictors to enhance model performance
  • Modeling Techniques: Logistic Regression, Random Forest, and Support Vector Machines
  • Evaluation Metrics: Accuracy, Precision, Recall, F1-Score, ROC Curve visualization
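The steps listed above can be sketched end to end as follows. This is a minimal illustration rather than the project's actual code: the data is synthetic, the column names (`age`, `avg_glucose_level`, `smoking_status`, `stroke`) are hypothetical, and the 3-standard-deviation outlier rule is an assumption chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; column names are hypothetical, not the project's schema.
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "age": rng.uniform(20, 90, n),
    "avg_glucose_level": rng.normal(105, 30, n),
    "smoking_status": rng.choice(["never", "former", "current"], n),
})
# Toy label: risk rises with age and glucose level.
logit = 0.06 * (df["age"] - 55) + 0.02 * (df["avg_glucose_level"] - 105)
df["stroke"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Outlier removal (assumed rule): drop rows more than 3 standard deviations from the mean.
numeric = ["age", "avg_glucose_level"]
categorical = ["smoking_status"]
z = (df[numeric] - df[numeric].mean()) / df[numeric].std()
df = df[(z.abs() <= 3).all(axis=1)]

X, y = df.drop(columns="stroke"), df["stroke"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

def make_preprocessor():
    """Normalize numeric columns and one-hot encode categorical ones."""
    return ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(),
}

results = {}
for name, model in models.items():
    clf = Pipeline([("prep", make_preprocessor()), ("model", model)])
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }

print(pd.DataFrame(results).T.round(3))
```

Wrapping preprocessing and the estimator in one `Pipeline` means the scaler and encoder are fitted on the training split only, so no information from the test split reaches the model.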

📊 Visualization Tools

  • Statistical Charts: Distribution plots and feature correlation heatmaps
  • Performance Graphs: ROC and precision-recall curves for comparative analysis
  • Exploratory Dashboards: Interactive views using Matplotlib and Seaborn
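As a rough sketch of the charts described above, the following draws a correlation heatmap and ROC curves side by side. It uses plain Matplotlib on synthetic data from `make_classification`; the feature names and the output filename `stroke_plots.png` are placeholders, and Seaborn's `heatmap` would be a natural alternative for the first panel.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic data with placeholder feature names.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=1, random_state=0
)
cols = [f"feature_{i}" for i in range(5)]
df = pd.DataFrame(X, columns=cols)

fig, (ax_corr, ax_roc) = plt.subplots(1, 2, figsize=(11, 4.5))

# Left panel: feature correlation heatmap.
corr = df.corr()
im = ax_corr.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax_corr.set_xticks(range(len(cols)))
ax_corr.set_xticklabels(cols, rotation=45, ha="right")
ax_corr.set_yticks(range(len(cols)))
ax_corr.set_yticklabels(cols)
ax_corr.set_title("Feature correlation")
fig.colorbar(im, ax=ax_corr)

# Right panel: ROC curves for two models on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.3, stratify=y, random_state=0
)
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
aucs = {}
for name, model in candidates.items():
    scores = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    aucs[name] = roc_auc_score(y_test, scores)
    ax_roc.plot(fpr, tpr, label=f"{name} (AUC = {aucs[name]:.2f})")
ax_roc.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Chance")
ax_roc.set_xlabel("False positive rate")
ax_roc.set_ylabel("True positive rate")
ax_roc.set_title("ROC comparison")
ax_roc.legend(loc="lower right")

fig.tight_layout()
fig.savefig("stroke_plots.png", dpi=150)
```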

🧠 Machine Learning Insights

  • Predictive identification of high-risk individuals based on clinical and behavioral factors
  • Statistical interpretation of model weights for actionable insights
  • Scalable workflow adaptable to healthcare and public safety domains
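One common way to interpret model weights, shown here as an illustrative sketch and not necessarily the method used in this project, is to exponentiate logistic regression coefficients into odds ratios. The data and the feature names (`age`, `avg_glucose_level`, `bmi`) are synthetic assumptions; in this toy setup `bmi` deliberately has no effect on the label.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data; feature names are hypothetical.
rng = np.random.default_rng(1)
n = 1000
X = pd.DataFrame({
    "age": rng.uniform(20, 90, n),
    "avg_glucose_level": rng.normal(105, 30, n),
    "bmi": rng.normal(28, 5, n),
})
# Toy label driven by age and glucose only.
logit = 0.08 * (X["age"] - 55) + 0.015 * (X["avg_glucose_level"] - 105)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardize so each coefficient is the change in log-odds per one standard deviation.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=1000).fit(X_scaled, y)

summary = pd.DataFrame(
    {"coefficient": model.coef_[0], "odds_ratio": np.exp(model.coef_[0])},
    index=X.columns,
).sort_values("odds_ratio", ascending=False)

print(summary.round(3))
```

An odds ratio above 1 means the odds of the positive class rise as the feature increases by one standard deviation; a value near 1 suggests little association in this model.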

🧰 Tech Stack

Category          Tools
Languages         Python
Libraries         Pandas, NumPy, Matplotlib, Scikit-learn
Environment       Jupyter Notebook, Google Colab
Version Control   Git, GitHub
Visualization     Matplotlib, Seaborn

📈 Example Output

  • Visualized correlation matrix for key features
  • ROC curve comparison between Logistic Regression and Random Forest models
  • Feature importance ranking to identify key risk predictors
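A feature importance ranking like the one mentioned above could be produced along these lines. This sketch assumes a Random Forest's impurity-based importances and uses synthetic data with placeholder feature names; it does not reproduce the project's actual output.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# With shuffle=False, the first three columns are informative and the rest are noise.
X, y = make_classification(
    n_samples=800, n_features=6, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
names = ["informative_0", "informative_1", "informative_2",
         "noise_0", "noise_1", "noise_2"]

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort them to get a ranking.
ranking = pd.Series(forest.feature_importances_, index=names).sort_values(ascending=False)
print(ranking.round(3))
```

Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` on a held-out split is worth considering as a cross-check.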

🚀 How to Run

  1. Clone the repository
    git clone https://github.com/nathanslee/Stroke-Predictor.git
    cd Stroke-Predictor
    
