programmarself/Diabetes-Prediction-Using-Classification-Method

🩺 Diabetes Prediction Using Classification Method 🩺

Machine Learning & Data Science Project (Final Year Project)

Built with Python 3.x • Anaconda • Jupyter-Lab • Scikit-learn • TensorFlow/Keras


🎯 Project Overview

| Item | Details |
|---|---|
| Goal | Predict whether a patient has diabetes. |
| Approach | Supervised classification using a neural network and classical ML models. |
| Dataset | Pima Indians Diabetes Dataset (768 rows × 9 columns). |
| Tools | Python 3.x, Anaconda, Jupyter-Lab, Scikit-learn, TensorFlow/Keras |

📊 1. Exploratory Data Analysis (EDA)

1.1 Quick Peek 👀

import pandas as pd
df = pd.read_csv('diabetes.csv')
df.head()
| Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|
| 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |

1.2 Summary Statistics 📈

df.describe().T.style.bar(subset=['mean'], color='#5fba7d')
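| Feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Glucose | 768 | 120.89 | 31.97 | 0 | 99 | 117 | 140.25 | 199 |
| BMI | 768 | 31.99 | 7.88 | 0 | 27.3 | 32 | 36.6 | 67.1 |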

1.3 Visual Insights 📉

Figure (pairplot): pair plot showing correlations among features; red points are diabetic patients (Outcome = 1).

  • Strongest predictor: Glucose levels
  • Missing values: encoded as 0 (e.g., in Insulin and SkinThickness, where a true zero is implausible) → impute with the median.
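
The zero placeholders noted above can be counted per column before imputing. A minimal sketch, assuming only the two rows shown in the Quick Peek output; the `preview` frame and the `zero_counts` helper are illustrative names, not part of the project code.

```python
import pandas as pd

# The two rows displayed by df.head() above; a stand-in for the full diabetes.csv.
preview = pd.DataFrame(
    [[6, 148, 72, 35, 0, 33.6, 0.627, 50, 1],
     [1, 85, 66, 29, 0, 26.6, 0.351, 31, 0]],
    columns=['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
             'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome'],
)

def zero_counts(frame, cols):
    """Return how many entries equal 0 in each of the given columns."""
    return (frame[cols] == 0).sum()

# Columns where 0 cannot be a real measurement (the ones imputed in section 2).
suspect = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
counts = zero_counts(preview, suspect)
print(counts)
```

On the full dataset, the same call would be `zero_counts(df, suspect)`.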

🧹 2. Data Pre-processing

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop('Outcome', axis=1)
y = df['Outcome']

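# Impute 0's → median (zeros in these columns mark missing measurements)
cols = ['Glucose','BloodPressure','SkinThickness','Insulin','BMI']
X[cols] = X[cols].mask(X[cols] == 0)          # turn 0 into NaN so it is excluded from the median
X[cols] = X[cols].fillna(X[cols].median())    # fill with the median of the observed values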

# Scale
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split 70/30
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.30, random_state=42, stratify=y)
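
The steps above compute the medians and fit the scaler on all rows before splitting, so the test rows influence the training statistics. One way to avoid that, sketched here under assumptions, is to split first and let a scikit-learn `Pipeline` fit the imputation and scaling on the training rows only. The synthetic frame `demo`, the helper `make_preprocessor`, and the other names below are illustrative, not the project's code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

ZERO_AS_MISSING = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
OTHER = ['Pregnancies', 'DiabetesPedigreeFunction', 'Age']

def make_preprocessor():
    """Median-impute zeros in the listed columns, then standardise all columns.

    Output columns are ordered ZERO_AS_MISSING followed by OTHER.
    """
    impute = ColumnTransformer([
        ('zeros', SimpleImputer(missing_values=0, strategy='median'), ZERO_AS_MISSING),
        ('rest', 'passthrough', OTHER),
    ])
    return Pipeline([('impute', impute), ('scale', StandardScaler())])

# Synthetic stand-in data (random values, NOT the Pima dataset).
rng = np.random.default_rng(42)
demo = pd.DataFrame(rng.integers(0, 200, size=(100, 8)).astype(float),
                    columns=ZERO_AS_MISSING + OTHER)
labels = rng.integers(0, 2, size=100)

# Split first, so the test rows never contribute to medians, means, or variances.
X_tr, X_te, y_tr, y_te = train_test_split(
    demo, labels, test_size=0.30, random_state=42, stratify=labels)

prep = make_preprocessor()
X_tr_scaled = prep.fit_transform(X_tr)   # statistics come from training rows only
X_te_scaled = prep.transform(X_te)       # test rows are transformed, never fitted
print(X_tr_scaled.shape, X_te_scaled.shape)
```

With this layout the fitted `prep` object plays the role of `scaler` at prediction time, and it also applies the zero imputation to new patients.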

🧠 3. Neural Network (Keras)

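from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input

model = Sequential([
    Input(shape=(X_train.shape[1],)),   # one input per feature column
    Dense(16, activation='relu'),
    Dropout(0.2),                       # randomly drops 20% of activations during training
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid')      # outputs the probability of Outcome = 1
])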

model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    validation_split=0.15,
                    epochs=100, batch_size=16, verbose=0)
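
As a sanity check on the architecture above, the number of trainable parameters can be worked out by hand: a Dense layer with n inputs and m units has n × m weights plus m biases, and Dropout adds none. A short sketch, assuming the 8 feature columns left after dropping Outcome; the helper name `dense_params` is illustrative.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 8              # 9 dataset columns minus the Outcome label
layer_units = [16, 8, 1]    # Dense(16) -> Dense(8) -> Dense(1)

total = 0
fan_in = n_features
for units in layer_units:
    count = dense_params(fan_in, units)
    print(f"Dense({units}): {count} parameters")
    total += count
    fan_in = units          # this layer's outputs feed the next layer

print("Total trainable parameters:", total)
# → Total trainable parameters: 289
```

Calling `model.summary()` on the Keras model should report the same total.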

Figure (loss): loss curve from training.


🧪 4. Classical ML Models

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Logistic Regression | 0.794 | 0.75 | 0.64 | 0.69 |
| Decision Tree | 0.739 | 0.70 | 0.60 | 0.65 |
| SVM (RBF) | 0.792 | 0.76 | 0.62 | 0.68 |
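
The scores above are listed without the training code. A minimal sketch of how the three models could be fitted and scored with scikit-learn follows. It runs on synthetic data from `make_classification` (not the Pima dataset) and assumes default hyperparameters, so it will not reproduce the numbers in the table. The names `lr`, `dt`, and `svm` match those used when the models are saved in section 6.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with 768 rows and 8 features (NOT the real dataset).
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42, stratify=y_demo)

# Hyperparameters are assumptions; the original settings are not documented.
lr = LogisticRegression(max_iter=1000)
dt = DecisionTreeClassifier(random_state=42)
svm = SVC(kernel='rbf')

results = {}
for name, clf in [('Logistic Regression', lr),
                  ('Decision Tree', dt),
                  ('SVM (RBF)', svm)]:
    clf.fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    results[name] = {
        'accuracy': accuracy_score(y_te, y_pred),
        'precision': precision_score(y_te, y_pred),
        'recall': recall_score(y_te, y_pred),
        'f1': f1_score(y_te, y_pred),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

On the real data, `X_tr`, `X_te`, `y_tr`, and `y_te` would be replaced by the `X_train`, `X_test`, `y_train`, and `y_test` arrays from section 2.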

Figure (confusion_matrix): confusion matrix.


🏆 5. Model Comparison & Best Pick

| Metric | Neural Net | Logistic | Decision Tree | SVM |
|---|---|---|---|---|
| Accuracy | 0.844 👑 | 0.794 | 0.739 | 0.792 |
| Precision | 0.81 | 0.75 | 0.70 | 0.76 |
| Recall | 0.73 | 0.64 | 0.60 | 0.62 |
| F1-Score | 0.77 | 0.69 | 0.65 | 0.68 |

🥇 The Neural Network wins with 84.4% accuracy and also leads on precision, recall, and F1-score.
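
The F1 scores in the comparison table follow from the precision and recall rows through F1 = 2PR / (P + R). A small check, using only the numbers in the table; the helper `f1` is an illustrative name.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the comparison table.
reported = {
    'Neural Net': (0.81, 0.73),
    'Logistic': (0.75, 0.64),
    'Decision Tree': (0.70, 0.60),
    'SVM': (0.76, 0.62),
}

for model_name, (p, r) in reported.items():
    print(f"{model_name}: F1 = {f1(p, r):.2f}")
# → Neural Net: F1 = 0.77
# → Logistic: F1 = 0.69
# → Decision Tree: F1 = 0.65
# → SVM: F1 = 0.68
```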


💾 6. Save Models for Production

# Keras model
model.save('diabetes_nn.h5')

# scikit-learn models (lr, dt, svm are the fitted classifiers from section 4)
import joblib
joblib.dump(lr, 'diabetes_lr.pkl')
joblib.dump(dt, 'diabetes_dt.pkl')
joblib.dump(svm, 'diabetes_svm.pkl')
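
The usage demo in section 7 calls `scaler.transform`, so the fitted `StandardScaler` has to be available at prediction time as well. One option, sketched here as an assumption rather than the project's actual code, is to persist it with `joblib` next to the models. The file name `diabetes_scaler.pkl` and the toy training matrix are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training matrix standing in for the real features (made-up numbers).
X_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler().fit(X_toy)

# Hypothetical file name; a temp directory keeps the example free of side effects.
path = os.path.join(tempfile.mkdtemp(), 'diabetes_scaler.pkl')
joblib.dump(scaler, path)

# Later, in a fresh session: reload and reuse the same fitted statistics.
restored = joblib.load(path)
same = np.allclose(restored.transform(X_toy), scaler.transform(X_toy))
print("Restored scaler matches:", same)
```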

🚀 7. Quick Usage Demo

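# Load & predict
from tensorflow.keras.models import load_model
model = load_model('diabetes_nn.h5')

# `scaler` is the StandardScaler fitted in section 2; it must be in scope here.
# Feature order: Pregnancies, Glucose, BloodPressure, SkinThickness,
#                Insulin, BMI, DiabetesPedigreeFunction, Age
patient = [[6, 148, 72, 35, 0, 33.6, 0.627, 50]]
patient_scaled = scaler.transform(patient)
pred = model.predict(patient_scaled)[0][0]
print("Risk of diabetes: {:.1%}".format(pred))
# → Risk of diabetes: 91.4%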
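
The model expects the eight features in the same column order as the training data. A small helper can build the input row from named values so the order cannot be mixed up. `FEATURES` mirrors the column header shown in the Quick Peek, while `patient_row` is an illustrative name, not part of the project.

```python
FEATURES = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
            'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age']

def patient_row(values):
    """Return a 1 x 8 nested list ordered like the training columns."""
    missing = [name for name in FEATURES if name not in values]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [[values[name] for name in FEATURES]]

# Same patient as in the demo above, given as an unordered mapping.
patient = patient_row({
    'Age': 50, 'BMI': 33.6, 'Glucose': 148, 'Pregnancies': 6,
    'BloodPressure': 72, 'SkinThickness': 35, 'Insulin': 0,
    'DiabetesPedigreeFunction': 0.627,
})
print(patient)
# → [[6, 148, 72, 35, 0, 33.6, 0.627, 50]]
```

The resulting `patient` list can be passed to `scaler.transform` exactly as in the demo.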

📁 Project Tree

📦 Diabetes-Prediction/
 ├─ 📁 data/
 │   └─ diabetes.csv
 ├─ 📁 notebooks/
 │   └─ EDA.ipynb
 ├─ 📁 models/
 │   ├─ diabetes_nn.h5
 │   └─ *.pkl
 ├─ 📁 src/
 │   ├─ train.py
 │   └─ predict.py
 ├─ 📄 requirements.txt
 └─ 📄 README.md

📚 Requirements (requirements.txt)

pandas==2.2.2
numpy==1.26.4
matplotlib==3.9.0
seaborn==0.13.2
scikit-learn==1.5.0
tensorflow==2.17.0
joblib==1.4.2

🀝 Contributing

Feel free to open issues or PRs to improve the model or add new features (e.g., SHAP explainability, Streamlit GUI).


📄 License

MIT © 2025 Diabetes-Prediction-Team


“Early diagnosis saves lives.”

-------------------------------------------------------------------------------------------------------------------

👨‍💻 By: Irfan Ullah Khan
