GitHub - AmirKamy/First-data-practice: This project focuses on analyzing and classifying the Pima Indians Diabetes dataset to predict the onset of diabetes within 5 years. The dataset contains 768 samples with 8 medical features and a binary class label. Tasks include histogram visualization, statistical calculations, scatter plot creation, and decision tree modeling.

Pima Indians Diabetes Prediction

This project analyzes the Pima Indians Diabetes dataset to predict, from 8 medical features, whether an individual will develop diabetes within 5 years. The problem is framed as binary classification: the model outputs 1 if the person is predicted to develop diabetes and 0 otherwise.

Dataset Overview

The dataset consists of 768 records, each with 8 medical features and 1 output variable indicating diabetes status. The features, numbered as they are referenced in the tasks below, are:

1. Number of times pregnant
2. Plasma glucose concentration (after a 2-hour oral glucose tolerance test)
3. Diastolic blood pressure (mm Hg)
4. Triceps skinfold thickness (mm)
5. 2-hour serum insulin (μU/mL)
6. Body mass index (BMI) (weight in kg / (height in m)^2)
7. Diabetes pedigree function (family history)
8. Age (years)

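Missing values in the dataset are coded as zero, so a zero in a feature such as glucose, blood pressure, skinfold thickness, insulin, or BMI most likely marks a missing measurement rather than a true reading.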
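A minimal sketch of how the zero-coded missing values could be made explicit with pandas. The column names, the `mark_missing` and `load_pima` helpers, and the CSV path are hypothetical, and the choice of columns in which zero counts as missing is an assumption; the notebook in this repository may handle this differently.

```python
import numpy as np
import pandas as pd

# Hypothetical column names, in the order the features are listed above.
COLUMNS = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
           "insulin", "bmi", "pedigree", "age", "outcome"]

# Assumption: a zero in these columns cannot be a real measurement.
ZERO_MEANS_MISSING = ["glucose", "blood_pressure", "skin_thickness",
                      "insulin", "bmi"]


def mark_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which zero-coded missing values are replaced by NaN."""
    out = df.copy()
    out[ZERO_MEANS_MISSING] = out[ZERO_MEANS_MISSING].replace(0, np.nan)
    return out


def load_pima(path: str) -> pd.DataFrame:
    """Read a headerless CSV (the path is a placeholder) and mark missing values."""
    return mark_missing(pd.read_csv(path, header=None, names=COLUMNS))
```

After this step, `df.isna().sum()` reports how many values are missing per feature. Zeros in the pregnancy count and in the class label are left untouched because they are valid values.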

Key Tasks

Visualizing the Data:
- Histograms of all 8 features, ignoring the class variable.
- Histograms of features 2 (glucose), 3 (blood pressure), 6 (BMI), and 7 (diabetes pedigree function), plotted separately for the two classes (diabetes or no diabetes).
Statistical Analysis:
- Calculation of the mean and variance for feature 6 (BMI).
Scatter Plot:
- A scatter plot of feature 6 (BMI) against feature 8 (Age), with the two classes shown in different colors.
Classification:
- A decision tree classifier with a maximum depth of 4 that predicts, from the provided features, whether an individual will develop diabetes.
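The tasks above can be sketched roughly as follows. This is an illustrative outline, not the notebook's actual code: the column names and helper functions are hypothetical, the variance shown is the sample variance (pandas' default, `ddof=1`), and the tree is fitted on the full data without a train/test split because the task list does not specify one.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column names for the 8 features and the class label.
FEATURES = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
            "insulin", "bmi", "pedigree", "age"]
TARGET = "outcome"


def plot_class_histograms(df, columns):
    """One panel per column, with the two classes overlaid in each panel."""
    fig, axes = plt.subplots(1, len(columns),
                             figsize=(4 * len(columns), 3), squeeze=False)
    for ax, col in zip(axes[0], columns):
        for label, group in df.groupby(TARGET):
            ax.hist(group[col].dropna(), bins=20, alpha=0.5,
                    label=f"class {label}")
        ax.set_title(col)
        ax.legend()
    return fig


def plot_bmi_vs_age(df):
    """Scatter plot of BMI against age, one color per class."""
    fig, ax = plt.subplots()
    for label, group in df.groupby(TARGET):
        ax.scatter(group["bmi"], group["age"], s=10, label=f"class {label}")
    ax.set_xlabel("bmi")
    ax.set_ylabel("age")
    ax.legend()
    return fig


def bmi_mean_and_variance(df):
    """Mean and sample variance (ddof=1) of BMI; NaN values are skipped."""
    return df["bmi"].mean(), df["bmi"].var()


def fit_tree(df, max_depth=4, seed=0):
    """Fit a depth-limited decision tree on all rows.

    Assumes the feature columns contain no NaN values.
    """
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    return clf.fit(df[FEATURES], df[TARGET])
```

Histograms of all features without the class variable can be drawn with `df[FEATURES].hist(bins=20)`. The class-wise histograms for features 2, 3, 6, and 7 would then be `plot_class_histograms(df, ["glucose", "blood_pressure", "bmi", "pedigree"])`.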

Conclusion

This project explores the Pima Indians Diabetes dataset through data visualization, summary statistics, and a decision tree classifier that predicts diabetes onset.

