This repository contains machine learning and data preprocessing projects created for different purposes, including internships and hackathons. Below is a detailed overview of each project:
Purpose: Developed for an internship at ADVERK Technologies, this notebook predicts the presence of breast cancer using a Naive Bayes classifier.
- Libraries Used:
  - `numpy`, `pandas`, `seaborn`, `matplotlib.pyplot`
  - `sklearn` modules: `preprocessing`, `model_selection`, `metrics`, `naive_bayes`
- Workflow:
  - Load breast cancer data using `datasets`.
  - Preprocess numerical features.
  - Split the dataset into training and testing subsets.
  - Apply `RobustScaler` for scaling.
  - Train and evaluate a Gaussian Naive Bayes classifier.
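
The breast cancer workflow can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: it assumes scikit-learn's bundled breast cancer dataset as the data source, and the split ratio and random seed are arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import RobustScaler

# Assumption: scikit-learn's bundled dataset stands in for the notebook's data.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the rows for testing (ratio and seed are illustrative).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = RobustScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train and evaluate a Gaussian Naive Bayes classifier.
model = GaussianNB()
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Fitting `RobustScaler` on the training split alone keeps test-set statistics out of the preprocessing step.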
Purpose: Developed for the Indian Government's "Dark Patterns Buster Hackathon," this notebook predicts the usability or validation of a product based on various attributes.
- Libraries Used:
  - `pandas`, `category_encoders`, `joblib`, `graphviz`, `pickle`
  - `sklearn` modules: `ensemble`, `metrics`, `model_selection`, `tree`
- Workflow:
  - Load and preprocess product data, filling missing values with appropriate defaults.
  - Encode categorical features using `OneHotEncoder`.
  - Train a Random Forest Classifier.
  - Test the model and evaluate its accuracy.
  - Save the trained model using `pickle` and `joblib` for reuse.
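
A rough sketch of the product workflow is shown below. The column names and values are hypothetical, since the hackathon dataset's schema is not described here, and scikit-learn's `OneHotEncoder` is swapped in for the `category_encoders` version so that the example depends on fewer packages.

```python
import os
import pickle
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical product data; the real columns are not known here.
df = pd.DataFrame({
    "category": ["toy", "food", None, "tool"] * 10,
    "price": [10.0, 5.0, 7.5, None] * 10,
    "valid": [1, 0, 0, 1] * 10,
})

# Fill missing values with simple defaults.
df["category"] = df["category"].fillna("unknown")
df["price"] = df["price"].fillna(df["price"].median())

X = df[["category", "price"]]
y = df["valid"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# One-hot encode the categorical column, pass the numeric one through,
# then train a Random Forest on the result.
model = Pipeline([
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), ["category"])],
        remainder="passthrough",
    )),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
])
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# Save the fitted pipeline with both pickle and joblib, then reload each copy.
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "model.pkl")
    joblib_path = os.path.join(tmp_dir, "model.joblib")
    with open(pickle_path, "wb") as f:
        pickle.dump(model, f)
    joblib.dump(model, joblib_path)
    with open(pickle_path, "rb") as f:
        restored_pickle = pickle.load(f)
    restored_joblib = joblib.load(joblib_path)

# Both restored models should agree on the test rows.
same = (restored_pickle.predict(X_test) == restored_joblib.predict(X_test)).all()
```

Saving the whole pipeline, rather than the forest alone, keeps the encoder and the classifier together so that the reloaded model accepts raw rows.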
Purpose: Developed for an internship at ADVERK Technologies, this project predicts the likelihood of a stroke from patient data using a Decision Tree classifier.
- Libraries Used:
  - `pandas`, `numpy`, `category_encoders`, `graphviz`
  - `sklearn` modules: `model_selection`, `metrics`, `tree`
- Workflow:
  - Load healthcare dataset.
  - Handle missing values and preprocess categorical data using `OrdinalEncoder`.
  - Train a Decision Tree Classifier with Gini Impurity.
  - Evaluate the model and visualize decision trees using `graphviz`.
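
The stroke workflow might look something like the sketch below. The patient records and the labelling rule are made up for illustration, scikit-learn's `OrdinalEncoder` stands in for the `category_encoders` one, and the tree is exported as DOT text so that the example runs without the `graphviz` package.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Hypothetical patient records; the real healthcare dataset is not shown here.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"] * 10,
    "smoking_status": ["smokes", "never smoked", "formerly smoked", "never smoked"] * 10,
    "age": [72, 35, 80, 41, 67, 29, 75, 50] * 5,
    "bmi": [31.0, None, 28.5, 22.0, 33.2, 24.1, None, 26.0] * 5,
})
# Toy labelling rule (an assumption for this sketch): stroke when age > 60.
df["stroke"] = (df["age"] > 60).astype(int)

# Handle missing values by filling with the column mean.
df["bmi"] = df["bmi"].fillna(df["bmi"].mean())

# Map each categorical value to an integer code.
categorical = ["gender", "smoking_status"]
encoder = OrdinalEncoder()
encoded = pd.DataFrame(
    encoder.fit_transform(df[categorical]), columns=categorical, index=df.index
)
X = pd.concat([df[["age", "bmi"]], encoded], axis=1)
y = df["stroke"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train a decision tree that splits on Gini impurity.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_train, y_train)
accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# With out_file=None, export_graphviz returns the tree as a DOT string.
dot_source = export_graphviz(
    tree,
    out_file=None,
    feature_names=list(X.columns),
    class_names=["no stroke", "stroke"],
    filled=True,
)
```

If the `graphviz` package and the Graphviz binaries are installed, passing `dot_source` to `graphviz.Source` should render the tree inside a notebook.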
- `Breast cancer.ipynb`: Notebook for breast cancer prediction.
- `ProductconfirmerML.ipynb`: Notebook for product confirmation model.
- `Strokefinding.ipynb`: Notebook for stroke prediction.
Install the necessary libraries using one of the following commands.
For the Windows command line:
pip install numpy pandas matplotlib seaborn scikit-learn category_encoders graphviz joblib
For Jupyter Notebook:
!pip install numpy pandas matplotlib seaborn scikit-learn category_encoders graphviz joblib
- Clone this repository:
git clone <repository-url>
- Open the desired notebook using Jupyter Notebook or JupyterLab:
jupyter notebook <notebook-name>.ipynb
- Follow the workflow described within each notebook.
- ADVERK Technologies: For providing the opportunity to work on real-world machine learning problems.
- Indian Government: For organizing the "Dark Patterns Buster Hackathon," inspiring innovative solutions like Product Confirmer ML.
Feel free to explore and modify the projects as needed. Contributions are welcome!