A collection of hands-on Machine Learning lab notebooks covering core ML concepts, with models trained on real-world datasets. Implementations are done both from scratch and using scikit-learn. Completed as part of a 5th-semester Machine Learning course.
| Notebook | Topics Covered |
|---|---|
| `data_preprocessing.py` | Handling missing values, encoding categorical features, data cleaning |
| `Linear_Regression.ipynb` | Simple & multiple linear regression from scratch, RMSE, R² score |
| `Linear_Regression_Sklearn.ipynb` | Linear regression using scikit-learn, feature scaling, evaluation |
| `Logistic_Polynomial_Regression.ipynb` | Logistic regression (scratch + sklearn), polynomial regression |
| `KNN_Distance_Metrics.ipynb` | K-Nearest Neighbors with different distance metrics (Euclidean, Manhattan) |
| `KNN_SVM_Ensemble.ipynb` | KNN vs SVM comparison, ensemble methods (Bagging, Boosting, XGBoost) |
| `K_Means_Clustering.ipynb` | K-Means clustering from scratch, elbow method, DBSCAN |
| `KMeans_PCA.ipynb` | Dimensionality reduction with PCA, visualizing clusters |
| `Descision_Trees.ipynb` | Decision Tree classifier with entropy criterion, confusion matrix, accuracy score & tree visualization |
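
As a rough illustration of the "from scratch" style listed for the linear regression notebooks, here is a minimal sketch of simple linear regression with RMSE and R² using only NumPy. It is not taken from the notebooks: the function names and the synthetic data are assumptions made for this example.

```python
import numpy as np


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) from the closed-form least-squares solution."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic data (an assumption, not a repository dataset): y = 3x + 2 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=x.size)

slope, intercept = fit_simple_linear_regression(x, y)
y_pred = slope * x + intercept
print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"RMSE={rmse(y, y_pred):.3f} R2={r2_score(y, y_pred):.3f}")
```

The fitted slope and intercept should land close to the values used to generate the data, which is a quick sanity check before moving to a real dataset.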
These labs use real-world datasets to train and evaluate models — not just toy data:
| Dataset | Used For |
|---|---|
| 1000 Companies | Multiple Linear Regression (profit prediction) |
| Car Price Prediction | Regression (predicting used car prices) |
| House / Admission Predict | Linear Regression (admission chance prediction) |
| Head & Brain | Simple Linear Regression (head size vs brain weight) |
| Medical Cost Personal | Regression (insurance cost prediction) |
| Mall Customers | K-Means Clustering (customer segmentation) |
| Telco Customer Churn | Classification (churn prediction) |
| Income Evaluation | SVM classification |
| Social Network Ads | Logistic Regression, KNN, SVM |
| Mushroom Dataset | KNN & Ensemble classification |
| COVID-19 Dataset | Data analysis & visualization |
| Iris | Clustering, PCA visualization |
| Bill Authentication | Decision Trees |
- Python 3
- NumPy — numerical computations
- Pandas — data manipulation
- Matplotlib / Seaborn — data visualization
- Scikit-learn — ML models and evaluation
- Supervised Learning — Linear Regression, Logistic Regression, Polynomial Regression, KNN, SVM, Decision Trees
- Unsupervised Learning — K-Means Clustering, DBSCAN
- Dimensionality Reduction — Principal Component Analysis (PCA)
- Ensemble Methods — Random Forests, AdaBoost, Gradient Boosting, XGBoost
- Model Evaluation — RMSE, R², Accuracy, Confusion Matrix, ROC Curve, K-Fold Cross Validation
- Data Preprocessing — Missing values, categorical encoding, feature scaling
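
To make the model evaluation item concrete, the sketch below runs 5-fold cross-validation for a KNN classifier on scikit-learn's bundled copy of Iris. The choice of `n_neighbors=5`, the scaling step, and the fold settings are assumptions for illustration and may differ from what the notebooks do.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the Iris data that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Scale features inside the pipeline so each fold is scaled using only
# its own training split (avoids leaking test-fold statistics).
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Shuffle before splitting because the Iris rows are ordered by class.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("Per-fold accuracy:", scores.round(3))
print("Mean accuracy:", round(scores.mean(), 3))
```

Keeping the scaler inside the pipeline, rather than scaling the full dataset up front, is a deliberate choice here so that the cross-validated score is not optimistic.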
- Clone the repository:

      git clone https://github.com/Zimal-Fatemah/Machine-Learning-Labs.git
      cd Machine-Learning-Labs

- Install dependencies:

      pip install numpy pandas matplotlib seaborn scikit-learn

- Open any notebook:

      jupyter notebook

Notebooks can also be opened directly in Google Colab.
- Implementations are done both from scratch (using NumPy) and using scikit-learn, to build a solid understanding of the underlying math.
- Each notebook includes data loading, preprocessing, model training, and evaluation steps.
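
The load, preprocess, train, evaluate flow described above can be sketched roughly as follows. This is an assumption-laden example rather than code from the notebooks: the DataFrame is synthetic, and the column names (`age`, `salary`, `gender`, `purchased`) are hypothetical stand-ins for a tabular classification dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# 1. Data loading: a synthetic table stands in for pd.read_csv(...) here.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "age": rng.uniform(18, 60, size=n),
    "salary": rng.uniform(15_000, 150_000, size=n),
    "gender": rng.choice(["Male", "Female"], size=n),
})
signal = (df["age"] - 39) / 12 + (df["salary"] - 82_500) / 39_000
df["purchased"] = (signal + rng.normal(scale=0.3, size=n) > 0).astype(int)
# Blank out some ages to mimic missing values.
df.loc[rng.choice(n, size=30, replace=False), "age"] = np.nan

# 2. Preprocessing: impute and scale numeric columns, one-hot encode categoricals.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "salary"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender"]),
])

# 3. Model training on a held-out split.
X = df[["age", "salary", "gender"]]
y = df["purchased"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = Pipeline([("prep", preprocess), ("model", LogisticRegression())])
clf.fit(X_train, y_train)

# 4. Evaluation.
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(f"Accuracy: {acc:.3f}")
print("Confusion matrix:\n", cm)
```

Swapping `LogisticRegression()` for another estimator (for example a KNN or SVM classifier) leaves the rest of the pipeline unchanged, which is one reason to keep preprocessing and model in a single `Pipeline`.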
Zimal Fatemah
BS Artificial Intelligence
GitHub