This project implements a multivariate linear regression model from scratch using Python and NumPy, and compares it against scikit-learn implementations. The notebook walks through the entire machine learning pipeline, including data exploration, feature scaling, gradient descent optimization, convergence analysis, prediction, and model evaluation.
This project focuses on building strong intuition for the mathematical and algorithmic foundations of linear regression:
- Multivariate linear regression hypothesis
- Feature scaling (mean normalization & standardization)
- Cost function (Mean Squared Error)
- Gradient computation (∂J/∂W, ∂J/∂b)
- Gradient descent optimization
- Learning rate and convergence behavior
- Residual analysis and error interpretation
- Model evaluation metrics (R², MSE, RMSE, MAE)
- Comparison with scikit-learn implementations
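The concepts listed above can be tied together in a short NumPy sketch. This is not the notebook's code: the function names, the synthetic data, the learning rate, and the iteration count are all assumptions chosen for illustration. It uses the common conventions J(w, b) = (1/2m) Σ (Xw + b − y)², ∂J/∂w = (1/m) Xᵀ(Xw + b − y), and ∂J/∂b = mean(Xw + b − y).

```python
import numpy as np

def standardize(X):
    """Z-score each column; also return the statistics for reuse at prediction time."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

def compute_cost(X, y, w, b):
    """Squared-error cost with the conventional 1/(2m) factor."""
    err = X @ w + b - y
    return (err @ err) / (2 * X.shape[0])

def compute_gradients(X, y, w, b):
    """Return dJ/dw (one entry per feature) and dJ/db (scalar)."""
    err = X @ w + b - y
    return X.T @ err / X.shape[0], err.mean()

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent from zero initialization; records the cost per iteration."""
    w, b = np.zeros(X.shape[1]), 0.0
    history = []
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradients(X, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        history.append(compute_cost(X, y, w, b))
    return w, b, history

# Hypothetical synthetic data: three features on very different scales.
rng = np.random.default_rng(0)
X_raw = rng.uniform([500, 1, 0], [3000, 5, 50], size=(200, 3))
y = X_raw @ np.array([0.2, 15.0, -1.0]) + 50.0 + rng.normal(0, 1.0, size=200)

X_scaled, mu, sigma = standardize(X_raw)
w, b, history = gradient_descent(X_scaled, y, alpha=0.1, num_iters=1000)

# Map the learned parameters back to the original feature units.
w_orig = w / sigma
b_orig = b - (mu / sigma) @ w

# Predict for an unseen input by applying the same scaling statistics.
x_new = np.array([1200.0, 3.0, 40.0])
y_hat = ((x_new - mu) / sigma) @ w + b
print("weights (original units):", w_orig, "bias:", b_orig, "prediction:", y_hat)
```

Plotting `history` against the iteration index gives a cost-vs-iterations curve of the kind used for convergence analysis.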
- Python
- NumPy
- Pandas
- Matplotlib & Seaborn
- scikit-learn
- Jupyter Notebook
.
├── multiple_regression.ipynb # Complete implementation and analysis
└── README.md # Project documentation
- Clone the repository:
    git clone https://github.com/<your-username>/<repo-name>.git
    cd <repo-name>
- Install dependencies:
    pip install numpy pandas matplotlib seaborn scikit-learn
- Open the notebook:
    jupyter notebook enhance_linearR.ipynb
- Run the notebook cells sequentially.
- Data creation & exploration
- Visualization using pair plots and correlation heatmaps
- Feature scaling to improve gradient descent convergence
- Custom gradient descent training
- Convergence analysis using cost vs iterations
- Predictions on unseen inputs
- Comparison with scikit-learn LinearRegression & SGDRegressor
- Evaluation using standard regression metrics
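The comparison and evaluation steps above might look roughly like the following. This is a sketch under stated assumptions rather than the notebook's code: the synthetic dataset, the train/test split, the pipeline layout, and the `SGDRegressor` settings are all illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for the notebook's dataset.
rng = np.random.default_rng(42)
X = rng.uniform([500, 1, 0], [3000, 5, 50], size=(300, 3))
y = X @ np.array([0.2, 15.0, -1.0]) + 50.0 + rng.normal(0, 5.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each pipeline scales the features first, so both models see standardized inputs.
models = {
    "LinearRegression": make_pipeline(StandardScaler(), LinearRegression()),
    "SGDRegressor": make_pipeline(
        StandardScaler(), SGDRegressor(max_iter=1000, tol=1e-3, random_state=0)
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "R2": r2_score(y_test, pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": mean_absolute_error(y_test, pred),
    }
    print(name, {k: round(float(v), 4) for k, v in results[name].items()})
```

`LinearRegression` solves the least-squares problem directly, while `SGDRegressor` updates the parameters iteratively one sample at a time, so small differences between the two sets of metrics are expected.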
- Feature scaling significantly improves gradient descent stability
- Custom implementation closely matches scikit-learn results
- Convergence curves provide insight into optimization behavior
- Residual plots help validate linear model assumptions
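The first finding above, that scaling stabilizes gradient descent, can be illustrated with a small experiment. The sketch below is an assumption-laden illustration and not the notebook's code: it runs the same batch gradient descent loop (hypothetical helper `run_gd`) with the same learning rate on raw and on standardized versions of a made-up dataset and compares the cost curves.

```python
import numpy as np

def run_gd(X, y, alpha, num_iters):
    """Batch gradient descent; returns the cost measured before each update."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    costs = []
    for _ in range(num_iters):
        err = X @ w + b - y
        costs.append((err @ err) / (2 * m))
        w = w - alpha * (X.T @ err) / m
        b = b - alpha * err.mean()
    return w, b, costs

# Hypothetical data: the first feature is in the thousands, the others are small.
rng = np.random.default_rng(1)
X_raw = rng.uniform([500, 1, 0], [3000, 5, 50], size=(200, 3))
y = X_raw @ np.array([0.2, 15.0, -1.0]) + 50.0 + rng.normal(0, 1.0, size=200)
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Same learning rate and iteration count for both runs.
_, _, costs_raw = run_gd(X_raw, y, alpha=0.1, num_iters=10)
_, _, costs_scaled = run_gd(X_scaled, y, alpha=0.1, num_iters=10)

print("raw features:    first cost %.3e, last cost %.3e" % (costs_raw[0], costs_raw[-1]))
print("scaled features: first cost %.3e, last cost %.3e" % (costs_scaled[0], costs_scaled[-1]))
```

With unscaled features the large-magnitude column dominates the gradient, so a step size that works well on standardized data overshoots and the cost grows instead of shrinking. A much smaller learning rate would keep the raw run stable, at the price of very slow progress on the small-scale features.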
This project is intended for educational purposes. A license can be added later if the repository is extended or shared for reuse.
Abhi
Learning-focused Machine Learning & Python projects