This repository contains a Jupyter Notebook implementation of linear regression, including data preprocessing, exploratory data analysis (EDA), outlier detection, visualization, and model construction using the Housing dataset.
The project focuses on understanding the complete linear regression workflow and data preparation steps rather than relying entirely on high-level machine learning libraries.
- Dataset: Housing Dataset
- Target Variable: MEDV (Median value of owner-occupied homes)
- Missing values are handled by removing incomplete rows.
- Outliers are detected and filtered using the Interquartile Range (IQR) method.
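The two cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `drop_missing_and_iqr_outliers`, the conventional fence multiplier `k = 1.5`, the `RM` column, and the toy values are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def drop_missing_and_iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Drop rows with missing values, then rows outside the IQR fences.

    Hypothetical helper: k = 1.5 is the usual convention, assumed here.
    """
    clean = df.dropna()
    numeric = clean.select_dtypes(include="number")

    # Per-column quartiles and interquartile range.
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr

    # Keep a row only if every numeric value lies inside its column's fences.
    inside = ((numeric >= lower) & (numeric <= upper)).all(axis=1)
    return clean[inside]


# Tiny made-up frame (not the Housing data) to show the behaviour.
toy = pd.DataFrame({
    "RM": [6.0, 6.2, 5.9, 6.1, 6.3, np.nan],
    "MEDV": [22.0, 24.0, 21.0, 23.0, 500.0, 20.0],
})
filtered = drop_missing_and_iqr_outliers(toy)
print(filtered)
```

Filtering on every numeric column at once can remove many rows on a small dataset, so it may be worth checking how many rows survive before modelling.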
The notebook follows these main steps:
- Data Loading and Inspection
  - Reading the dataset
  - Checking data types and missing values
- Exploratory Data Analysis (EDA)
  - Summary statistics
  - Feature inspection
  - Correlation analysis
- Outlier Detection and Removal
  - IQR-based method applied to numerical features
- Data Visualization
  - Correlation heatmaps
  - Feature relationships
- Linear Regression Implementation
  - Step-by-step implementation
  - Model understanding and evaluation
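As an illustration of the final step, here is one way a step-by-step linear regression could look in NumPy. It is a sketch under assumptions, not the notebook's code: it uses batch gradient descent on mean squared error (the notebook may use a different fitting method), and the function names, learning rate, epoch count, and synthetic data are all hypothetical.

```python
import numpy as np


def fit_linear_regression(X, y, lr=0.1, epochs=2000):
    """Fit y ~ X @ w + b by batch gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        error = X @ w + b - y                       # residuals for the current parameters
        grad_w = (2.0 / n_samples) * (X.T @ error)  # d(MSE)/dw
        grad_b = (2.0 / n_samples) * error.sum()    # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data with known coefficients (not the Housing data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.1, size=200)

w, b = fit_linear_regression(X, y)
r2 = r2_score(y, X @ w + b)
print("weights:", w, "bias:", b, "R^2:", r2)
```

Gradient descent is sensitive to feature scale, so on real data the features would typically be standardized first; the synthetic features here are already on a unit scale.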
- Python
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Jupyter Notebook
- Clone the repository:
git clone https://github.com/Veynorex/Linear-Regression-From-Scratch.git
- Open the notebook, either in VS Code (using its Jupyter extension) or from the command line:
jupyter notebook LinearRegression.ipynb
This implementation focuses on understanding the mechanics of linear regression, data preprocessing, and exploratory data analysis.
For simplicity and educational purposes, the model is trained on the entire dataset, and no train–test split is performed.
As a result:
- Evaluation metrics may be optimistic
- The emphasis is on conceptual clarity rather than generalization
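For readers who want to see how far training metrics can drift from held-out metrics, the sketch below adds a simple hold-out split. None of this is in the notebook: `holdout_split`, `fit_least_squares`, and `mse` are hypothetical helpers, the data is synthetic, and the fit uses NumPy's `np.linalg.lstsq` rather than a from-scratch solver to keep the example short.

```python
import numpy as np


def holdout_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of rows for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def fit_least_squares(X, y):
    """Ordinary least squares; the last coefficient is the intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def mse(X, y, coef):
    """Mean squared error of the fitted model on (X, y)."""
    A = np.column_stack([X, np.ones(len(X))])
    return float(np.mean((A @ coef - y) ** 2))


# Synthetic noisy data (not the Housing data).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -0.5, 2.0]) + 4.0 + rng.normal(scale=1.0, size=100)

X_train, X_test, y_train, y_test = holdout_split(X, y)
coef = fit_least_squares(X_train, y_train)
train_mse = mse(X_train, y_train, coef)
test_mse = mse(X_test, y_test, coef)
print("train MSE:", train_mse, "test MSE:", test_mse)
```

The training error is computed on the same rows the coefficients were fitted to, so it tends to understate the error on unseen rows; the held-out error is the more honest estimate of generalization.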