🚗 Auto MPG Regression – Machine Learning Project

This project focuses on predicting the fuel efficiency (MPG - Miles Per Gallon) of automobiles using their technical specifications. It applies various data preprocessing techniques, regression algorithms, and ensemble methods to develop accurate and robust predictive models.

📂 Dataset

Source: UCI Machine Learning Repository – Auto MPG Dataset
Total Instances: 398
Features Used:
- Cylinders
- Displacement
- Horsepower
- Weight
- Acceleration
- Model Year
- Origin
Target Variable: MPG (Miles Per Gallon)
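
A minimal loading sketch, assuming the raw UCI file layout (space-padded numeric columns, `?` for a missing value, and a tab before the quoted car name). The column names and the `load_auto_mpg` helper are illustrative choices, not taken from the notebook, and the two sample rows only mimic the file format.

```python
import io
import pandas as pd

# Assumed column names for the eight numeric fields (hypothetical naming).
COLUMNS = ["mpg", "cylinders", "displacement", "horsepower",
           "weight", "acceleration", "model_year", "origin"]

def load_auto_mpg(source):
    """Read the space-delimited file; '?' is parsed as a missing value."""
    return pd.read_csv(
        source,
        names=COLUMNS,
        sep=" ",
        skipinitialspace=True,  # collapse the runs of padding spaces
        comment="\t",           # drop the tab-separated car name at line end
        na_values="?",
    )

# Two illustrative rows in the file's layout (not the full dataset).
sample = io.StringIO(
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
)
df = load_auto_mpg(sample)
print(df)
```

With the real data, the same helper would be called with a file path such as `load_auto_mpg("auto-mpg.data")`.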

📊 Workflow Overview

Data Cleaning
- Handled the 6 missing values in the Horsepower column using group-wise median imputation (based on Cylinder count).
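
The group-wise median imputation can be sketched as below. The toy values and column names are made up for illustration; the notebook's exact code may differ.

```python
import numpy as np
import pandas as pd

# Toy frame with two missing horsepower values (illustrative numbers).
df = pd.DataFrame({
    "cylinders":  [4, 4, 4, 8, 8, 8],
    "horsepower": [90.0, np.nan, 70.0, 150.0, 130.0, np.nan],
})

# Median horsepower of each cylinder group, aligned row by row with df.
group_median = df.groupby("cylinders")["horsepower"].transform("median")

# Fill only the missing entries with their own group's median.
df["horsepower"] = df["horsepower"].fillna(group_median)
print(df)
```

Filling per cylinder group keeps a 4-cylinder car from receiving a value dominated by 8-cylinder cars, which a single global median would allow.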
Exploratory Data Analysis (EDA)
- Correlation matrix, clustermap, skewness analysis, and distribution plots.
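
A small sketch of the numeric side of the EDA step, using made-up toy values. The plots themselves would typically come from `seaborn.heatmap` or `seaborn.clustermap` applied to the correlation matrix; they are left out here so the example runs without a display.

```python
import pandas as pd

# Toy values for illustration only, not rows from the dataset.
df = pd.DataFrame({
    "mpg":        [18.0, 15.0, 26.0, 32.0, 44.0],
    "weight":     [3504.0, 3693.0, 2130.0, 1985.0, 1800.0],
    "horsepower": [130.0, 165.0, 90.0, 70.0, 60.0],
})

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr()

# Sample skewness per column; a clearly positive value means a long right tail.
skewness = df.skew()

print(corr.round(2))
print(skewness.round(2))
```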
Outlier Removal
- Used IQR method on Horsepower and Acceleration.
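
One way the IQR filter could look. The `remove_iqr_outliers` helper and the conventional multiplier `k=1.5` are assumptions; the README does not state which fences were used.

```python
import pandas as pd

def remove_iqr_outliers(df, columns, k=1.5):
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] for every column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

# Toy data: the 400 hp row is an obvious outlier.
df = pd.DataFrame({
    "horsepower":   [90.0, 95.0, 100.0, 105.0, 110.0, 400.0],
    "acceleration": [14.0, 15.0, 15.5, 16.0, 16.5, 17.0],
})
clean = remove_iqr_outliers(df, ["horsepower", "acceleration"])
print(clean)
```

In this sketch the fences for every column are computed on the full frame before any row is dropped, so the result does not depend on column order.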
Feature Engineering
- Log transformation applied to target (MPG) due to skewness.
- One-hot encoding on Cylinders and Origin.
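
A sketch of both feature-engineering steps on a toy frame. The README does not say whether `log` or `log1p` was used; plain `np.log` is assumed here, with `np.exp` as its inverse.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only.
df = pd.DataFrame({
    "mpg":       [18.0, 25.0, 32.0],
    "weight":    [3504.0, 2046.0, 2130.0],
    "cylinders": [8, 4, 4],
    "origin":    [1, 1, 3],
})

# Log-transform the target to reduce right skew (assumed variant: natural log).
df["mpg"] = np.log(df["mpg"])

# Treat Cylinders and Origin as categories and expand them into indicator columns.
encoded = pd.get_dummies(df, columns=["cylinders", "origin"], dtype=float)
print(encoded.columns.tolist())

# Predictions made on the log scale are mapped back to MPG with the inverse.
mpg_original = np.exp(encoded["mpg"])
```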
Data Splitting & Scaling
- 10% training / 90% testing split to challenge the models.
- Applied RobustScaler to minimize the influence of outliers.
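
The split and scaling step might look like this on synthetic data. The `random_state` value is arbitrary, and fitting the scaler on the training rows only is the usual practice rather than something the README confirms.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in for the feature matrix and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# test_size=0.9 leaves only 10% of the rows for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.9, random_state=42
)

# RobustScaler centers on the median and scales by the IQR,
# so a few extreme values shift the result less than mean/std scaling would.
scaler = RobustScaler()
X_train_scaled = scaler.fit_transform(X_train)  # statistics come from train only
X_test_scaled = scaler.transform(X_test)        # reuse them on the test rows

print(X_train_scaled.shape, X_test_scaled.shape)
```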
Modeling
- Applied and tuned:
  - Linear Regression
  - Ridge Regression (L2)
  - Lasso Regression (L1)
  - ElasticNet (L1 + L2)
  - XGBoost Regressor
  - Averaging Ensemble (Lasso + XGBoost)
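
A rough sketch of fitting the models and averaging two of them, on synthetic data. The hyperparameters are placeholders rather than the tuned values, and scikit-learn's `GradientBoostingRegressor` is swapped in for XGBoost so the example needs only scikit-learn; `xgboost.XGBRegressor` could take its place.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

# Synthetic regression problem (not the Auto MPG data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.5, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),                           # L2 penalty
    "lasso": Lasso(alpha=0.01),                          # L1 penalty
    "elasticnet": ElasticNet(alpha=0.01, l1_ratio=0.5),  # mix of L1 and L2
    "boosting": GradientBoostingRegressor(random_state=0),  # stand-in for XGBoost
}

predictions, scores = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    scores[name] = mean_squared_error(y_test, predictions[name])

# Averaging ensemble: unweighted mean of two models' predictions.
averaged = (predictions["lasso"] + predictions["boosting"]) / 2
scores["averaged"] = mean_squared_error(y_test, averaged)

for name, value in scores.items():
    print(f"{name:12s} {value:.5f}")
```

An unweighted average can never score worse than the weaker of its two members on MSE, but it can still lose to the stronger one, which matches the pattern reported for the averaged model.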
Performance Evaluation
- Mean Squared Error (MSE) used as the evaluation metric.
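
Because the target is log-transformed, an MSE computed on it is not in MPG units. The sketch below, with made-up numbers and a hypothetical `mse` helper, shows how different the two scales are; it assumes a plain natural log.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up true and predicted fuel efficiencies, in MPG.
mpg_true = np.array([20.0, 30.0])
mpg_pred = np.array([22.0, 27.0])

mse_log = mse(np.log(mpg_true), np.log(mpg_pred))  # error on the log scale
mse_mpg = mse(mpg_true, mpg_pred)                  # error in squared MPG

print(f"log-scale MSE: {mse_log:.5f}")
print(f"MPG-scale MSE: {mse_mpg:.5f}")
```

To report errors in MPG, predictions would first be mapped back with the inverse transform (`np.exp` under the assumption above).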

🧠 Model Comparison (MSE on Test Set)

| Model | MSE | Notes |
|---|---|---|
| Linear Regression | 0.01363 | Baseline model |
| Ridge Regression | 0.01340 | Slightly improved with L2 regularization |
| Lasso Regression | 0.01331 ✅ | Feature selection + low error |
| ElasticNet | 0.01330 ✅✅ | Best performance, balanced model |
| XGBoost | 0.01810 ❌ | Overcomplicated for this small dataset |
| Averaged (Lasso + XGB) | 0.01365 | Balanced but not outperforming Lasso/ENet |

🧪 Key Insights

ElasticNet and Lasso were the most effective models for this dataset.
RobustScaler proved useful in stabilizing model performance.
XGBoost, while powerful, underperformed here, most likely because the dataset is small and simple and only 10% of it was used for training.
Averaging models yielded consistent but not superior results.

📦 Technologies Used

Python 3.10+
NumPy, Pandas
Seaborn, Matplotlib
Scikit-learn
XGBoost
Jupyter Notebook / VSCode

hakaninki/mpg-regression-project
