This is my very first machine learning project, in which I predict house prices from input features using supervised regression models. I followed a project-based, self-taught approach to learning AI. Although I'm not yet well-versed in ML theory, I aimed to build a complete, functional end-to-end workflow. ❤️
The goal of this project is to:
- Predict housing prices based on features (area, bedrooms, parking, etc.).
- Apply a real-world machine learning pipeline.
- (Optional, future work) Identify which features most affect housing prices, which is useful for real estate and business decisions.
Dataset:
- Source: Kaggle
- Format: `.zip` archive containing a `.csv` file
- Features:
  - Numerical: `area`, `bedrooms`, `bathrooms`, `stories`, `parking`, `price` (`price` is the prediction target)
  - Categorical (binary, yes/no): `mainroad`, `guestroom`, `basement`, `hotwaterheating`, `airconditioning`, `prefarea`
  - Categorical (nominal): `furnishingstatus` (furnished / semi-furnished / unfurnished)
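Reading the data can be sketched with the standard library alone. This is a minimal sketch, assuming the archive holds exactly one `.csv` file; the function name `load_housing_csv` and the path `housing.zip` are hypothetical and not taken from the notebook.

```python
import csv
import io
import zipfile


def load_housing_csv(zip_path):
    """Read the single .csv inside a Kaggle .zip archive into a list of dicts.

    Hypothetical helper: assumes the archive contains exactly one .csv file.
    All values come back as strings; numeric conversion happens later.
    """
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if len(csv_names) != 1:
            raise ValueError(f"expected one .csv in {zip_path}, found {csv_names}")
        with archive.open(csv_names[0]) as raw:
            # ZipFile.open yields bytes; wrap it so csv can read text.
            # "utf-8-sig" also strips a byte-order mark if one is present.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))
```

`rows = load_housing_csv("housing.zip")` gives one dict of strings per house. With pandas, which the project uses, `pd.read_csv("housing.zip")` also works when the archive contains a single file, because compression is inferred from the `.zip` extension.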
Workflow:
- Data Collection & Reading
- Data Cleaning
  - Removed 15 outliers using the IQR rule
  - Found no missing or duplicate values
- Data Preprocessing
  - One-hot encoding for categorical features
  - Feature scaling with `StandardScaler`
- Exploratory Data Analysis (EDA)
  - Distribution of `price`
  - Correlation analysis and boxplots
  - Multicollinearity check
- Feature Engineering
  - Created `sum_area` = `area` × `stories`
  - Created `amenities_count` = count of all "yes" binary features
  - One-hot encoded `stories`
- Modeling
  - Baseline: `LinearRegression`
  - Final: `XGBRegressor` (XGBoost)
- Evaluation & Tuning
  - Metrics: RMSE and R²
  - Hyperparameter tuning with `GridSearchCV` and `RandomizedSearchCV`
  - Visualized predicted vs. actual values
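The outlier removal and the two engineered features can be sketched in plain Python (standard library only) so the arithmetic is visible; the notebook itself uses pandas. This is a sketch under two assumptions not stated in the README: the IQR filter is applied to `price`, and the conventional 1.5 × IQR fences are used. Rows are dicts of strings as produced by `csv.DictReader`; the helper names are hypothetical.

```python
import statistics

# Binary yes/no columns from the dataset description
BINARY_COLUMNS = ["mainroad", "guestroom", "basement",
                  "hotwaterheating", "airconditioning", "prefarea"]


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" matches the default linear interpolation
    # used by numpy.percentile and pandas Series.quantile
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, column="price", k=1.5):
    """Keep only rows whose `column` value lies inside the IQR fences."""
    lower, upper = iqr_bounds([float(r[column]) for r in rows], k)
    return [r for r in rows if lower <= float(r[column]) <= upper]


def add_engineered_features(row):
    """Return a copy of `row` with sum_area and amenities_count added."""
    out = dict(row)
    out["sum_area"] = float(row["area"]) * int(row["stories"])
    # True counts as 1, so this is the number of "yes" answers
    out["amenities_count"] = sum(row[c] == "yes" for c in BINARY_COLUMNS)
    return out
```

In pandas the same filter is usually a boolean mask built from `Series.quantile(0.25)` and `Series.quantile(0.75)`; the fences are computed once on the full column, so removing rows does not shift them.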
Results:
- Model: `XGBRegressor`
- RMSE: 1,093,408 (in the same units as `price`)
- R² score: 0.653
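To make the two numbers easier to read, here is a minimal sketch of both metrics using their standard definitions. RMSE is in the same units as `price` and is roughly the size of a typical prediction error, with large errors weighted more heavily; an R² of 0.653 means the model explains about 65% of the variance in `price` on the data it was evaluated on. The function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # error of always predicting the mean
    return 1 - ss_res / ss_tot
```

With scikit-learn, taking the square root of `sklearn.metrics.mean_squared_error` and calling `sklearn.metrics.r2_score` compute the same quantities.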
Tech stack:
- Python
- Pandas, NumPy, Seaborn, Matplotlib
- Scikit-learn
- XGBoost
- Google Colab
How to run:
```bash
# 1. Clone the repository
git clone https://github.com/your-username/Housing-price-prediction.git
# 2. Upload the dataset .zip from Kaggle
# 3. Open and run the notebook in Google Colab or Jupyter
```