Author: Daniel Omondi Oselu
In this project, we perform a regression analysis on a housing dataset drawn from the King County House Sales data.
Zillow Inc has just appointed a new branch manager for its Northwestern market. The new branch manager believes in using data science and analytics to improve customers' experiences, so he has reached out to our data analytics consultancy firm, Oselu Data Analytics, to find a way to help homeowners buy and sell their homes.
After several discussions with the Data Science team, we decided to build a way to predict the potential value of their homes using the available dataset.
Photo by Breno Assis on Unsplash
This project uses the King County House Sales dataset, which can be found in kc_house_data.csv in the data folder in this repo. The description of the column names can be found in column_names.md in the same folder.
The data contains 21597 rows, one per house sale, with 21 attributes (columns) describing each house.
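As a quick sanity check on those figures, the row and column counts can be read straight from the CSV. The sketch below is a minimal, standard-library-only illustration; the helper name dataset_shape and the relative path data/kc_house_data.csv are assumptions based on the description above, not code taken from the notebook.

```python
import csv
from pathlib import Path


def dataset_shape(csv_path):
    """Return (row_count, column_count) for a CSV file that has a header row."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # first line holds the column names
        # Count only non-empty records so stray blank lines are ignored.
        row_count = sum(1 for row in reader if row)
    return row_count, len(header)


# Assumed location of the dataset, following the folder layout described above.
DATA_PATH = Path("data") / "kc_house_data.csv"

if DATA_PATH.exists():
    rows, cols = dataset_shape(DATA_PATH)
    print(f"{rows} rows, {cols} columns")
```

If the file matches the description, the counts reported here should agree with the figures quoted above.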
Model iterations are developed by methodically improving on prior models. We start by building a base model, which regresses our target variable on the single feature most strongly correlated with it.
We then iteratively add more features, while trying to settle on the best model.
Starting from our base model and adding features iteratively, we found that the features that give us the best model include:
bedrooms, bathrooms, sqft_living, sqft_lot, floors, waterfront, view, condition, grade
This means that, when estimating the value of a house, this is the combination of features that ultimately gives us the predicted price of the house.
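To make the idea of combining several features into one price estimate concrete, here is a minimal multiple linear regression sketch in pure Python. It solves the normal equations with Gaussian elimination; this is an assumed, illustrative technique and not necessarily how the notebook fits its models. The helper names are hypothetical, only two of the listed features are used to keep it short, and the toy numbers are invented for the example.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation is applied to both sides.
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in reversed(range(n)):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def fit_ols(rows, target):
    """Return [intercept, coef_1, ...] minimising the squared error."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    width = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(width)]
           for i in range(width)]
    xty = [sum(r[i] * y for r, y in zip(design, target)) for i in range(width)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Apply fitted coefficients to one row of feature values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Toy (sqft_living, grade) rows and prices, invented for illustration only.
toy_rows = [(1000, 6), (1500, 7), (2000, 7), (2500, 9), (1800, 8)]
toy_prices = [270_000, 340_000, 390_000, 480_000, 390_000]

coefs = fit_ols(toy_rows, toy_prices)
print(f"estimated price: {predict(coefs, (2200, 8)):.0f}")
```

With real data, each of the nine features would become one column of the design matrix, and a numerical library would usually be a more robust choice than hand-written elimination.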
├── code
│ ├── __init__.py
│
├── data
├── images
├── __init__.py
├── README.md
├── regression_analysis_presentation.pdf
└── regression_analysis.ipynb