Learn to clean up messy data
- 6 Different Ways to Compensate for Missing Values In a Dataset
- Tutorial: Introduction to Missing Data Imputation
- Simple techniques for missing data imputation
After completing the exercises below, you should be comfortable with:
- Exploring a dataset to figure out whether the data needs cleansing
- Identifying missing data
- Identifying invalid data
- Cleansing data
★☆☆ - Easy
★★☆ - Medium
★★★ - Challenging
★★★★ - Bonus
We have the following data. What would be a good way to handle missing data? Please discuss the following choices:
- Drop the rows with missing data
- Substitute zero for missing values
- Substitute another value for missing values (if so, what is that value?)
year month rainfall
2019 Jan 10
2019 Feb 12
2019 Mar ?
2019 Apr 20
2019 May ?
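To make the discussion concrete, the three choices can be tried side by side. This is a minimal sketch, assuming pandas is available and that `?` is loaded as `NaN`; linear interpolation is only one possible "other value" and is not necessarily the intended answer. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# The rainfall table from the exercise; "?" is represented as NaN.
rain = pd.DataFrame({
    "year": [2019] * 5,
    "month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "rainfall": [10, 12, np.nan, 20, np.nan],
})

# Choice 1: drop the rows with missing data (Mar and May disappear).
dropped = rain.dropna(subset=["rainfall"])

# Choice 2: substitute zero (this claims it did not rain in Mar and May).
zero_filled = rain.fillna({"rainfall": 0})

# Choice 3: substitute another value. Here: linear interpolation between
# the neighbouring months. limit_area="inside" fills only gaps that have
# a known value on both sides, so May stays missing.
interpolated = rain.assign(
    rainfall=rain["rainfall"].interpolate(limit_area="inside")
)

print(dropped)
print(zero_filled)
print(interpolated)
```

With interpolation, March lands halfway between February and April, while May has no later month to interpolate towards and needs a separate decision.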
How will you handle missing data in this dataset?
Person Height_cm
A 180
B ?
C 172
D 155
E 160
F ?
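Heights have no natural ordering between rows, so interpolation does not apply here. One common option is to fill in a typical value such as the mean or median. The sketch below assumes pandas and uses the median; the `Height_imputed` flag column is a hypothetical addition, not part of the exercise data.

```python
import numpy as np
import pandas as pd

# The height table from the exercise; "?" is represented as NaN.
heights = pd.DataFrame({
    "Person": list("ABCDEF"),
    "Height_cm": [180, np.nan, 172, 155, 160, np.nan],
})

# mean() and median() skip NaN values by default.
mean_height = heights["Height_cm"].mean()
median_height = heights["Height_cm"].median()

# Record which rows were imputed so they stay distinguishable
# from measured values, then fill the gaps with the median.
heights["Height_imputed"] = heights["Height_cm"].isna()
heights["Height_cm"] = heights["Height_cm"].fillna(median_height)

print(mean_height, median_height)
print(heights)
```

Whether the mean or the median is the better choice depends on how skewed the observed values are; with only four measurements, either is a rough guess.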
- Start with this notebook
- Complete the TODO items
- Read the house-sales-simplified.csv.
- Identify columns that need data cleanup
Hint: Zipcode
- Convert SaleDate to an actual date type
- Do a bar plot of houses sold per year
- What percentage of the data is clean?
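The steps above could look roughly like the following sketch. It assumes pandas, uses a few made-up rows in place of house-sales-simplified.csv, assumes dates in `YYYY-MM-DD` form, and treats "clean" as "row has no missing value". The real file may need a different date format and a different definition of clean.

```python
import pandas as pd

# Hypothetical stand-in rows; the exercise itself reads the CSV file,
# e.g. with pd.read_csv("house-sales-simplified.csv").
sales = pd.DataFrame({
    "SaleDate": ["2014-05-02", "2014-11-19", "2015-03-07",
                 "not a date", "2015-08-30"],
    "Zipcode": [98002.0, None, 98166.0, 98103.0, 98108.0],
})

# Convert SaleDate to a real datetime; unparseable values become NaT.
sales["SaleDate"] = pd.to_datetime(
    sales["SaleDate"], format="%Y-%m-%d", errors="coerce"
)

# Count missing values per column to see which columns need cleanup.
missing_per_column = sales.isna().sum()

# Houses sold per year, counting only rows with a valid date.
valid_dates = sales["SaleDate"].dropna()
sold_per_year = valid_dates.dt.year.value_counts().sort_index()

# Percentage of rows without any missing value.
pct_clean = 100 * len(sales.dropna()) / len(sales)

print(missing_per_column)
print(sold_per_year)
print(pct_clean)
```

In a notebook with matplotlib installed, `sold_per_year.plot(kind="bar")` draws the bar plot of houses sold per year.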