Data analysis case study using spreadsheets and Tableau.
- ✔️ Data Cleaning
- ✔️ Analysis
- ✔️ Visualizations
- ✔️ Final README
This dataset was synthetically generated and does not reflect real-world data. It is meant for exploratory analysis only. Station names, coordinates, and measurements are entirely fictional and were algorithmically randomized within realistic boundaries.
China Water Pollution Monitoring Dataset
Analyze water quality data from 2023 to identify which regions in China experienced the highest pollution levels, detect seasonal trends in pollution, and highlight which pollutants were most frequently elevated.
- Geographic map showcasing monitoring stations and their average levels of pollution
- Heat map exploring seasonal trends among provinces
- Bar graph showing which pollutants most often reached unsafe levels of contamination
- Column chart comparing which stations within specific provinces had the highest average pollution levels
- The most polluted cities in China included Wuhan, Dali, Yichang, Luoyang, and Zhengzhou
- Spring and winter tended to have higher levels of pollution
- Phosphorus, ammonia, and total nitrogen were the biggest contributors to pollution
- Stations in Zhengzhou and Yichang had high levels of phosphorus throughout the entire year
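As a rough illustration of how findings like these can be derived, the sketch below counts how often each pollutant exceeds a limit and groups the exceedances by season. It is not the workflow used in this project (that was done in Google Sheets and Tableau); the limits, field names, and readings are all invented for the example.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical limits in mg/L, invented for this example.
LIMITS = {"total_phosphorus": 0.2, "ammonia": 1.0, "total_nitrogen": 1.0}

# Invented readings: (sample date, {pollutant: mg/L}).
READINGS = [
    (date(2023, 1, 10), {"total_phosphorus": 0.30, "ammonia": 1.4, "total_nitrogen": 0.8}),
    (date(2023, 4, 5), {"total_phosphorus": 0.25, "ammonia": 0.6, "total_nitrogen": 1.2}),
    (date(2023, 7, 20), {"total_phosphorus": 0.10, "ammonia": 0.4, "total_nitrogen": 0.5}),
    (date(2023, 10, 3), {"total_phosphorus": 0.22, "ammonia": 0.5, "total_nitrogen": 0.7}),
]


def season_of(day):
    """Map a date to a Northern Hemisphere meteorological season."""
    if day.month in (12, 1, 2):
        return "Winter"
    if day.month in (3, 4, 5):
        return "Spring"
    if day.month in (6, 7, 8):
        return "Summer"
    return "Autumn"


exceedances = Counter()        # pollutant -> number of readings above its limit
per_season = defaultdict(int)  # season -> number of readings above any limit

for day, values in READINGS:
    for pollutant, value in values.items():
        if value > LIMITS[pollutant]:
            exceedances[pollutant] += 1
            per_season[season_of(day)] += 1

print(exceedances.most_common())
print(dict(per_season))
```

The same counts can be produced in a spreadsheet with a helper column per pollutant and a pivot table by season.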
Google Sheets: Data cleaning through formulas, filters, and helper columns
Google Slides: Thorough explanations and findings
Tableau: Interactive visualizations
I created a copy of the raw dataset and cleaned it using Google Sheets. This process consisted of:
- Filtering out empty/irrelevant rows
- Renaming columns for clarity
- Creating helper columns for thresholds and regional validation
- Removing invalid data entries
This process enabled me to work with clean, structured data so I could focus on my business task.
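For readers who prefer code to spreadsheet formulas, the cleaning steps above can be sketched in Python. This is a minimal illustration, not the actual Google Sheets workflow: the column names, the province list, and the phosphorus threshold are assumptions, and the rows are made up.

```python
# Toy rows standing in for the raw export; all values are invented.
RAW_ROWS = [
    {"Province": "Hubei", "City": "Wuhan", "Date": "2023-01-15", "Total_P": "0.31"},
    {"Province": "", "City": "", "Date": "", "Total_P": ""},  # empty row
    {"Province": "Henan", "City": "Luoyang", "Date": "2023-04-02", "Total_P": "-1"},  # invalid reading
    {"Province": "Henan", "City": "Zhengzhou", "Date": "2023-07-09", "Total_P": "0.12"},
]

# Hypothetical names and cutoff; none of these come from the original workbook.
RENAMES = {"Total_P": "total_phosphorus_mg_l"}
KNOWN_PROVINCES = {"Hubei", "Henan", "Yunnan"}
PHOSPHORUS_LIMIT = 0.2  # assumed threshold in mg/L, for illustration only


def clean(rows):
    cleaned = []
    for row in rows:
        # Filter out rows where every cell is empty.
        if not any(value.strip() for value in row.values()):
            continue
        # Rename columns for clarity.
        row = {RENAMES.get(key, key): value for key, value in row.items()}
        # Remove invalid entries: non-numeric or negative readings.
        try:
            reading = float(row["total_phosphorus_mg_l"])
        except ValueError:
            continue
        if reading < 0:
            continue
        row["total_phosphorus_mg_l"] = reading
        # Helper columns: threshold flag and regional validation.
        row["exceeds_limit"] = reading > PHOSPHORUS_LIMIT
        row["province_valid"] = row["Province"] in KNOWN_PROVINCES
        cleaned.append(row)
    return cleaned


cleaned_rows = clean(RAW_ROWS)
for row in cleaned_rows:
    print(row)
```

In a spreadsheet, the two helper columns would typically be an `IF` comparison against the threshold and a lookup against a list of valid province names.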