This project focuses on cleaning and standardizing a real-world layoffs dataset using MySQL.
The goal was to transform raw, inconsistent data into a structured and analysis-ready format while preserving data integrity.
- Source: Layoffs data (raw CSV)
- Raw table: `layoffs`
- Cleaning tables: `layoffs_staging`, `layoffs_staging2`
The raw data contained:
- NULL and missing values
- Inconsistent text fields
- Unstandardized date formats
- Duplicate and messy records
Tools used:
- MySQL
- MySQL Workbench
The following cleaning operations were performed using SQL:
- Removed duplicate records using window functions
- Standardized text fields (industry, stage, country, location)
- Converted dates into a uniform `YYYY-MM-DD` format
- Handled NULL values without introducing artificial data
- Ensured correct data types for numeric columns
- Created staging tables to preserve raw data
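
The steps in this list can be sketched roughly as follows. This is a minimal illustration, not the project's exact script: it assumes MySQL 8.0+ (for window functions), that `layoffs_staging` and `layoffs_staging2` do not exist yet, and that the raw table has columns named `company`, `total_laid_off`, and `date` stored as text in `MM/DD/YYYY` form. Only `industry`, `stage`, `country`, and `location` are named in this README; the other column names and the raw date format are assumptions.

```sql
-- MySQL Workbench enables safe-update mode by default, which blocks
-- UPDATE/DELETE statements that do not filter on a key column.
SET SQL_SAFE_UPDATES = 0;

-- 1. Copy the raw table into a staging table so `layoffs` stays untouched.
CREATE TABLE layoffs_staging LIKE layoffs;
INSERT INTO layoffs_staging SELECT * FROM layoffs;

-- 2. Build a second staging table that numbers the rows inside each group
--    of identical records (column list is an assumption).
CREATE TABLE layoffs_staging2 AS
SELECT *,
       ROW_NUMBER() OVER (
           PARTITION BY company, location, industry,
                        total_laid_off, `date`, stage, country
       ) AS row_num
FROM layoffs_staging;

-- Rows numbered 2 and above are duplicates of row 1 in their group.
DELETE FROM layoffs_staging2 WHERE row_num > 1;
ALTER TABLE layoffs_staging2 DROP COLUMN row_num;

-- 3. Standardize text fields: strip surrounding whitespace, and a trailing
--    period on country names (for example 'United States.').
UPDATE layoffs_staging2
SET industry = TRIM(industry),
    stage    = TRIM(stage),
    location = TRIM(location),
    country  = TRIM(TRAILING '.' FROM TRIM(country));

-- 4. Parse the text dates (assumed MM/DD/YYYY), then switch the column
--    to a real DATE type, which MySQL displays as YYYY-MM-DD.
UPDATE layoffs_staging2
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

ALTER TABLE layoffs_staging2 MODIFY COLUMN `date` DATE;
```

Writing the row numbers into a second table, rather than deleting from a subquery, is one common way around MySQL's restriction on deleting from a table that is also read in the same statement.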
Note: NULL values were intentionally retained where data was genuinely unavailable to avoid introducing bias.
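
One way to handle missing values without imputing anything is sketched below. It assumes the cleaned data lives in `layoffs_staging2`, that blank strings in `industry` stand for missing data, and that a numeric column named `total_laid_off` was imported as text; that column name is an assumption, not taken from this README.

```sql
-- Safe-update mode in MySQL Workbench blocks UPDATEs without a key filter.
SET SQL_SAFE_UPDATES = 0;

-- Treat blank strings as missing rather than as a real category.
UPDATE layoffs_staging2
SET industry = NULL
WHERE TRIM(industry) = '';

-- Turn blank numeric strings into NULL, then enforce an integer type.
-- No placeholder such as 0 is written, so missing stays missing.
UPDATE layoffs_staging2
SET total_laid_off = NULLIF(TRIM(total_laid_off), '');

ALTER TABLE layoffs_staging2 MODIFY COLUMN total_laid_off INT;

-- Report how much data is still missing after cleaning.
SELECT COUNT(*)                    AS total_rows,
       SUM(industry IS NULL)       AS missing_industry,
       SUM(total_laid_off IS NULL) AS missing_total_laid_off
FROM layoffs_staging2;
```

Keeping the gaps as NULL means aggregate functions such as `AVG` and `SUM` skip them, instead of being skewed by made-up values.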
Before cleaning: raw data with inconsistencies and missing values.
After cleaning: standardized, analysis-ready data.
Shruti Singh

