This project focuses on analyzing global layoff events across companies, industries, countries, and funding stages over recent years. The dataset captures organizational, financial, and temporal characteristics of layoffs, including company name, industry, funding stage, country, total number of employees laid off, percentage of workforce impacted, and date of the event. From a data analytics perspective, the project had two primary goals:
- Ensure analytical reliability through rigorous data cleaning, addressing duplicates, inconsistent categorical values, missing information, and incorrect data types.
- Extract meaningful patterns from layoff events, identifying which industries, geographies, company stages, and time periods were most impacted, and how layoffs relate to company scale and funding levels.
The analysis is designed to support macro-level understanding of labor market shocks, industry vulnerability, and structural risk across different stages of company maturity.
This analysis examines global layoff events across companies, industries, countries, and funding stages using a rigorously cleaned SQL dataset.
- The data shows that layoffs are not evenly distributed across the economy. Instead, job losses are heavily concentrated within a small set of industries, countries, and companies. A limited number of firms account for a large share of total layoffs, often across multiple years, indicating recurring downsizing rather than one-off events.
- Technology-related sectors are disproportionately impacted, reflecting vulnerabilities associated with rapid scaling and venture-backed growth. Importantly, high funding levels and late-stage maturity do not shield companies from workforce reductions: several well-funded and post-IPO firms contribute significantly to total layoffs.
- Temporally, layoffs occur in waves, with sharp spikes in specific periods rather than gradual declines, suggesting systemic shocks rather than isolated operational adjustments.
Overall, the project demonstrates how careful SQL-based data cleaning is essential for producing reliable labor market insights and how macroeconomic stress propagates unevenly across industries, geographies, and company stages.
Before any analysis, the raw dataset was duplicated into a staging environment to preserve data integrity and allow safe transformations. All cleaning steps were performed directly in SQL to reflect real-world data warehouse workflows. The full cleaning script is available in data_cleaning_raw.sql.
Layoff datasets are particularly prone to duplication due to repeated reporting across sources and update cycles. To address this, a window function (ROW_NUMBER) was used to detect duplicates based on a composite key including:
- company
- location
- industry
- total laid off
- percentage laid off
- date
- company stage
- country
- funds raised
This approach ensures that only truly identical layoff events were flagged, avoiding accidental removal of legitimate multi-event layoffs from the same company. All duplicate rows were removed, leaving a one-record-per-event structure, which is critical for accurate aggregation in later analyses.
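The duplicate check described above can be sketched as follows. This is a minimal illustration, not the project's actual script: it runs the SQL through Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `layoffs_staging`, the column names `stage` and `funds_raised_millions`, and the sample rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE layoffs_staging (
    company TEXT, location TEXT, industry TEXT,
    total_laid_off INTEGER, percentage_laid_off REAL,
    date TEXT, stage TEXT, country TEXT, funds_raised_millions REAL
);
-- Hypothetical sample: rows 1 and 2 are identical, row 3 is a separate event.
INSERT INTO layoffs_staging VALUES
    ('Acme', 'Austin', 'Retail', 100, 0.10, '2023-01-05', 'Series B', 'United States', 50),
    ('Acme', 'Austin', 'Retail', 100, 0.10, '2023-01-05', 'Series B', 'United States', 50),
    ('Acme', 'Austin', 'Retail', 40,  0.05, '2023-06-01', 'Series B', 'United States', 50);
""")

# Number the rows inside each group of fully identical records;
# any row numbered 2 or higher repeats an earlier one and is deleted.
conn.execute("""
WITH numbered AS (
    SELECT rowid AS rid,
           ROW_NUMBER() OVER (
               PARTITION BY company, location, industry, total_laid_off,
                            percentage_laid_off, date, stage, country,
                            funds_raised_millions
           ) AS row_num
    FROM layoffs_staging
)
DELETE FROM layoffs_staging
WHERE rowid IN (SELECT rid FROM numbered WHERE row_num > 1)
""")

remaining = conn.execute(
    "SELECT company, total_laid_off, date FROM layoffs_staging ORDER BY date"
).fetchall()
print(remaining)
```

Because every column is part of the partition, the second Acme event in June survives: it differs in date and magnitude, so it lands in its own partition and receives row number 1.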
Several inconsistencies were identified and resolved to improve grouping accuracy:
- Company names were trimmed to eliminate hidden whitespace, preventing artificial fragmentation during aggregation.
- Industry values such as "Crypto*", "Crypto/Web3", etc. were standardized into a single "Crypto" category. This prevents dilution of industry-level insights and allows accurate ranking of impacted sectors.
- Country names were cleaned to remove trailing punctuation (e.g., "United States." → "United States"), ensuring consistent geographic aggregation.
These steps directly impact analytical quality, especially for GROUP BY operations used extensively in the EDA.
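A minimal sketch of these three standardization steps, again using sqlite3 as a stand-in for MySQL with a hypothetical `layoffs_staging` table and invented sample rows. One dialect difference is worth noting: SQLite strips the trailing period with `RTRIM(country, '.')`, whereas MySQL would typically use `TRIM(TRAILING '.' FROM country)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE layoffs_staging (company TEXT, industry TEXT, country TEXT);
-- Hypothetical sample rows showing each kind of inconsistency.
INSERT INTO layoffs_staging VALUES
    ('  Acme ', 'Crypto/Web3', 'United States.'),
    ('Beta',    'Crypto*',     'United States'),
    ('Gamma',   'Retail',      'Canada');

-- Remove leading and trailing whitespace from company names.
UPDATE layoffs_staging SET company = TRIM(company);

-- Collapse every label that starts with 'Crypto' into one category.
UPDATE layoffs_staging SET industry = 'Crypto' WHERE industry LIKE 'Crypto%';

-- Strip trailing periods from country names (SQLite syntax).
UPDATE layoffs_staging SET country = RTRIM(country, '.');
""")

rows = conn.execute(
    "SELECT company, industry, country FROM layoffs_staging ORDER BY company"
).fetchall()
print(rows)
```

After these updates, a `GROUP BY industry` or `GROUP BY country` sees one group per real-world value instead of one group per spelling.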
The date column was originally stored as text. This was transformed into a proper DATE type using STR_TO_DATE, enabling:
- Time-based aggregation (year, month)
- Chronological ordering
- Rolling and cumulative calculations
This transformation is essential for any serious temporal analysis and enables all subsequent trend evaluations.
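The effect of the conversion can be illustrated with a small sketch. SQLite has neither `STR_TO_DATE` nor a dedicated DATE type, so the example registers a hypothetical Python shim under that name and stores ISO-formatted strings instead. The source format `'%m/%d/%Y'` is an assumption, since the original text format is not stated here.

```python
import sqlite3
from datetime import datetime


def str_to_date(text, fmt):
    """Hypothetical stand-in for MySQL's STR_TO_DATE; returns an ISO date string."""
    if text is None:
        return None
    return datetime.strptime(text, fmt).date().isoformat()


conn = sqlite3.connect(":memory:")
conn.create_function("STR_TO_DATE", 2, str_to_date)
conn.executescript("""
CREATE TABLE layoffs_staging (company TEXT, date TEXT);
INSERT INTO layoffs_staging VALUES ('Acme', '12/16/2022'), ('Beta', '1/5/2023');
""")

# As raw text, '1/5/2023' sorts before '12/16/2022', which is chronologically wrong.
text_order = conn.execute(
    "SELECT company FROM layoffs_staging ORDER BY date"
).fetchall()

# Convert the text column in place (assumed month/day/year input format).
conn.execute("UPDATE layoffs_staging SET date = STR_TO_DATE(date, '%m/%d/%Y')")

# After conversion, ordering and year extraction behave as expected.
date_order = conn.execute(
    "SELECT company, date, strftime('%Y', date) FROM layoffs_staging ORDER BY date"
).fetchall()
print(text_order, date_order)
```

The before-and-after ordering shows why the conversion matters: text comparison puts the 2023 event first, while the converted values sort chronologically.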
Industry values missing or stored as empty strings were addressed using self-joins, imputing the industry when the same company appeared elsewhere with valid information. This approach is analytically justified because:
- Industry is a company-level attribute, not event-specific
- It preserves data volume while minimizing information loss
Separately, rows with no layoff magnitude information at all (both total_laid_off and percentage_laid_off null) were removed, as they provide no analytical value and could distort aggregates.
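The imputation and row-removal logic can be sketched as below. This is an illustration under assumptions: the table name `layoffs_staging` and the sample rows are invented, and the self-join is written as a correlated subquery because that form runs on any SQLite version (MySQL would more commonly use `UPDATE ... JOIN`). `MAX` is used only as an arbitrary way to pick one value if a company has several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE layoffs_staging (
    company TEXT, industry TEXT,
    total_laid_off INTEGER, percentage_laid_off REAL
);
-- Hypothetical sample: Acme has one row with a blank industry,
-- Beta has no usable industry and no layoff magnitude at all.
INSERT INTO layoffs_staging VALUES
    ('Acme',  'Travel', 30,   0.05),
    ('Acme',  '',       25,   NULL),
    ('Beta',  NULL,     NULL, NULL),
    ('Gamma', 'Retail', NULL, 0.20);

-- Treat empty strings as missing so one NULL check covers both cases.
UPDATE layoffs_staging SET industry = NULL WHERE industry = '';

-- Borrow the industry from another row of the same company, if one exists.
UPDATE layoffs_staging
SET industry = (
    SELECT MAX(other.industry)
    FROM layoffs_staging AS other
    WHERE other.company = layoffs_staging.company
      AND other.industry IS NOT NULL
)
WHERE industry IS NULL;

-- Drop rows that carry no layoff magnitude at all.
DELETE FROM layoffs_staging
WHERE total_laid_off IS NULL AND percentage_laid_off IS NULL;
""")

rows = conn.execute(
    "SELECT company, industry, total_laid_off FROM layoffs_staging "
    "ORDER BY company, total_laid_off"
).fetchall()
print(rows)
```

The Gamma row is kept even though total_laid_off is null, because its percentage is still known; only rows missing both metrics are dropped.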
After cleaning, the dataset represents a high-integrity event-level table, where each row corresponds to a confirmed layoff event with standardized categorical values, valid dates, and meaningful numeric metrics. This structure supports reliable analysis across:
- Time
- Company
- Industry
- Geography
- Company maturity and funding level
The full set of exploratory queries is available in EDA_sql_company.sql.
By inspecting maximum values:
- Some companies laid off 100% of their workforce (percentage_laid_off = 1), indicating complete shutdowns rather than restructuring.
- Among these cases, several had substantial funding, revealing that high capital availability does not guarantee organizational survival.
- This challenges the assumption that funding alone mitigates operational risk.
Aggregating total layoffs by company shows that:
- A small number of large companies account for a disproportionate share of total layoffs.
- These firms often appear across multiple years, indicating repeated workforce contractions rather than isolated events.
- This suggests structural downsizing patterns rather than short-term corrections.
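A company-level aggregation of this kind might look like the following sketch. The table name, sample figures, and the extra "years with layoffs" column are illustrative assumptions, and sqlite3 again stands in for MySQL (where `YEAR(date)` would replace `strftime('%Y', date)`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE layoffs_staging (company TEXT, date TEXT, total_laid_off INTEGER);
-- Hypothetical sample: Acme appears in two different years.
INSERT INTO layoffs_staging VALUES
    ('Acme',  '2022-11-01', 500),
    ('Acme',  '2023-01-15', 300),
    ('Beta',  '2023-02-01', 150),
    ('Gamma', '2023-03-01', 50);
""")

# Total layoffs per company, plus the number of distinct years each appears in.
by_company = conn.execute("""
SELECT company,
       SUM(total_laid_off)                  AS total,
       COUNT(DISTINCT strftime('%Y', date)) AS years_with_layoffs
FROM layoffs_staging
GROUP BY company
ORDER BY total DESC
""").fetchall()

# Share of all layoffs attributable to the single largest company.
grand_total = sum(total for _, total, _ in by_company)
top_share = by_company[0][1] / grand_total
print(by_company, top_share)
```

Counting distinct years alongside the total is one simple way to separate repeated contractions from a single large event.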
When grouping layoffs by industry:
- Layoffs are heavily concentrated in a limited set of industries, with technology-related sectors leading in total job losses.
- This confirms that macroeconomic shocks disproportionately affected industries with:
  - Rapid prior growth
  - High dependency on venture capital
  - Aggressive scaling strategies
- Industry-level aggregation makes clear that layoffs were not evenly distributed across the economy.
Country-level analysis reveals:
- Layoffs are strongly concentrated in a few countries, with one country dominating total job losses.
- This reflects the geographic concentration of venture-backed companies and startup ecosystems rather than global labor exposure.
- The data captures where layoffs are reported, not necessarily where global labor risk is evenly spread.
Time-based analysis highlights several important patterns:
- Layoffs span multiple years, but specific years show sharp spikes in total job losses.
- Monthly aggregation reveals clear clustering, with certain periods experiencing rapid escalation in layoffs.
- A rolling cumulative sum shows that layoffs accelerated in bursts rather than accumulating linearly, pointing to systemic shocks rather than gradual adjustment.
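The monthly totals and rolling cumulative sum can be sketched as follows, under the same assumptions as before: a hypothetical `layoffs_staging` table with ISO-formatted dates, invented sample values, and sqlite3 standing in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE layoffs_staging (company TEXT, date TEXT, total_laid_off INTEGER);
-- Hypothetical sample with a sharp spike in January 2023.
INSERT INTO layoffs_staging VALUES
    ('Acme',  '2022-11-03', 100),
    ('Beta',  '2022-11-20', 50),
    ('Gamma', '2022-12-10', 200),
    ('Delta', '2023-01-05', 1000);
""")

# Step 1: collapse events to one total per 'YYYY-MM' month.
# Step 2: accumulate those totals in chronological order.
rolling = conn.execute("""
WITH monthly AS (
    SELECT substr(date, 1, 7)  AS month,
           SUM(total_laid_off) AS laid_off
    FROM layoffs_staging
    WHERE date IS NOT NULL
    GROUP BY month
)
SELECT month,
       laid_off,
       SUM(laid_off) OVER (ORDER BY month) AS rolling_total
FROM monthly
ORDER BY month
""").fetchall()
print(rolling)
```

A wave shows up as a jump in `rolling_total` that is much larger than the preceding steps, as with the January row in this sample.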
When grouped by company stage:
- Layoffs are not limited to early-stage startups.
- Post-IPO and late-stage companies contribute significantly to total layoffs, demonstrating that maturity does not eliminate exposure to market downturns.
- Early-stage companies, while smaller in absolute numbers, still show high vulnerability relative to their size.
- This finding reframes layoffs as a system-wide phenomenon, not a startup-only issue.
Ranking companies by layoffs within each year shows:
- Each year tends to be dominated by a small set of companies responsible for the majority of layoffs.
- This concentration suggests that headline layoffs are driven by a few extreme cases, rather than uniform downsizing across firms.
- This is a critical distinction for interpreting labor market narratives.
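One way to express a per-year ranking is with `DENSE_RANK` partitioned by year, as in this sketch. The choice of `DENSE_RANK`, the cutoff of two companies per year, the table name, and the sample data are all assumptions for illustration; sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE layoffs_staging (company TEXT, date TEXT, total_laid_off INTEGER);
-- Hypothetical sample: Beta has two separate events in 2023.
INSERT INTO layoffs_staging VALUES
    ('Acme',  '2022-05-01', 500),
    ('Beta',  '2022-06-01', 200),
    ('Gamma', '2022-07-01', 50),
    ('Beta',  '2023-01-10', 600),
    ('Beta',  '2023-04-02', 300),
    ('Acme',  '2023-02-15', 400),
    ('Gamma', '2023-03-20', 10);
""")

# Step 1: total layoffs per company per year.
# Step 2: rank companies within each year, largest total first.
# Step 3: keep only the top two ranks per year.
top_per_year = conn.execute("""
WITH company_year AS (
    SELECT company,
           strftime('%Y', date) AS year,
           SUM(total_laid_off)  AS laid_off
    FROM layoffs_staging
    GROUP BY company, year
),
ranked AS (
    SELECT company, year, laid_off,
           DENSE_RANK() OVER (PARTITION BY year ORDER BY laid_off DESC) AS ranking
    FROM company_year
)
SELECT year, company, laid_off, ranking
FROM ranked
WHERE ranking <= 2
ORDER BY year, ranking
""").fetchall()
print(top_per_year)
```

Summing per company and year before ranking matters: Beta's two 2023 events are combined into one total, so the ranking reflects the company's full-year impact rather than its largest single event.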
In summary, the key takeaways are:
- Layoffs are highly concentrated by industry, geography, and company.
- Funding size does not guarantee resilience.
- Workforce reductions often occur in waves, not evenly over time.
- Late-stage and public companies play a major role in total job losses.
- A small number of firms dominate layoff statistics each year.