- Introduction
- Dataset Overview
- Project Objectives
- Data Cleaning
- Data Exploration and Insights
- Recommendation
- Conclusion
- Tech Stack
This project focuses on analyzing human resources data using SQL to help the HR department uncover trends, patterns, and insights that support better decision-making around employee management, hiring, diversity, and retention. The project simulates a real-world HR dataset and applies data exploration and business intelligence techniques.
The dataset used in the analysis consists of 14 columns and 22,195 rows. Each row is a record for an employee in one of several departments, covering demographics, job location (remote or HQ), tenure, employment status, race, age, and more.
Key columns:
- Employee_ID: Unique identifier
- First_Name, Last_Name: Personal info
- Gender, Race, Age, State: Demographics
- Employment_Status: Active/Terminated
- Location: HQ or Remote
- Department, Hire_Date: Job-related details
- Tenure_Years: Duration of employment
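As a rough sketch, the columns listed above could map onto a table like the one below. The table name `hr` and the SQL types are assumptions, not taken from the project. SQLite (through Python's built-in `sqlite3` module) stands in for MySQL/PostgreSQL so the example runs without a database server.

```python
import sqlite3

# Hypothetical table for the columns named in the dataset overview.
# Types are assumptions; the full dataset is described as having 14 columns,
# so the real table may contain more than these 12.
SCHEMA = """
CREATE TABLE hr (
    Employee_ID       TEXT,     -- unique identifier (not enforced, so raw duplicates can load)
    First_Name        TEXT,
    Last_Name         TEXT,
    Gender            TEXT,
    Race              TEXT,
    Age               INTEGER,
    State             TEXT,
    Employment_Status TEXT,     -- 'Active' or 'Terminated'
    Location          TEXT,     -- 'HQ' or 'Remote'
    Department        TEXT,
    Hire_Date         TEXT,     -- ISO-8601 'YYYY-MM-DD' after cleaning
    Tenure_Years      REAL      -- duration of employment in years
);
"""


def create_hr_table(conn: sqlite3.Connection) -> None:
    """Create the hypothetical hr table on an open SQLite connection."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_hr_table(conn)
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
    for cid, name, col_type, *_ in conn.execute("PRAGMA table_info(hr)"):
        print(cid, name, col_type)
```

Employee_ID is deliberately left without a PRIMARY KEY constraint here so that raw data containing duplicates can still be loaded and de-duplicated during cleaning.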
- Perform data cleaning to ensure consistency and reliability
- Analyze demographic distribution (gender, race, state)
- Examine employment types and work location trends
- Determine employee retention and tenure
- Track hiring patterns over time
- Identify the longest-serving employees
- Evaluate termination data and diversity factors
Using the HR management data, your company requires you to analyze the data with SQL and uncover insights for the HR department. Note: clean the data if required.
- What is the gender breakdown in the company?
- How many employees work remotely in each department?
- What is the distribution of employees working remotely versus at HQ?
- What is the race distribution in the company?
- What is the distribution of employees across different states?
- How many employees have had their employment terminated?
- Who is the longest-serving employee (or employees) in the organization?
- What is the breakdown of terminated employees by race?
- What is the age distribution in the company?
- How have employee hire counts varied over time?
- What is the tenure distribution for each department?
- What is the average length of employment in the company?
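Most of the distribution questions above reduce to a GROUP BY count. The following is a minimal sketch, assuming a hypothetical `hr` table holding a handful of made-up rows and run through SQLite from Python; the SELECT statements themselves should work largely unchanged in MySQL or PostgreSQL.

```python
import sqlite3

# Made-up rows standing in for the real dataset (illustration only).
ROWS = [
    ("E1", "Female", "White", "Ohio", "HQ", "Sales"),
    ("E2", "Male", "Asian", "Ohio", "Remote", "Sales"),
    ("E3", "Female", "Black", "Texas", "Remote", "Engineering"),
    ("E4", "Non-Conforming", "White", "Texas", "HQ", "Engineering"),
    ("E5", "Male", "White", "Ohio", "Remote", "Engineering"),
]

# One GROUP BY count per distribution question.
QUERIES = {
    "gender": "SELECT Gender, COUNT(*) AS n FROM hr "
              "GROUP BY Gender ORDER BY n DESC, Gender",
    "remote_by_department": "SELECT Department, COUNT(*) AS n FROM hr "
                            "WHERE Location = 'Remote' "
                            "GROUP BY Department ORDER BY Department",
    "hq_vs_remote": "SELECT Location, COUNT(*) AS n FROM hr "
                    "GROUP BY Location ORDER BY Location",
    "race": "SELECT Race, COUNT(*) AS n FROM hr "
            "GROUP BY Race ORDER BY n DESC, Race",
    "state": "SELECT State, COUNT(*) AS n FROM hr "
             "GROUP BY State ORDER BY n DESC, State",
}


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory hr table with only the columns these queries need."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hr (Employee_ID TEXT, Gender TEXT, Race TEXT, "
        "State TEXT, Location TEXT, Department TEXT)"
    )
    conn.executemany("INSERT INTO hr VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def run_distributions(conn: sqlite3.Connection) -> dict:
    """Return {question: [(category, count), ...]} for every query."""
    return {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}


if __name__ == "__main__":
    for name, rows in run_distributions(build_demo_db()).items():
        print(name, rows)
```

Filtering with WHERE before grouping (as in the remote-by-department query) counts only remote staff; dropping the filter and grouping by both Department and Location would instead give the full HQ/remote split per department.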
Before analysis, the following cleaning steps were applied:
- Removed duplicate employee records (if any)
- Standardized categorical values (e.g., gender, department names)
- Converted date columns into appropriate formats for analysis
- Checked for nulls in critical columns like Employment_Status, Department, and Hire_Date
- Added derived columns such as Tenure_Years and age brackets where needed
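The cleaning steps above could look roughly like this in SQLite. Several details are assumptions rather than the project's actual choices: raw hire dates arrive as zero-padded MM/DD/YYYY strings, duplicates share an Employee_ID, tenure is measured up to a fixed hypothetical as-of date, and the age-bracket boundaries and the Age_Bracket column name are illustrative. The `rowid`-based de-duplication and `julianday()` are SQLite-specific; in MySQL, functions such as STR_TO_DATE and TIMESTAMPDIFF would play the same roles.

```python
import sqlite3

# Made-up raw rows with typical problems: a duplicate record, inconsistent
# casing/whitespace, US-style MM/DD/YYYY dates and a missing Department.
RAW = [
    ("E1", " female", "Sales ", 29, "03/15/2015", "Active"),
    ("E1", " female", "Sales ", 29, "03/15/2015", "Active"),  # duplicate
    ("E2", "MALE", "Engineering", 41, "11/01/2010", "Terminated"),
    ("E3", "Female", None, 55, "07/30/2002", "Active"),       # missing Department
]

AS_OF = "2024-01-01"  # hypothetical reference date for tenure

CLEANING_STEPS = [
    # 1. Remove duplicates, keeping the first row per Employee_ID (rowid is SQLite-specific).
    ("DELETE FROM hr WHERE rowid NOT IN "
     "(SELECT MIN(rowid) FROM hr GROUP BY Employee_ID)", ()),
    # 2. Standardize categories: trim whitespace and title-case one-word genders.
    ("UPDATE hr SET Gender = UPPER(SUBSTR(TRIM(Gender), 1, 1)) "
     "|| LOWER(SUBSTR(TRIM(Gender), 2)), Department = TRIM(Department)", ()),
    # 3. Convert MM/DD/YYYY to ISO-8601 YYYY-MM-DD so date functions work.
    ("UPDATE hr SET Hire_Date = SUBSTR(Hire_Date, 7, 4) || '-' || "
     "SUBSTR(Hire_Date, 1, 2) || '-' || SUBSTR(Hire_Date, 4, 2) "
     "WHERE Hire_Date LIKE '__/__/____'", ()),
    # 4a. Derived column: tenure in years from Hire_Date up to AS_OF.
    ("ALTER TABLE hr ADD COLUMN Tenure_Years REAL", ()),
    ("UPDATE hr SET Tenure_Years = "
     "ROUND((julianday(?) - julianday(Hire_Date)) / 365.25, 1)", (AS_OF,)),
    # 4b. Derived column: illustrative age brackets.
    ("ALTER TABLE hr ADD COLUMN Age_Bracket TEXT", ()),
    ("UPDATE hr SET Age_Bracket = CASE WHEN Age < 25 THEN 'Under 25' "
     "WHEN Age < 35 THEN '25-34' WHEN Age < 45 THEN '35-44' "
     "WHEN Age < 55 THEN '45-54' ELSE '55+' END", ()),
]

# Rows still missing a critical field after cleaning.
NULL_CHECK = ("SELECT COUNT(*) FROM hr WHERE Employment_Status IS NULL "
              "OR Department IS NULL OR Hire_Date IS NULL")


def build_raw_db() -> sqlite3.Connection:
    """Load the raw rows into an in-memory hr table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hr (Employee_ID TEXT, Gender TEXT, Department TEXT, "
                 "Age INTEGER, Hire_Date TEXT, Employment_Status TEXT)")
    conn.executemany("INSERT INTO hr VALUES (?, ?, ?, ?, ?, ?)", RAW)
    return conn


def clean(conn: sqlite3.Connection) -> int:
    """Apply the cleaning steps in order; return the count of rows with critical nulls."""
    for sql, params in CLEANING_STEPS:
        conn.execute(sql, params)
    conn.commit()
    return conn.execute(NULL_CHECK).fetchone()[0]


if __name__ == "__main__":
    conn = build_raw_db()
    print("rows with critical nulls:", clean(conn))
    for row in conn.execute("SELECT * FROM hr ORDER BY Employee_ID"):
        print(row)
```

The null check runs last and only reports; whether to drop, impute, or manually fix those rows is a separate decision.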
- Gender distribution across the company
- Race breakdown and representation across departments
- Age group classification and its distribution
- Count of remote workers by department
- Overall HQ vs Remote employee distribution
- State-by-state employee distribution
- Total number of terminated employees
- Breakdown of terminated employees by race
- Tenure analysis by department and employment status
- Identification of the longest-serving employee(s)
- Average tenure of employees across the company
- Yearly hire trends to identify growth or attrition patterns
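The termination, tenure, and hiring items in this list can be sketched the same way. The rows and names below are fictional, and the `hr` table is the same hypothetical one assumed earlier. The longest-serving query compares against MAX(Tenure_Years) rather than using LIMIT 1, so ties return every matching employee. `strftime('%Y', ...)` is SQLite's way of extracting the year; MySQL would use YEAR(Hire_Date) and PostgreSQL EXTRACT(YEAR FROM Hire_Date).

```python
import sqlite3

# Fictional cleaned rows: (id, first, last, race, department, status, hire_date, tenure).
ROWS = [
    ("E1", "Ana", "Diaz", "Hispanic", "Sales", "Active", "2015-03-15", 8.8),
    ("E2", "Ben", "Li", "Asian", "Engineering", "Terminated", "2010-11-01", 13.2),
    ("E3", "Cara", "Moss", "White", "Engineering", "Active", "2002-07-30", 21.4),
    ("E4", "Dev", "Rao", "Black", "Sales", "Terminated", "2015-06-01", 8.6),
    ("E5", "Eli", "Park", "White", "Engineering", "Active", "2002-07-30", 21.4),
]

QUERIES = {
    "terminated_total":
        "SELECT COUNT(*) FROM hr WHERE Employment_Status = 'Terminated'",
    "terminated_by_race":
        "SELECT Race, COUNT(*) AS n FROM hr WHERE Employment_Status = 'Terminated' "
        "GROUP BY Race ORDER BY n DESC, Race",
    # Compare against MAX so that ties return every longest-serving employee.
    "longest_serving":
        "SELECT Employee_ID, First_Name, Last_Name, Tenure_Years FROM hr "
        "WHERE Tenure_Years = (SELECT MAX(Tenure_Years) FROM hr) ORDER BY Employee_ID",
    "avg_tenure":
        "SELECT ROUND(AVG(Tenure_Years), 2) FROM hr",
    "tenure_by_dept_status":
        "SELECT Department, Employment_Status, COUNT(*) AS n, "
        "ROUND(AVG(Tenure_Years), 1) AS avg_years FROM hr "
        "GROUP BY Department, Employment_Status ORDER BY Department, Employment_Status",
    # SQLite-specific year extraction; see the note above for MySQL/PostgreSQL.
    "hires_per_year":
        "SELECT strftime('%Y', Hire_Date) AS yr, COUNT(*) AS n FROM hr "
        "GROUP BY yr ORDER BY yr",
}


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory hr table loaded with the fictional rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hr (Employee_ID TEXT, First_Name TEXT, Last_Name TEXT, "
        "Race TEXT, Department TEXT, Employment_Status TEXT, "
        "Hire_Date TEXT, Tenure_Years REAL)"
    )
    conn.executemany("INSERT INTO hr VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def run_queries(conn: sqlite3.Connection) -> dict:
    """Return {query_name: list_of_result_rows} for every query."""
    return {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}


if __name__ == "__main__":
    for name, rows in run_queries(build_demo_db()).items():
        print(name, rows)
```

Comparing yearly hire counts against yearly terminations (which would need a termination date column not listed in the dataset overview) is what would separate growth from attrition; hire counts alone show only the inflow.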
Based on the insights gathered:
- Diversity Monitoring: Ensure departments maintain balanced gender and racial diversity.
- Remote Work Strategy: Evaluate productivity and preferences in departments with high remote workforce ratios.
- Retention Focus: Investigate reasons for terminations in high-turnover departments.
- Age-Inclusive Policies: Given the varied age distribution, adapt policies to accommodate both younger and older workers.
- Talent Acquisition: Boost hiring efforts in underrepresented states or departments with lower headcount.
This HR data analysis project demonstrates how SQL can be leveraged to surface critical HR insights. From workforce diversity to tenure trends, the project covers a wide range of factors that influence employee experience and organizational growth. By acting on these insights, companies can strengthen their HR strategies, improve employee retention, and foster a more inclusive workplace.
- SQL (MySQL / PostgreSQL)
- Git & GitHub
- MySQL Workbench
💬 Feel free to explore the queries, clone the project, or adapt the approach for your own HR data analysis tasks.

