
beastNico/HR-DATA-CONSOLE


HR Data Console - Employee Attrition Prediction System

📋 Project Overview

The HR Data Console is a comprehensive employee attrition prediction system that leverages machine learning to identify employees at risk of leaving the organization. This system helps HR departments proactively address retention challenges and reduce turnover costs.

🎯 Key Results

Model Performance (Optimal Threshold: 0.4200)

  • Accuracy: 60.2%
  • Precision: 26.7%
  • Recall: 85.1%
  • F1-Score: 40.6%
  • ROC-AUC: 80.1%
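
All four threshold-dependent metrics above come from the confusion matrix at the 0.42 cutoff. The following is a minimal plain-Python sketch of that computation. The function name `metrics_at_threshold` is hypothetical and does not come from the repository. The last line checks that the reported F1 agrees with the reported precision and recall.

```python
def metrics_at_threshold(y_true, scores, threshold=0.42):
    """Accuracy, precision, recall and F1 when scores >= threshold are flagged high risk.

    y_true holds 1 for employees who left and 0 for those who stayed.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# F1 is the harmonic mean of precision and recall, so the reported values agree:
print(round(2 * 0.267 * 0.851 / (0.267 + 0.851), 3))  # → 0.406
```

Lowering the threshold flags more employees, which raises recall and usually lowers precision. This trade-off explains the 85.1% recall paired with 26.7% precision.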

Financial Impact

  • Annual Replacement Cost per Employee: $39,018
  • Potential Annual Savings: $515,871
  • Net Benefit per True Positive: $27,312
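
The README does not state the formula behind these figures. One simple cost model, shown here only as an illustration, works as follows:

- Each correctly flagged leaver is credited with the net benefit.
- Each false alarm is charged the cost of an unnecessary intervention.

The `cost_per_fp` value and the counts in the usage line are assumptions, not project numbers. The helper name is hypothetical.

```python
def net_retention_savings(tp, fp, benefit_per_tp, cost_per_fp):
    """Hypothetical cost model for a retention campaign driven by model flags.

    tp: leavers correctly flagged (true positives)
    fp: stayers flagged by mistake (false positives)
    benefit_per_tp: net value of retaining one flagged leaver
    cost_per_fp: retention spend wasted on one false alarm (assumed)
    """
    return tp * benefit_per_tp - fp * cost_per_fp


# Illustrative inputs only: 10 true positives, 20 false alarms, an assumed $1,000 per false alarm
print(net_retention_savings(10, 20, 27_312, 1_000))  # → 253120
```

Under a model like this, a false alarm costs far less than a missed leaver. As a result, a recall-oriented threshold can still pay off even at low precision.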

🔧 System Components

1. Data Processing Pipeline

  • Raw Data: IBM HR Analytics Dataset with 1,470 employees
  • Dataset Source: IBM HR Analytics Attrition Dataset on Kaggle
  • Cleaned Data: 2,266 records after preprocessing and SMOTE balancing
  • Features: 44 engineered features including demographic, job, and performance metrics
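
SMOTE creates synthetic minority-class (leaver) rows. Each one is an interpolation between a real leaver and one of its nearest leaver neighbors. In practice this is usually done with imbalanced-learn's `SMOTE(...).fit_resample(X, y)`, whose default `k_neighbors` is 5.

The repository's exact settings are not given, so the pure-Python sketch below is for illustration only. The function name and seed are assumptions. Oversampling is normally applied to the training split only, so that evaluation data contain real employees alone.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE: return n_synthetic new points, each lying on the segment
    between a random minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbors of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


leavers = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_rows = smote(leavers, n_synthetic=6, k=2)
print(len(new_rows))  # → 6
```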

2. Machine Learning Model

  • Algorithm: Random Forest Classifier
  • Training Method: SMOTE-balanced dataset
  • Optimal Threshold: 0.4200 (chosen to favor recall over precision, per business needs)
  • Key Features: OverTime, StockOptionLevel, JobSatisfaction, YearsWithCurrManager
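
The README does not say which criterion produced 0.4200. One common way to favor recall is to sweep candidate thresholds and keep the one with the best F-beta score for some beta > 1. The sketch below uses F2 as an assumed criterion, and the function names are hypothetical. scikit-learn's `precision_recall_curve` gives the same precision/recall pairs more efficiently on real data.

```python
def f_beta(precision, recall, beta):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def pick_threshold(y_true, scores, beta=2.0):
    """Try thresholds 0.01, 0.02, ..., 0.99 and return (best_threshold, best_f_beta)."""
    best_t, best_score = None, -1.0
    for i in range(1, 100):
        t = i / 100
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < t)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = f_beta(precision, recall, beta)
        if score > best_score:  # ties keep the lower threshold (higher recall)
            best_t, best_score = t, score
    return best_t, best_score


print(pick_threshold([1, 1, 0, 0, 0], [0.935, 0.625, 0.575, 0.235, 0.115]))  # → (0.58, 1.0)
```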

3. Prediction System

  • Input: Employee data (demographics, job info, performance)
  • Output: Attrition risk score (0-1) and high-risk flag
  • Processing: Automatic feature alignment and scaling
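
The scoring script's internals are not shown here. However, the saved artifacts (a feature list in rf_attrition_features_v2.csv and a scaler in attrition_scaler_v2.joblib) suggest the usual three-step pattern:

1. Reorder incoming columns to match the training order.
2. Fill in any training columns the input lacks.
3. Apply the fitted scaler.

The plain-Python sketch below assumes standardization (mean/std) scaling. The helper name, feature names, and statistics are hypothetical.

```python
def align_and_scale(record, feature_names, means, stds):
    """Align one employee record to the training feature order, then standardize.

    - features missing from `record` (e.g. a one-hot column that is absent) become 0
    - keys in `record` that the model was not trained on are ignored
    - each value becomes (x - mean) / std; a zero std is treated as 1,
      as scikit-learn's StandardScaler does
    """
    aligned = [float(record.get(name, 0.0)) for name in feature_names]
    return [(x - m) / (s if s else 1.0) for x, m, s in zip(aligned, means, stds)]


# Hypothetical feature list and scaler statistics, for illustration only
features = ["Age", "OverTime_Yes", "MonthlyIncome"]
row = {"Age": 40, "MonthlyIncome": 3000, "Department": "Sales"}
print(align_and_scale(row, features, means=[30, 0.5, 5000], stds=[10, 0.5, 1000]))
# → [1.0, -1.0, -2.0]
```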

📊 Key Findings

Top 5 Attrition Drivers

  1. OverTime_Yes: Employees working overtime are the most likely to leave
  2. StockOptionLevel: Lower stock option levels correlate with higher attrition
  3. JobSatisfaction: Dissatisfied employees are flight risks
  4. YearsWithCurrManager: Shorter tenure with the current manager is linked to higher turnover, often read as a sign of weaker manager relationships
  5. EnvironmentSatisfaction: Low satisfaction with the work environment is associated with lower retention

Business Insights

  • High-Risk Employees: 28% of the workforce flagged as potential attrition risks
  • Cost Savings: $515,871 annual savings potential through targeted retention
  • ROI: 7:1 return on retention program investments

Comprehensive Dataset Analysis

  • Total Employees: 1,470 analyzed
  • Overall Attrition Rate: 16.1% (237 employees)
  • Department Analysis: Sales has the highest attrition (20.63%)
  • Job Role Analysis: Sales Representatives have 39.76% turnover
  • Experience Patterns: Employees with 0-2 years of experience have a 43.9% attrition rate
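
Each department and job-role rate is the number of leavers divided by headcount within that group. The minimal sketch below assumes rows are dictionaries whose Attrition field uses the IBM dataset's "Yes"/"No" values. The helper name is hypothetical.

```python
from collections import defaultdict


def attrition_rate_by(rows, key):
    """Return {group: (headcount, leavers, rate_percent)} grouped by row[key]."""
    counts = defaultdict(lambda: [0, 0])  # group -> [headcount, leavers]
    for row in rows:
        c = counts[row[key]]
        c[0] += 1
        c[1] += row["Attrition"] == "Yes"  # True counts as 1
    return {g: (n, left, round(100 * left / n, 2)) for g, (n, left) in counts.items()}


rows = [
    {"Department": "Sales", "Attrition": "Yes"},
    {"Department": "Sales", "Attrition": "No"},
    {"Department": "Sales", "Attrition": "No"},
    {"Department": "Sales", "Attrition": "No"},
    {"Department": "Human Resources", "Attrition": "No"},
]
print(attrition_rate_by(rows, "Department"))
# → {'Sales': (4, 1, 25.0), 'Human Resources': (1, 0, 0.0)}
```

The same arithmetic reproduces the Sales figure: `round(100 * 92 / 446, 2)` gives 20.63.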

Financial Impact Breakdown

  • Total Annual Replacement Cost: $9.34M
  • Department Costs: R&D ($5.01M), Sales ($3.84M), HR ($479K)
  • Potential Savings: $3.1M with 10% attrition reduction

Critical Employee Profiles

  • High-Risk Profile: Young (<35), low-income (<$3k/month), low satisfaction
  • High-Value at Risk: 6 high-performers with low satisfaction identified
  • Gender Pay Gaps: Significant disparities in leadership roles (gaps of up to 9.99%)

🚀 Quick Start

Prerequisites

# Create virtual environment
python -m venv hrvenv

# Activate environment (Windows)
hrvenv\Scripts\activate

# Activate environment (macOS/Linux)
source hrvenv/bin/activate

# Install dependencies
pip install -r requirements.txt

Generate Test Data

python scripts/create_test_data.py

Score Employees

python scripts/score_new_employee.py --input data/test/sample_employees.csv

Sample Output

Scored 50 employees
High risk employees (Score >= 0.42): 14
Output saved to: data/test/sample_employees_scored.csv
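
To double-check the summary line, the scored file can be re-counted directly. The following is a standard-library sketch. The score column name `risk_score` is a guess, since the output schema is not documented, and the helper name is hypothetical.

```python
import csv


def count_high_risk(lines, threshold=0.42, score_column="risk_score"):
    """Return (rows, high_risk_rows) for a scored CSV.

    `lines` is any iterable of CSV text lines, such as an open file.
    The column name "risk_score" is an assumption; check the real file header.
    """
    rows = list(csv.DictReader(lines))
    high = sum(1 for r in rows if float(r[score_column]) >= threshold)
    return len(rows), high


# Usage against the file written by the scoring script:
# with open("data/test/sample_employees_scored.csv", newline="") as f:
#     total, high = count_high_risk(f)
```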

📈 Business Recommendations

Immediate Actions (0-3 months)

  1. Deploy Early Warning Dashboard with 0.4200 threshold
  2. Targeted Retention Programs for high-risk employees
  3. Review Compensation based on feature importance

Strategic Actions (3-12 months)

  1. Integrate predictions into performance management
  2. Develop Career Path frameworks for stagnant employees
  3. Manager Training programs to improve relationships

📊 Visualizations

Available Reports

  • Attrition Distribution: reports/attrition_distribution.png - Shows department-wise attrition rates with Sales leading at 20.63%
  • Correlation Heatmap: reports/correlation_heatmap.png - Visualizes feature relationships with color-coded correlation coefficients
  • EDA Summary: reports/eda_executive_summary.md - Comprehensive exploratory data analysis findings

Key Visualizations

  1. Attrition by Department: Sales has the highest turnover rate (20.63%) vs R&D (13.84%) and HR (19.05%)
  2. Job Satisfaction vs Attrition: Weak negative correlation (-0.103); lower satisfaction is associated with somewhat higher attrition
  3. Overtime Impact: Employees working overtime are roughly 3x more likely to leave (about 30.5% attrition for overtime workers)
  4. Experience Patterns: Employees with 0-2 years of experience have 43.9% attrition vs 10.05% for those with more than 10 years
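
A "3x more likely" statement is a relative risk: the attrition rate among overtime workers divided by the rate among everyone else. The notebook summary also mentions chi-square tests.

The sketch below computes both quantities for a 2x2 overtime-by-attrition table using only the standard library. The counts in the usage line are made up for illustration, and the function name is hypothetical. `scipy.stats.chi2_contingency(table, correction=False)` should give the same statistic; by default SciPy applies Yates' continuity correction to 2x2 tables.

```python
import math


def overtime_effect(left_ot, total_ot, left_no_ot, total_no_ot):
    """Relative risk of leaving plus a Pearson chi-square test (no continuity
    correction) for a 2x2 table of overtime status vs attrition."""
    relative_risk = (left_ot / total_ot) / (left_no_ot / total_no_ot)
    # Observed table: rows = overtime yes/no, columns = left/stayed
    table = [[left_ot, total_ot - left_ot], [left_no_ot, total_no_ot - left_no_ot]]
    n = total_ot + total_no_ot
    row_totals = [total_ot, total_no_ot]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(2)
        for j in range(2)
    )
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return relative_risk, chi2, p_value


rr, chi2, p = overtime_effect(30, 100, 10, 100)  # made-up counts
print(round(rr, 2), round(chi2, 2), p < 0.001)  # → 3.0 12.5 True
```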

Notebook Analysis Summaries

01_exploratory_analysis.ipynb

  • Dataset Overview: 1,470 employees, 16.1% attrition rate
  • Data Quality: 1,470 missing values identified, all in a single feature (32 of 33 features complete)
  • Key Insights: Sales department and entry-level employees most affected
  • Visualizations: Department-wise attrition charts, satisfaction distributions
  • Feature Engineering: Created promotion_velocity, income_experience_ratio, career_stability
  • Risk Scoring: Composite churn risk score with satisfaction, income, and experience factors
  • Statistical Analysis: Chi-square tests, correlation analysis, t-tests for significance
  • Data Preparation: SMOTE oversampling for class imbalance handling
  • Output Files: Enhanced dataset (hr_merged_for_modeling.csv), encoded dataset (hr_encoded_for_ml.csv), summary statistics
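
The notebook names three engineered features but does not give their formulas. The definitions below are hypothetical, one plausible reading of each name. They are built from IBM dataset columns: YearsAtCompany, YearsSinceLastPromotion, MonthlyIncome, TotalWorkingYears, and NumCompaniesWorked. Treat them as a starting point, not as the notebook's code.

```python
def engineer_features(emp):
    """Hypothetical definitions of the three engineered features; the notebook's
    actual formulas are not documented. Adding 1 to each denominator avoids
    division by zero."""
    return {
        # for a given tenure, a higher value means a more recent promotion
        "promotion_velocity": emp["YearsAtCompany"] / (emp["YearsSinceLastPromotion"] + 1),
        # monthly income per year of total working experience
        "income_experience_ratio": emp["MonthlyIncome"] / (emp["TotalWorkingYears"] + 1),
        # roughly, years of experience per employer
        "career_stability": emp["TotalWorkingYears"] / (emp["NumCompaniesWorked"] + 1),
    }


example = {
    "YearsAtCompany": 5,
    "YearsSinceLastPromotion": 1,
    "MonthlyIncome": 6000,
    "TotalWorkingYears": 9,
    "NumCompaniesWorked": 2,
}
print(engineer_features(example))
# → {'promotion_velocity': 2.5, 'income_experience_ratio': 600.0, 'career_stability': 3.0}
```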

02_attrition_prediction.ipynb

  • Model Development: Random Forest with SMOTE balancing
  • Feature Engineering: 44 features including engineered metrics
  • Performance Optimization: Threshold tuning for business needs (0.4200 optimal)
  • Business Impact: $515,871 potential annual savings identified
  • Model Training: Calibrated Random Forest with 400 estimators, sigmoid calibration
  • Evaluation Metrics: ROC-AUC 80.1%, F1-score 40.6%, Recall 85.1% at optimal threshold
  • Feature Importance: Top features include OverTime, StockOptionLevel, JobSatisfaction
  • Deployment: Model persistence with joblib, scaler artifacts, feature name tracking
  • Business Insights: Cost-benefit analysis, high-risk employee identification, actionable recommendations
  • Output Files: Trained model (rf_attrition_model_v2.joblib), scaler (attrition_scaler_v2.joblib), feature list (rf_attrition_features_v2.csv)
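
The summary describes a Random Forest wrapped in sigmoid calibration. In scikit-learn this is typically `CalibratedClassifierCV(RandomForestClassifier(n_estimators=400), method="sigmoid")`, which fits a Platt-style sigmoid on held-out folds.

To show what that step does, the sketch below fits the sigmoid p = 1 / (1 + exp(a*s + b)) to raw scores by plain gradient descent on log loss. It omits the smoothed targets used in Platt's original method. The function names, learning rate, and epoch count are assumptions.

```python
import math


def fit_platt(scores, labels, lr=0.5, epochs=5000):
    """Fit p = 1 / (1 + exp(a * s + b)) to (score, 0/1 label) pairs
    by batch gradient descent on the mean log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # With z = a*s + b and p = sigmoid(-z), d(log loss)/dz = y - p
            grad_a += (y - p) * s
            grad_b += y - p
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw model score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(a * score + b))


raw = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
left = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_platt(raw, left)
print(calibrate(0.2, a, b) < calibrate(0.8, a, b))  # → True
```

Because the fitted sigmoid is monotonic, calibration changes probability values but not the ranking of employees. Any threshold, such as 0.4200, therefore has to be chosen on the calibrated scale.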

Comprehensive SQL Analysis Findings

Department-Level Insights

  • Sales: 446 employees, 92 attrition (20.63%), $3.84M replacement cost
  • R&D: 961 employees, 133 attrition (13.84%), $5.01M replacement cost
  • HR: 63 employees, 12 attrition (19.05%), $479K replacement cost

Job Role Analysis

  • Sales Representative: 83 employees, 33 attrition (39.76%) - HIGHEST RISK
  • Laboratory Technician: 259 employees, 62 attrition (23.94%)
  • Human Resources: 52 employees, 12 attrition (23.08%)

Compensation Analysis

  • Average Monthly Income: $6,502.93 overall
  • Income Gap: Stayers earn $2,045.65 more per month than leavers, on average
  • Salary Correlation: -0.160 with attrition (higher pay is associated with lower attrition)
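
Attrition is a 0/1 variable, so the -0.160 figure is most likely a Pearson (point-biserial) correlation between income and an attrition indicator. A value this close to zero indicates a weak association, even though the gap in mean income is sizable. A standard-library sketch follows; Python 3.10+ also ships `statistics.correlation`. The sample data are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; with ys coded 0/1 it equals the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up example: lower earners (left = 1) pull the correlation below zero
income = [2500, 3200, 5400, 7100, 9800]
left = [1, 0, 1, 0, 0]
print(round(pearson(income, left), 3))
```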

Experience Patterns

  • 0-2 Years: 123 employees, 43.90% attrition - CRITICAL RISK ZONE
  • 3-5 Years: 193 employees, 19.17% attrition
  • 6-10 Years: 607 employees, 14.99% attrition
  • More than 10 Years: 547 employees, 10.05% attrition
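
These bands are meant to be non-overlapping, with 10 years counted in 6-10, so the last band covers more than 10 years. A small helper makes the boundaries explicit. It assumes a whole-number years-of-experience value; the README does not name the source column, and the IBM dataset's TotalWorkingYears is one candidate. The function name is hypothetical.

```python
def experience_band(years):
    """Map whole years of experience to non-overlapping bands."""
    if years <= 2:
        return "0-2 Years"
    if years <= 5:
        return "3-5 Years"
    if years <= 10:
        return "6-10 Years"
    return "More than 10 Years"


print([experience_band(y) for y in (0, 2, 3, 10, 11)])
# → ['0-2 Years', '0-2 Years', '3-5 Years', '6-10 Years', 'More than 10 Years']
```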

High-Risk Employee Profiles

  • Critical Characteristics: Young (<35), low-income (<$3k/month), low satisfaction
  • High-Value at Risk: 6 high-performers with low satisfaction identified
  • Gender Pay Gaps: Significant disparities in leadership roles (up to 9.99% gaps)

Business Impact Metrics

  • Total Annual Replacement Cost: $9.34M across all departments
  • Potential Savings: $3.1M with 10% attrition reduction
  • ROI Potential: 7:1 return on retention program investments
  • Monthly Payroll: $9.56M total monthly compensation

Strategic Recommendations

  1. Immediate: Sales department intervention, new hire retention program
  2. Medium-term: Compensation equity review, training program optimization
  3. Long-term: Succession planning framework, advanced analytics implementation

Monitoring Dashboard Metrics

  • Monthly: Overall attrition rate, department-specific rates, new hire retention
  • Quarterly: Training effectiveness, manager scores, compensation-performance alignment
