The HR Data Console is an employee attrition prediction system that uses machine learning to identify employees at risk of leaving the organization. It helps HR departments address retention challenges proactively and reduce turnover costs.
- Accuracy: 60.2%
- Precision: 26.7%
- Recall: 85.1%
- F1-Score: 40.6%
- ROC-AUC: 80.1%
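
The F1-Score above is consistent with the reported precision and recall, since F1 is their harmonic mean. A minimal sketch of that arithmetic (the variable names are chosen here for illustration only):

```python
# Reported metrics at the chosen threshold, expressed as fractions.
precision = 0.267
recall = 0.851

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.1%}")  # F1 = 40.6%
```

The low precision next to the high recall reflects a threshold that accepts many false alarms in order to miss few actual leavers.
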
- Annual Replacement Cost per Employee: $39,018
- Potential Annual Savings: $515,871
- Net Benefit per True Positive: $27,312
- Raw Data: IBM HR Analytics Dataset with 1,470 employees
- Dataset Source: IBM HR Analytics Attrition Dataset on Kaggle
- Cleaned Data: 2,266 records after preprocessing and SMOTE balancing
- Features: 44 engineered features including demographic, job, and performance metrics
- Algorithm: Random Forest Classifier
- Training Method: SMOTE-balanced dataset
- Optimal Threshold: 0.4200 (chosen to favor recall over precision, per business needs)
- Key Features: OverTime, StockOptionLevel, JobSatisfaction, YearsWithCurrManager
- Input: Employee data (demographics, job info, performance)
- Output: Attrition risk score (0-1) and high-risk flag
- Processing: Automatic feature alignment and scaling
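
One plausible reading of "automatic feature alignment" is reordering each incoming record to match the feature list used at training time, dropping unknown columns and filling absent ones. The sketch below assumes that reading; `align_features`, the `fill_value` default, and the example feature list are hypothetical and not taken from the project's code. Scaling is left out because the scaler's parameters are not described here.

```python
def align_features(record, training_features, fill_value=0.0):
    """Return the record's values ordered like the training features.

    Columns the model never saw are dropped; columns missing from the
    record are filled with fill_value.
    """
    return [float(record.get(name, fill_value)) for name in training_features]


# Hypothetical feature list; in practice it would be loaded from the saved feature file.
training_features = ["Age", "MonthlyIncome", "OverTime_Yes", "StockOptionLevel"]

# Incoming record with a different column order, one extra column, one missing column.
record = {"MonthlyIncome": 2900, "Age": 29, "OverTime_Yes": 1, "EmployeeNumber": 1001}

row = align_features(record, training_features)
print(row)  # [29.0, 2900.0, 1.0, 0.0]
```

Filling missing columns with zero is a common choice for one-hot encoded inputs, but whether it is appropriate for numeric features depends on how the training data was prepared.
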
- OverTime_Yes: Employees working overtime are most likely to leave
- StockOptionLevel: Lower stock options correlate with higher attrition
- JobSatisfaction: Dissatisfied employees are flight risks
- YearsWithCurrManager: Poor manager relationships drive turnover
- EnvironmentSatisfaction: Negative work environment impacts retention
- High-Risk Employees: 28% of the workforce flagged as potential attrition risks
- Cost Savings: $515,871 annual savings potential through targeted retention
- ROI: 7:1 return on retention program investments
- Total Employees: 1,470 analyzed
- Overall Attrition Rate: 16.1% (237 employees)
- Department Analysis: Sales has highest attrition (20.63%)
- Job Role Analysis: Sales Representatives have 39.76% turnover
- Experience Patterns: New hires (0-2 years) have 43.9% attrition rate
- Total Annual Replacement Cost: $9.34M
- Department Costs: R&D ($5.01M), Sales ($3.84M), HR ($479K)
- Potential Savings: $3.1M with 10% attrition reduction
- High-Risk Profile: Young (<35), low-income (<$3k/month), low satisfaction
- High-Value at Risk: 6 high-performers with low satisfaction identified
- Gender Pay Gaps: Significant disparities in leadership roles
# Create virtual environment
python -m venv hrvenv
# Activate environment (Windows)
hrvenv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
python scripts/create_test_data.py
python scripts/score_new_employee.py --input data/test/sample_employees.csv
Scored 50 employees
High risk employees (Score >= 0.42): 14
Output saved to: data/test/sample_employees_scored.csv
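
The summary lines in the sample output can be reproduced from a scored file by comparing each score with the 0.42 threshold. The following is a rough sketch rather than the actual contents of `score_new_employee.py`; the column names `employee_id` and `attrition_score` and the in-memory CSV are assumptions made for illustration.

```python
import csv
import io

THRESHOLD = 0.42  # threshold shown in the sample output

# Stand-in for a scored CSV file; the column names are assumptions.
scored_csv = io.StringIO(
    "employee_id,attrition_score\n"
    "1001,0.81\n"
    "1002,0.42\n"
    "1003,0.17\n"
    "1004,0.39\n"
)

rows = list(csv.DictReader(scored_csv))

# An employee is flagged when the score meets or exceeds the threshold.
high_risk = [r for r in rows if float(r["attrition_score"]) >= THRESHOLD]

print(f"Scored {len(rows)} employees")
print(f"High risk employees (Score >= {THRESHOLD}): {len(high_risk)}")
```
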
- Deploy Early Warning Dashboard with 0.4200 threshold
- Targeted Retention Programs for high-risk employees
- Review Compensation based on feature importance
- Integrate predictions into performance management
- Develop Career Path frameworks for stagnant employees
- Manager Training programs to improve relationships
- Attrition Distribution: `reports/attrition_distribution.png` - Shows department-wise attrition rates with Sales leading at 20.63%
- Correlation Heatmap: `reports/correlation_heatmap.png` - Visualizes feature relationships with color-coded correlation coefficients
- EDA Summary: `reports/eda_executive_summary.md` - Comprehensive exploratory data analysis findings
- Attrition by Department: Sales has highest turnover rate (20.63%) vs R&D (13.84%) and HR (19.05%)
- Job Satisfaction vs Attrition: Weak negative correlation (-0.103); lower satisfaction is associated with higher attrition
- Overtime Impact: Employees working overtime are 3x more likely to leave (39.76% attrition for overtime workers)
- Experience Patterns: New hires (0-2 years) have 43.9% attrition vs 10.05% for 10+ years experience
- Dataset Overview: 1,470 employees, 16.1% attrition rate
- Data Quality: 1,470 missing values identified, all in a single feature (one column is entirely empty); 32 of 33 features complete
- Key Insights: Sales department and entry-level employees most affected
- Visualizations: Department-wise attrition charts, satisfaction distributions
- Feature Engineering: Created promotion_velocity, income_experience_ratio, career_stability
- Risk Scoring: Composite churn risk score with satisfaction, income, and experience factors
- Statistical Analysis: Chi-square tests, correlation analysis, t-tests for significance
- Data Preparation: SMOTE oversampling for class imbalance handling
- Output Files: Enhanced dataset (hr_merged_for_modeling.csv), encoded dataset (hr_encoded_for_ml.csv), summary statistics
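
The engineered features named above are not defined in this document. The sketch below shows one way such ratios could be derived from standard IBM HR dataset columns; every formula in it is an illustrative assumption, not the project's definition.

```python
def engineer_features(emp):
    """Derive three ratio features from one employee record (a dict).

    All three formulas are assumptions made for illustration. The +1 in
    each denominator avoids division by zero for new hires.
    """
    return {
        # Tenure relative to time since the last promotion (assumed formula).
        "promotion_velocity": emp["YearsAtCompany"] / (emp["YearsSinceLastPromotion"] + 1),
        # Monthly income per year of total experience (assumed formula).
        "income_experience_ratio": emp["MonthlyIncome"] / (emp["TotalWorkingYears"] + 1),
        # Average years of experience per employer (assumed formula).
        "career_stability": emp["TotalWorkingYears"] / (emp["NumCompaniesWorked"] + 1),
    }


emp = {
    "YearsAtCompany": 6,
    "YearsSinceLastPromotion": 2,
    "MonthlyIncome": 4500,
    "TotalWorkingYears": 8,
    "NumCompaniesWorked": 3,
}
features = engineer_features(emp)
print(features)
```
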
- Model Development: Random Forest with SMOTE balancing
- Feature Engineering: 44 features including engineered metrics
- Performance Optimization: Threshold tuning for business needs (0.4200 optimal)
- Business Impact: $515,871 potential annual savings identified
- Model Training: Calibrated Random Forest with 400 estimators, sigmoid calibration
- Evaluation Metrics: ROC-AUC 80.1%, F1-score 40.6%, Recall 85.1% at optimal threshold
- Feature Importance: Top features include OverTime, StockOptionLevel, JobSatisfaction
- Deployment: Model persistence with joblib, scaler artifacts, feature name tracking
- Business Insights: Cost-benefit analysis, high-risk employee identification, actionable recommendations
- Output Files: Trained model (rf_attrition_model_v2.joblib), scaler (attrition_scaler_v2.joblib), feature list (rf_attrition_features_v2.csv)
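
The training setup described above (a 400-tree Random Forest with sigmoid calibration, persisted with joblib) might look roughly like the following in scikit-learn. This is a sketch under stated assumptions: the data is synthetic, the SMOTE step is omitted because it needs the separate imbalanced-learn package, and the 3-fold calibration split and output filename are placeholders rather than the project's settings.

```python
import os
import tempfile

import joblib
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded HR data, with roughly 16% positives.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.84, 0.16], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Random Forest wrapped in sigmoid (Platt) calibration.
rf = RandomForestClassifier(n_estimators=400, random_state=0)
model = CalibratedClassifierCV(rf, method="sigmoid", cv=3)
model.fit(X_train, y_train)

# Calibrated probability of the positive (attrition) class.
scores = model.predict_proba(X_test)[:, 1]
high_risk = scores >= 0.42

# Persist the fitted model; the filename is a placeholder.
model_path = os.path.join(tempfile.gettempdir(), "rf_attrition_model_sketch.joblib")
joblib.dump(model, model_path)
print(f"Flagged {int(high_risk.sum())} of {len(scores)} test rows")
```

Calibration matters here because the 0.42 threshold is applied to the predicted probability, so the scores need to behave like probabilities rather than raw vote fractions.
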
- Sales: 446 employees, 92 attrition (20.63%), $3.84M replacement cost
- R&D: 961 employees, 133 attrition (13.84%), $5.01M replacement cost
- HR: 63 employees, 12 attrition (19.05%), $479K replacement cost
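
The department rates above follow directly from the head counts, and the same counts are enough to compute a chi-square statistic for independence between department and attrition. A minimal pure-Python sketch, using only the figures listed above (the variable names are illustrative):

```python
# (employees, leavers) per department, taken from the figures above.
departments = {"Sales": (446, 92), "R&D": (961, 133), "HR": (63, 12)}

total = sum(n for n, _ in departments.values())
leavers = sum(k for _, k in departments.values())
overall_rate = leavers / total

chi2 = 0.0
for name, (n, k) in departments.items():
    print(f"{name}: {k / n:.2%}")
    # Expected counts if attrition were independent of department.
    exp_left = n * overall_rate
    exp_stayed = n - exp_left
    chi2 += (k - exp_left) ** 2 / exp_left
    chi2 += ((n - k) - exp_stayed) ** 2 / exp_stayed

print(f"Overall: {overall_rate:.2%}, chi-square = {chi2:.2f}")
```

The statistic comes out at about 10.8, which exceeds the 5% critical value of 5.99 for two degrees of freedom, so the department differences are unlikely to be due to chance alone.
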
- Sales Representative: 83 employees, 33 attrition (39.76%) - HIGHEST RISK
- Laboratory Technician: 259 employees, 62 attrition (23.94%)
- Human Resources: 52 employees, 12 attrition (23.08%)
- Average Monthly Income: $6,502.93 overall
- Income Difference: $2,045.65 (stayers earn more than leavers)
- Salary Correlation: -0.160 with attrition (higher pay = lower attrition)
- 0-2 Years: 123 employees, 43.90% attrition - CRITICAL RISK ZONE
- 3-5 Years: 193 employees, 19.17% attrition
- 6-10 Years: 607 employees, 14.99% attrition
- 10+ Years: 547 employees, 10.05% attrition
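
As a consistency check, the band sizes and rates above can be converted back to head counts and compared with the overall totals reported elsewhere in this document (1,470 employees, 237 leavers). A short sketch of that arithmetic:

```python
# (employees, attrition rate) per experience band, from the list above.
bands = {
    "0-2": (123, 0.4390),
    "3-5": (193, 0.1917),
    "6-10": (607, 0.1499),
    "10+": (547, 0.1005),
}

total_employees = sum(n for n, _ in bands.values())

# The rates are rounded to two decimals, so round back to whole people.
leavers_by_band = {band: round(n * rate) for band, (n, rate) in bands.items()}
total_leavers = sum(leavers_by_band.values())

print(leavers_by_band)
print(f"{total_leavers} of {total_employees} = {total_leavers / total_employees:.1%}")
```

The bands account for all 1,470 employees and all 237 leavers, which matches the 16.1% overall attrition rate.
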
- Critical Characteristics: Young (<35), low-income (<$3k/month), low satisfaction
- High-Value at Risk: 6 high-performers with low satisfaction identified
- Gender Pay Gaps: Significant disparities in leadership roles (up to 9.99% gaps)
- Total Annual Replacement Cost: $9.34M across all departments
- Potential Savings: $3.1M with 10% attrition reduction
- ROI Potential: 7:1 return on retention program investments
- Monthly Payroll: $9.56M total monthly compensation
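
The monthly payroll figure agrees with the average monthly income and head count reported earlier. A quick sketch of the check, assuming payroll is simply the head count multiplied by the average monthly income:

```python
employees = 1470
average_monthly_income = 6502.93  # reported overall average

# Total monthly payroll as head count times average income.
monthly_payroll = employees * average_monthly_income

print(f"Monthly payroll: ${monthly_payroll / 1e6:.2f}M")  # Monthly payroll: $9.56M
```
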
- Immediate: Sales department intervention, new hire retention program
- Medium-term: Compensation equity review, training program optimization
- Long-term: Succession planning framework, advanced analytics implementation
- Monthly: Overall attrition rate, department-specific rates, new hire retention
- Quarterly: Training effectiveness, manager scores, compensation-performance alignment