An AI-powered career recommendation system that analyzes academic performance and personal attributes to predict suitable career paths for students.
- Overview
- Features
- Tech Stack
- Model Architecture
- Quick Start
- Installation
- Model Training
- Usage
- API Reference
- Deployment
- Project Structure
- Performance
- Contributing
- License
Live application: catxforest.onrender.com
CatXForest is a machine learning-powered web application that provides personalized career recommendations for students. The system analyzes academic scores across seven subjects, extracurricular activities, and study habits to predict the top 5 most suitable career paths with confidence probabilities.
The hybrid ensemble model (Stacking Classifier) combines Random Forest, XGBoost, and CatBoost algorithms to achieve 84.97% accuracy in predicting suitable careers from 17 different career categories.
AI-Powered Predictions: Hybrid ensemble model with 84.97% accuracy
Top 5 Recommendations: Ranked career suggestions with probability scores
Interactive Web Interface: User-friendly Flask web application
Visual Analytics: Pie chart visualization of career probabilities
Production Ready: Deployed and scalable architecture
Data Privacy: No data storage; predictions happen in real-time
Responsive Design: Works seamlessly across devices
The system can recommend from 17 diverse career paths, including:
Lawyer, Doctor, Government Officer, Artist, Software Engineer, Teacher, Business Owner, Scientist, Banker, Writer, Accountant, Designer, Construction Engineer, Game Developer, Stock Investor, Real Estate Developer
Students provide the following information:
- Personal details (gender)
- Academic scores for 7 subjects: Math, History, Physics, Chemistry, Biology, English, Geography
- Study habits: weekly self-study hours
- Activities: part-time job status, extracurricular participation, absence days
The system returns:
- Top 5 career recommendations with confidence percentages
- Visual pie chart representation
- Detailed probability breakdown
Framework: Flask 3.0.0
ML Libraries: scikit-learn 1.5.2, XGBoost, CatBoost, NumPy 2.0.0+
Model Type: Stacking Classifier (Hybrid Ensemble)
Frontend: HTML5, CSS3, JavaScript (Chart.js for visualizations)
Platform: Render
Server: Gunicorn 21.2.0
CI/CD: Automated via Render YAML configuration
The system uses a Stacking Classifier that combines:
Base Models:
- Random Forest Classifier
- XGBoost Classifier
- CatBoost Classifier
Meta-Model: Logistic Regression (final estimator)
Cross-Validation: 5-fold CV for robust performance
Input Features (14 total):
- Gender (encoded)
- Part-time job status (binary)
- Absence days (numeric)
- Extracurricular activities (binary)
- Weekly self-study hours (numeric)
- Subject scores (7 subjects)
- Total score (engineered)
- Average score (engineered)
Preprocessing:
- Label encoding for categorical variables
- Standard scaling for numerical features
- SMOTE for class imbalance handling
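For reference, the following is a minimal sketch of how a stacking ensemble with this preprocessing could be wired up using scikit-learn, XGBoost, CatBoost, and imbalanced-learn. The hyperparameters are illustrative assumptions, not the exact values used in the training notebook.

```python
# Minimal sketch of the architecture above; hyperparameters are
# illustrative assumptions, not the notebook's exact configuration.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from imblearn.over_sampling import SMOTE


def build_hybrid_model():
    # Three base learners feed a Logistic Regression meta-model,
    # stacked with 5-fold cross-validation.
    base_models = [
        ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
        ("xgb", XGBClassifier(eval_metric="mlogloss", random_state=42)),
        ("cat", CatBoostClassifier(verbose=0, random_state=42)),
    ]
    return StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


def train(X, y):
    # X: the 14 features listed above; y: label-encoded career targets (0-16).
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_scaled, y)
    model = build_hybrid_model()
    model.fit(X_bal, y_bal)
    return model, scaler
```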
- Python 3.9 or higher
- pip package manager
- Git
git clone https://github.com/shreyasmene06/CatXForest.git
cd CatXForest
pip install -r requirements.txt
python app.py
Visit http://localhost:5000 in your browser.
git clone https://github.com/shreyasmene06/CatXForest.git
cd CatXForest
Windows:
python -m venv venv
venv\Scripts\activate
macOS/Linux:
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
python -c "import flask, sklearn, numpy; print('All dependencies installed successfully!')"
The pre-trained models are included in the Models/ directory. To retrain the model from scratch:
Download the student dataset from Kaggle and place student-scores.csv in the project root.
Open and run Studies Recommendations.ipynb in Jupyter Notebook:
jupyter notebook "Studies Recommendations.ipynb"
Or use JupyterLab:
jupyter lab "Studies Recommendations.ipynb"
The notebook will:
- Load and preprocess the dataset
- Create engineered features (total_score, average_score)
- Encode categorical variables
- Balance the dataset using SMOTE
- Train multiple models and compare performance
- Create a hybrid ensemble (Stacking Classifier)
- Save the best model to the Models/ directory
After training, the following files are created:
- Models/model.pkl: Random Forest model (83% accuracy)
- Models/hybrid_model.pkl: Stacking ensemble (84.97% accuracy)
- Models/scaler.pkl: StandardScaler for feature normalization
- Models/model_info.pkl: Model metadata and performance metrics
Note: The application uses model.pkl by default. To use the hybrid model, update line 14 in app.py to load hybrid_model.pkl.
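As a rough illustration, loading the saved artifacts and producing the top-5 recommendation looks something like the sketch below; the exact feature order and label mapping should be checked against app.py.

```python
# Hedged sketch of inference with the saved artifacts. The feature order
# follows the 14-feature list above; verify it against app.py before use.
import pickle
import numpy as np

with open("Models/hybrid_model.pkl", "rb") as f:
    model = pickle.load(f)
with open("Models/scaler.pkl", "rb") as f:
    scaler = pickle.load(f)


def top5_careers(features):
    """features: 14 numeric values (encoded gender, part-time job, absence
    days, extracurriculars, weekly study hours, 7 subject scores,
    total score, average score)."""
    X = scaler.transform(np.asarray(features, dtype=float).reshape(1, -1))
    probs = model.predict_proba(X)[0]
    top = np.argsort(probs)[::-1][:5]
    # model.classes_ may hold encoded labels; map back with the label
    # encoder from training if career names are needed.
    return [(model.classes_[i], round(float(probs[i]) * 100, 2)) for i in top]
```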
Install dependencies:
pip install -r requirements.txt
Start the application:
python app.py
Access the web interface:
- Open a browser and navigate to http://localhost:5000
- Or use http://0.0.0.0:5000 for network access
Make predictions:
- Fill in the student information form
- Click "Get Recommendations"
- View top 5 career suggestions with probabilities
For production environments:
gunicorn app:app --bind 0.0.0.0:5000 --workers 4
Home Page
Returns: HTML landing page
Recommendation Form
Returns: HTML form for student data input
Get Career Predictions
Request Body (Form Data):
{
"gender": "female",
"part_time_job": "true",
"absence_days": "2",
"extracurricular_activities": "true",
"weekly_self_study_hours": "7",
"math_score": "85",
"history_score": "72",
"physics_score": "88",
"chemistry_score": "90",
"biology_score": "76",
"english_score": "80",
"geography_score": "75",
"total_score": "566",
"average_score": "80.857"
}
Response:
- HTML page with top 5 career recommendations, probability percentages, and interactive pie chart visualization
| Parameter | Type | Range | Required |
|---|---|---|---|
| gender | string | "male" or "female" | Yes |
| part_time_job | string | "true" or "false" | Yes |
| absence_days | integer | 0-365 | Yes |
| extracurricular_activities | string | "true" or "false" | Yes |
| weekly_self_study_hours | integer | 0-168 | Yes |
| math_score | integer | 0-100 | Yes |
| history_score | integer | 0-100 | Yes |
| physics_score | integer | 0-100 | Yes |
| chemistry_score | integer | 0-100 | Yes |
| biology_score | integer | 0-100 | Yes |
| english_score | integer | 0-100 | Yes |
| geography_score | integer | 0-100 | Yes |
| total_score | float | Sum of all scores | Yes |
| average_score | float | Average of all scores | Yes |
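For a programmatic test of the prediction endpoint, a request can be sent with Python's requests library as sketched below; the /recommend route name is an assumption, so confirm the actual path in app.py.

```python
# Hedged example request with the fields from the table above.
# The "/recommend" path is an assumption; check app.py for the real route.
import requests

form_data = {
    "gender": "female",
    "part_time_job": "true",
    "absence_days": "2",
    "extracurricular_activities": "true",
    "weekly_self_study_hours": "7",
    "math_score": "85",
    "history_score": "72",
    "physics_score": "88",
    "chemistry_score": "90",
    "biology_score": "76",
    "english_score": "80",
    "geography_score": "75",
    "total_score": "566",
    "average_score": "80.857",
}

resp = requests.post("http://localhost:5000/recommend", data=form_data)
print(resp.status_code)   # 200 on success
print(resp.text[:300])    # HTML results page with the top-5 recommendations
```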
The project includes a render.yaml configuration for automated deployment.
- Fork this repository
- Sign up/login to Render
- Click "New" → "Blueprint"
- Connect your GitHub repository
- Render will automatically detect render.yaml and deploy
- Create a new Web Service on Render
- Connect your GitHub repository
- Configure:
  - Name: catxforest (or your choice)
  - Runtime: Python 3
  - Build Command: pip install --upgrade pip && pip install -r requirements.txt
  - Start Command: gunicorn app:app --bind 0.0.0.0:$PORT
- Click "Create Web Service"
Heroku:
heroku login
heroku create your-app-name
git push heroku main
heroku open
The Procfile is already configured.
Railway:
npm i -g @railway/cli
railway login
railway init
railway up
Google Cloud Run:
gcloud builds submit --tag gcr.io/PROJECT_ID/catxforest
gcloud run deploy --image gcr.io/PROJECT_ID/catxforest --platform managed
No environment variables are required for basic deployment. For advanced configuration:
| Variable | Description | Default |
|---|---|---|
| PORT | Server port | 5000 |
| FLASK_ENV | Environment | production |
| DEBUG | Debug mode | False |
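A minimal sketch of how these variables might be consumed at startup is shown below; app.py may handle configuration differently.

```python
# Sketch of reading the environment variables above in a Flask entry point.
# This mirrors common Flask/Gunicorn setups; app.py may differ.
import os
from flask import Flask

app = Flask(__name__)

if __name__ == "__main__":
    port = int(os.environ.get("PORT", 5000))
    debug = os.environ.get("DEBUG", "False").lower() == "true"
    app.run(host="0.0.0.0", port=port, debug=debug)
```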
CatXForest/
│
├── app.py # Main Flask application
├── requirements.txt # Python dependencies
├── runtime.txt # Python version specification
├── Procfile # Heroku deployment config
├── render.yaml # Render deployment config
├── LICENSE # MIT License
├── README.md # Documentation
│
├── Models/ # Trained ML models
│ ├── model.pkl # Random Forest model
│ ├── hybrid_model.pkl # Stacking ensemble model
│ ├── scaler.pkl # Feature scaler
│ └── model_info.pkl # Model metadata
│
├── templates/ # HTML templates
│ ├── home.html # Landing page
│ ├── recommend.html # Input form page
│ └── results.html # Results/predictions page
│
├── static/ # Static assets
│ └── *.png # Images
│
├── Studies Recommendations.ipynb # Model training notebook
└── student-scores.csv # Dataset (download separately)
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Random Forest | 83.00% | 0.82 | 0.83 | 0.82 |
| XGBoost | 82.73% | 0.82 | 0.83 | 0.82 |
| CatBoost | 83.38% | 0.83 | 0.83 | 0.83 |
| Stacking Ensemble | 84.97% | 0.85 | 0.85 | 0.85 |
- High accuracy across all 17 career categories
- Balanced precision and recall
- Robust performance on imbalanced data (SMOTE applied)
- Fast inference time (~50ms per prediction)
The model performs exceptionally well on:
- Teacher (97% recall)
- Writer (98% recall)
- Game Developer (98% recall)
- Designer (94% recall)
- Business Owner (93% recall)
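The metrics in the table above can be reproduced on a held-out test split along the lines of the sketch below; weighted averaging for precision, recall, and F1 is an assumption.

```python
# Hedged sketch of computing the metrics reported in the table above.
# Weighted averaging across the 17 classes is an assumption.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def evaluate(model, X_test, y_test):
    y_pred = model.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```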
Contributions are welcome. Here's how you can help:
- Check existing Issues
- Create a new issue with:
- Clear title and description
- Steps to reproduce
- Expected vs actual behavior
- Screenshots (if applicable)
- Open an issue with the enhancement label
- Describe the feature and its benefits
- Provide examples or mockups if possible
- Fork the repository
- Create a feature branch: git checkout -b feature/AmazingFeature
- Commit changes: git commit -m 'Add AmazingFeature'
- Push to branch: git push origin feature/AmazingFeature
- Open a Pull Request
git clone https://github.com/shreyasmene06/CatXForest.git
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt
python app.py
Student Studies Recommendation Dataset on Kaggle
The dataset contains student information including:
- Personal details: name, email, gender
- Academic scores: 7 subjects (Math, History, Physics, Chemistry, Biology, English, Geography)
- Activities: part-time job status, extracurricular activities
- Study habits: weekly self-study hours, absence days
- Career aspiration: target variable (17 categories)
- Dropped irrelevant columns (ID, name, email)
- Created engineered features (total_score, average_score)
- Encoded categorical variables (gender, boolean fields)
- Balanced dataset using SMOTE oversampling
- Applied Standard Scaling to features
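A pandas sketch of these steps is given below; the raw column names (id, first_name, last_name, email, career_aspiration) are assumptions based on the dataset description above, not verified against the CSV.

```python
# Hedged sketch of the preprocessing steps above. Raw column names such as
# "id", "first_name", "last_name", "email", and "career_aspiration" are
# assumptions based on the dataset description, not verified.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

SCORE_COLS = [
    "math_score", "history_score", "physics_score", "chemistry_score",
    "biology_score", "english_score", "geography_score",
]

df = pd.read_csv("student-scores.csv")
df = df.drop(columns=["id", "first_name", "last_name", "email"], errors="ignore")

# Engineered features
df["total_score"] = df[SCORE_COLS].sum(axis=1)
df["average_score"] = df[SCORE_COLS].mean(axis=1)

# Encode categorical and boolean columns
df["gender"] = LabelEncoder().fit_transform(df["gender"])
for col in ["part_time_job", "extracurricular_activities"]:
    df[col] = df[col].map({True: 1, False: 0, "True": 1, "False": 0})

X = df.drop(columns=["career_aspiration"])
y = LabelEncoder().fit_transform(df["career_aspiration"])
```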
- No personal identifiable information (PII) is stored
- Email addresses and names are discarded during preprocessing
- Model predictions are for educational guidance only
- Should be used in conjunction with professional career counseling
Input Validation: All user inputs are validated before processing
No Data Storage: Predictions are made in real-time without storing user data
HTTPS Ready: Configure SSL/TLS in production environments
Dependency Updates: Regular security updates
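Below is a hedged sketch of the kind of server-side check the Input Validation point above refers to, using the form parameters from the API Reference; the actual checks in app.py may differ.

```python
# Hedged sketch of server-side validation for the prediction form fields;
# the actual checks in app.py may differ.
SCORE_FIELDS = (
    "math_score", "history_score", "physics_score", "chemistry_score",
    "biology_score", "english_score", "geography_score",
)


def validate_form(form):
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    for field in SCORE_FIELDS:
        try:
            value = float(form[field])
        except (KeyError, ValueError):
            errors.append(f"{field} is missing or not a number")
            continue
        if not 0 <= value <= 100:
            errors.append(f"{field} must be between 0 and 100")
    if form.get("gender") not in ("male", "female"):
        errors.append("gender must be 'male' or 'female'")
    return errors
```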
This project is licensed under the MIT License. See the LICENSE file for details.
MIT License
Copyright (c) 2025 CatXForest Team
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...
Author: Shreyas Mene
Email: shreyasmene6@gmail.com
GitHub: @shreyasmene06
Repository: github.com/shreyasmene06/CatXForest