JEET is a Machine Learning project designed to predict the likelihood of students dropping out after Class 12, focusing on those preparing for or appearing in the Joint Entrance Examination (JEE) and/or the National Eligibility cum Entrance Test (NEET) in India. By analysing academic and socio-economic data, JEET helps educators, counselors, and students identify at-risk individuals for timely intervention.
```
JEE_Dropout_ML_Project/
├── .gitignore
├── LICENSE
├── Plan.md
├── README.md
├── 01_Data/
│   ├── 01_raw/
│   │   └── JEE_Dropout_After_Class_12.csv
│   ├── 02_Cleaned and Engineered/
│   │   ├── Feature_Engineered.csv
│   │   └── JEE_Dropout_Cleaned.csv
│   └── 03_final/
│       └── JEE_Dropout_Final.csv
├── 02_Data Analysis/
│   ├── 01_Exploration.ipynb
│   ├── 02_Cleaning.ipynb
│   ├── 03_Feature_Engineering.ipynb
│   └── 04_Merging_Final.ipynb
├── 03_notebooks/
│   └── Prototypes_models.ipynb
├── 04_src/
│   ├── evaluation.py
│   └── jee_dropout_model.py
├── 05_models/
│   └── Model_JEET.pkl
└── 06_deployment/
    ├── Home.py
    ├── requirements.txt
    ├── model/
    │   └── JEET.pkl
    └── pages/
        ├── Data.py
        └── JEET.py
```
- Predicts dropout risk using student academic and socio-economic data
- Handles imbalanced data with SMOTE for better minority-class recall
- User-friendly web app built with Streamlit
- Enables educators and students to take data-driven actions
Includes features such as:
- Academic scores (Class 10, Class 12, JEE/NEET scores)
- Attendance records
- Socio-economic factors (family income, parental education)
- Target label: dropout status (Yes/No)
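For illustration, here is a minimal sketch of encoding the Yes/No target and a categorical socio-economic feature with pandas. The column names (`class_10_score`, `family_income`, `dropout`) are assumptions for this example; the actual schema lives in `01_Data/03_final/JEE_Dropout_Final.csv`.

```python
import pandas as pd

# Hypothetical rows mirroring the dataset's shape, not its real contents
df = pd.DataFrame({
    "class_10_score": [88.0, 64.5, 72.0],
    "class_12_score": [84.0, 58.0, 70.5],
    "family_income": ["high", "low", "mid"],
    "dropout": ["No", "Yes", "No"],   # target label
})

# Encode the Yes/No target as 1/0 for the classifier
df["dropout"] = df["dropout"].map({"Yes": 1, "No": 0})

# One-hot encode a categorical socio-economic feature
df = pd.get_dummies(df, columns=["family_income"])

print(df["dropout"].tolist())  # → [0, 1, 0]
```

In practice the same encoding is applied consistently at training and prediction time so the deployed model sees identical columns.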
```bash
git clone https://github.com/Chracker24/JEE_Dropout_ML_Project.git
cd JEE_Dropout_ML_Project
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
```
- Navigate to `Home.py` in `06_deployment`
- Run the Streamlit application:
```bash
streamlit run Home.py
```
- Input student details via the web interface to get dropout risk predictions
- Talk with the Google Gemini-powered JEET chatbot about your shortcomings and get help
- Data cleaning, encoding categorical variables, and feature scaling and engineering
- Addressed class imbalance using SMOTE
- Trained Random Forest Classifier as the main model
- Evaluated with accuracy, precision, recall, F1-Score and ROC-AUC
- Used SHAP for model interpretability
- Achieved approximately 73% accuracy
- Accuracy dropped after leaky features were removed; later iterations with better data should improve this
- Nonetheless, data balancing improved recall
- Important predictors: socio-economic status, peer pressure, and mental health
- Incorporate psychological and motivational data
- Explore advanced models (XGBoost, deep learning)
- Add personalized intervention suggestions
- Expand the dataset for broader applicability and resolve the data leakage issue
- Python, scikit-learn
- imbalanced-learn (SMOTE, Pipeline)
- pandas, numpy, seaborn, matplotlib
- SHAP (interpretability)
- Streamlit (deployment)
Christy Chovalloor - Software Engineering Student, Queen's University Belfast | LinkedIn | GitHub
Documentation will be up soon; I will announce it on LinkedIn. Stay tuned!
MIT License
For questions or collaboration, email Chr2412@hotmail.com.