Student Performance Predictor

Overview

A machine learning project that predicts students' math scores from demographic and academic features. It implements an end-to-end MLOps pipeline, from data ingestion through model training to deployment, and serves predictions through a Flask web interface.

Features

End-to-End ML Pipeline: Complete workflow from data ingestion to model deployment
Multiple Algorithm Comparison: Tests 7 different regression algorithms with hyperparameter tuning
Real-time Predictions: Flask web application for instant score predictions
Automated Model Selection: Automatically selects the best-performing model based on R² score
Data Preprocessing: Handles categorical encoding and feature scaling
Modular Architecture: Well-structured codebase with separate components for each pipeline stage

Project Structure

student_performance/
├── artifacts/                      # Saved model, preprocessor, and datasets
│   ├── model.pkl                   # Trained ML model
│   ├── preprocessor.pkl            # Fitted preprocessing pipeline
│   ├── train.csv                   # Training split
│   ├── test.csv                    # Testing split
│   └── data.csv                    # Raw dataset
├── notebook/
│   └── data/
│       └── stud.csv                # Original dataset
├── src/
│   ├── components/                 # Core ML pipeline components
│   │   ├── __init__.py
│   │   ├── data_ingestion.py       # Data loading and splitting
│   │   ├── data_transformation.py  # Data preprocessing
│   │   └── model_trainer.py        # Model training and selection
│   ├── pipeline/                   # Prediction pipeline
│   │   ├── __init__.py
│   │   └── predict_pipeline.py     # Inference pipeline
│   ├── __init__.py
│   ├── exception.py                # Custom exception handling
│   ├── logger.py                   # Logging configuration
│   └── utils.py                    # Utility functions
├── templates/                      # HTML templates for the web app
│   ├── index.html                  # Homepage template
│   └── home.html                   # Prediction form template
├── app.py                          # Flask web application
├── requirements.txt                # Project dependencies
└── README.md                       # Project documentation

Machine Learning Pipeline

Data Ingestion

Loads the student performance dataset
Splits the data into training (80%) and testing (20%) sets
Saves the raw dataset and the train/test splits to the artifacts/ folder
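The ingestion step could look roughly like the following sketch, assuming pandas and scikit-learn's `train_test_split`. The function name `ingest_data`, the fixed `random_state=42`, and the output file names (taken from the `artifacts/` listing) are illustrative assumptions rather than the project's actual code.

```python
import os

import pandas as pd
from sklearn.model_selection import train_test_split


def ingest_data(raw_df: pd.DataFrame, artifacts_dir: str = "artifacts",
                test_size: float = 0.2, random_state: int = 42):
    """Hypothetical ingestion step: save raw data, split 80/20, save both splits."""
    os.makedirs(artifacts_dir, exist_ok=True)

    # Keep a copy of the raw dataset next to the splits.
    raw_df.to_csv(os.path.join(artifacts_dir, "data.csv"), index=False)

    # 80% train / 20% test; a fixed random_state makes the split reproducible.
    train_df, test_df = train_test_split(
        raw_df, test_size=test_size, random_state=random_state
    )

    train_path = os.path.join(artifacts_dir, "train.csv")
    test_path = os.path.join(artifacts_dir, "test.csv")
    train_df.to_csv(train_path, index=False)
    test_df.to_csv(test_path, index=False)
    return train_path, test_path
```

For example, `ingest_data(pd.read_csv("notebook/data/stud.csv"))` would write `data.csv`, `train.csv`, and `test.csv` under `artifacts/`. Fixing `random_state` keeps the split identical across runs, so model comparisons are made on the same test rows.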

Data Transformation

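Encodes categorical variables (gender, race/ethnicity, parental level of education, lunch, test preparation course)
Scales features using StandardScaler
Bundles these steps into a preprocessing pipeline (saved as preprocessor.pkl) so training and inference apply identical transformations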
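One way to build such a preprocessing pipeline is scikit-learn's `ColumnTransformer`, sketched below. The README only states that categorical features are encoded and that `StandardScaler` is applied; the median/most-frequent imputers, the `OneHotEncoder`, the helper name `build_preprocessor`, and the snake_case column names are assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column names; the real dataset's headers may be spelled differently.
NUMERICAL_COLUMNS = ["reading_score", "writing_score"]
CATEGORICAL_COLUMNS = [
    "gender",
    "race_ethnicity",
    "parental_level_of_education",
    "lunch",
    "test_preparation_course",
]


def build_preprocessor() -> ColumnTransformer:
    """Hypothetical preprocessor: impute, encode categoricals, scale everything."""
    numeric = Pipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ])
    categorical = Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("one_hot", OneHotEncoder(handle_unknown="ignore")),
        # One-hot output is sparse and StandardScaler cannot center sparse
        # input, so these columns are scaled without centering.
        ("scaler", StandardScaler(with_mean=False)),
    ])
    return ColumnTransformer([
        ("num", numeric, NUMERICAL_COLUMNS),
        ("cat", categorical, CATEGORICAL_COLUMNS),
    ])
```

Fitting this object on the training split only, then pickling it (for instance to `artifacts/preprocessor.pkl`), lets the inference code call `transform` with exactly the statistics learned during training.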

Model Training

The system evaluates multiple regression algorithms:

Random Forest Regressor
Decision Tree Regressor
Gradient Boosting Regressor
Linear Regression
XGBoost Regressor
CatBoost Regressor
AdaBoost Regressor

Each model's hyperparameters are tuned with GridSearchCV, which evaluates every combination in a parameter grid using cross-validation and keeps the best one.
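The tuning loop might be sketched as follows, assuming scikit-learn's `GridSearchCV` with R² scoring. Only the five scikit-learn models are included so the example has no extra dependencies; `xgboost.XGBRegressor` and `catboost.CatBoostRegressor` expose the same estimator interface and could be added to `CANDIDATES`. The parameter grids, `cv=3`, and the name `evaluate_models` are hypothetical.

```python
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Illustrative candidates and grids, not the project's actual settings.
# XGBRegressor and CatBoostRegressor could be added here the same way.
CANDIDATES = {
    "Random Forest": (RandomForestRegressor(random_state=42),
                      {"n_estimators": [50, 100]}),
    "Decision Tree": (DecisionTreeRegressor(random_state=42),
                      {"max_depth": [None, 5, 10]}),
    "Gradient Boosting": (GradientBoostingRegressor(random_state=42),
                          {"learning_rate": [0.05, 0.1]}),
    "Linear Regression": (LinearRegression(),
                          {"fit_intercept": [True, False]}),
    "AdaBoost": (AdaBoostRegressor(random_state=42),
                 {"learning_rate": [0.1, 1.0]}),
}


def evaluate_models(X_train, y_train, X_test, y_test, candidates=CANDIDATES, cv=3):
    """Tune each model with GridSearchCV, then score the tuned model's R² on test data."""
    report, fitted = {}, {}
    for name, (estimator, grid) in candidates.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring="r2")
        search.fit(X_train, y_train)
        # With the default refit=True, best_estimator_ is retrained on all of X_train.
        best = search.best_estimator_
        report[name] = r2_score(y_test, best.predict(X_test))
        fitted[name] = best
    return report, fitted
```

`evaluate_models` returns a dictionary mapping each model name to its test R², plus the tuned estimators, which a selection step can then rank.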

Model Selection

Automatically selects the best-performing model based on R² score
Requires a minimum R² score of 0.6 for model acceptance
Saves the best model (model.pkl) for production use
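A minimal sketch of this selection rule, assuming two dictionaries keyed by model name: one of test R² scores and one of fitted estimators. `select_and_save_best` and `NoAcceptableModelError` are hypothetical names; the project's own code (for example its custom exception in `src/exception.py`) may differ.

```python
import pickle


class NoAcceptableModelError(Exception):
    """Hypothetical error raised when no model reaches the minimum R²."""


def select_and_save_best(report, fitted, model_path="artifacts/model.pkl", min_r2=0.6):
    """Pick the highest-R² model and pickle it, refusing models below min_r2."""
    best_name = max(report, key=report.get)
    best_score = report[best_name]
    if best_score < min_r2:
        # Nothing is written when the best model misses the threshold.
        raise NoAcceptableModelError(
            f"Best model {best_name!r} has R² {best_score:.3f} < {min_r2}"
        )
    with open(model_path, "wb") as f:
        pickle.dump(fitted[best_name], f)
    return best_name, best_score
```

Raising instead of silently saving a weak model makes a failed training run visible before anything reaches the web app.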

Input Features

Gender: Male/Female
Race/Ethnicity: Student's ethnic background
Parental Level of Education: Education level of parents
Lunch: Standard or free/reduced lunch
Test Preparation Course: Completed or not completed
Reading Score: Student's reading test score
Writing Score: Student's writing test score
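Before prediction, the seven inputs from the web form have to be assembled into a one-row table with the same columns the preprocessor saw during training. The dataclass below is a hypothetical sketch of that step; the class name `StudentFeatures` and its snake_case field names are assumptions.

```python
from dataclasses import asdict, dataclass

import pandas as pd


@dataclass
class StudentFeatures:
    """Hypothetical container for one student's inputs (field names assumed)."""
    gender: str
    race_ethnicity: str
    parental_level_of_education: str
    lunch: str
    test_preparation_course: str
    reading_score: float
    writing_score: float

    def to_frame(self) -> pd.DataFrame:
        # One-row DataFrame; column names must match those used at training time.
        return pd.DataFrame([asdict(self)])
```

For example, `StudentFeatures("female", "group B", "bachelor's degree", "standard", "none", 72, 74).to_frame()` yields a single row ready to pass to the fitted preprocessor.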

Output

Math Score Prediction: Predicted math test score (0-100)
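A sketch of the inference path, assuming the fitted preprocessor and model were pickled to `artifacts/preprocessor.pkl` and `artifacts/model.pkl` as listed in the project structure. The function names `load_object` and `predict_math_score` are hypothetical.

```python
import pickle

import pandas as pd


def load_object(path):
    """Load a pickled object (preprocessor or model) from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_math_score(features: pd.DataFrame,
                       preprocessor_path="artifacts/preprocessor.pkl",
                       model_path="artifacts/model.pkl"):
    """Apply the saved preprocessor, then the saved model, to raw feature rows."""
    preprocessor = load_object(preprocessor_path)
    model = load_object(model_path)
    # transform (not fit_transform): reuse the statistics learned during training.
    X = preprocessor.transform(features)
    return model.predict(X)
```

The result is a NumPy array with one predicted score per input row. Some regressors (linear regression, for instance) do not bound their output, so a prediction can land slightly outside 0-100.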

Technologies Used

Python 3.8
Scikit-learn: Machine learning algorithms and preprocessing
XGBoost & CatBoost: Gradient-boosting libraries
Flask: Web application framework
pandas & NumPy: Data manipulation and analysis
HTML/CSS: Frontend interface

Model Performance

The system selects the model with the highest R² score on the held-out test set, provided that score is at least 0.6.
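For reference, R² (the coefficient of determination) is conventionally computed over the n test samples as follows, where $y_i$ is a student's actual math score, $\hat{y}_i$ the model's prediction, and $\bar{y}$ the mean actual score:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}
```

An R² of 0.6 therefore means the model's total squared error is 40% of the error from always predicting the mean score.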

Repository: preetham-11/student_performance_predictor

Folders and files

Latest commit

History

Repository files navigation

Student Performance Predictor

Overview

Features

Project Structure

Machine Learning Pipeline

Data Ingestion

Data Transformation

Model Training

Model Selection

Input Features

Output

Technologies Used

Model Performance

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages