This exercise is designed to provide hands-on experience in applying Machine Learning and Data Analysis techniques to a real-world dataset. The exercise revolves around analyzing the relationship between student performance (grades) and alcohol consumption. It involves the use of Linear Regression and Ridge Regression models to predict student grades based on the amount of alcohol consumed.
The dataset used in this exercise is the Student Performance Data Set available on Kaggle. The dataset contains information about students' academic performance and various attributes including alcohol consumption, going out habits, study time, etc.
- Data Loading and Exploration: Load the dataset, explore its structure, and gain an understanding of the variables.
- Data Preprocessing: Clean the data by handling missing values, converting data types, and selecting relevant features for analysis.
- Exploratory Data Analysis: Analyze the relationships between different variables, focusing on the correlation between alcohol consumption and grades.
- Model Building: Implement Linear Regression and Ridge Regression models to predict student grades from relevant features, with a focus on alcohol consumption.
- Model Evaluation: Evaluate the models using appropriate metrics, such as R-squared.
- Visualization: Visualize the relationships between alcohol consumption, grades, and other relevant variables using scatter plots and other visualization techniques.
- Interpretation: Interpret the model results and draw conclusions about the impact of alcohol consumption on student grades.
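The preprocessing, model building, and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`Dalc`, `Walc`, `studytime`, `goout`, `G3`), the 80/20 split, the Ridge `alpha`, and the helper name `compare_models` are all assumptions made for the example.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumed column names (not stated in this README): Dalc/Walc are workday and
# weekend alcohol consumption, G3 is the final grade.
FEATURES = ["Dalc", "Walc", "studytime", "goout"]
TARGET = "G3"


def compare_models(df, features=FEATURES, target=TARGET, alpha=1.0, seed=42):
    """Fit Linear and Ridge Regression and return each model's test-set R-squared."""
    # Preprocessing: keep only the needed columns and drop rows with missing values.
    data = df[list(features) + [target]].dropna()

    # Hold out 20% of the rows for evaluation (an assumed split ratio).
    X_train, X_test, y_train, y_test = train_test_split(
        data[list(features)], data[target], test_size=0.2, random_state=seed
    )

    # Fit the scaler on the training split only, so no test statistics leak in.
    scaler = StandardScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)
    X_test_s = scaler.transform(X_test)

    scores = {}
    models = [("linear", LinearRegression()), ("ridge", Ridge(alpha=alpha))]
    for name, model in models:
        model.fit(X_train_s, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test_s))
    return scores
```

With the dataset loaded, a call such as `compare_models(pd.read_csv("data/student-mat.csv"))` would return a dictionary with one R-squared value per model. Some copies of the dataset are semicolon-separated, in which case `sep=";"` would be needed when reading the file.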
- Clone or download this repository to your local machine.
- Install the required Python libraries (pandas, matplotlib, and scikit-learn). The exercise uses the following imports:
  - `import pandas as pd`
  - `import matplotlib.pyplot as plt`
  - `from sklearn.model_selection import train_test_split`
  - `from sklearn.linear_model import Ridge`
  - `from sklearn.metrics import r2_score`
  - `from sklearn.preprocessing import StandardScaler`
- Download the Student Performance Data Set CSV file and place it in the same directory as the exercise files.
- Open the Jupyter Notebook or Python script provided in the repository and follow the step-by-step instructions.
The exercise is organized into the following files:
- `student_performance_analysis.ipynb` or `student_performance_analysis.py`: The main Jupyter Notebook or Python script that guides you through the entire analysis process, from data loading to model evaluation.
- `data/student-mat.csv`: The dataset file containing student performance data.
- `README.md`: This README file, providing an overview of the exercise and instructions for getting started.
This exercise offers a practical opportunity to apply Linear and Ridge Regression techniques to real data, gaining insights into the correlation between alcohol consumption and student grades. By completing this exercise, participants will gain valuable experience in data preprocessing, model building, evaluation, and interpretation within the context of educational data.
Feel free to modify and expand upon this exercise to explore other aspects of the dataset or to apply more advanced techniques. The overall result was an R-squared score of 0.7725960199457684, meaning the model explained roughly 77% of the variance in student grades.
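For reference, the `r2_score` metric used for evaluation measures the share of variance in the target that the predictions explain, computed as 1 - SS_res / SS_tot. It is not a classification accuracy. The sketch below reimplements that formula in plain Python to show what the number means; the function name `r_squared` and the sample grades are made up for illustration and are not taken from the dataset.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean grade.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not from the dataset).
actual_grades = [10, 12, 14, 16]
predicted_grades = [11, 12, 13, 16]
score = r_squared(actual_grades, predicted_grades)  # SS_res = 2, SS_tot = 20
```

A score of 1.0 means the predictions match the grades exactly, while 0.0 means the model does no better than always predicting the mean grade.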