Skip to content

An exercise for Machine Learning/Data Analysis regarding on the topics of Linear + Ridge Regression in correlation to the grades predicted and alcohol consumption.

License

Notifications You must be signed in to change notification settings

Goqwin/studentperformance

Repository files navigation

Student Performance Data Set Analysis Exercise

Overview

This exercise is designed to provide hands-on experience in applying Machine Learning and Data Analysis techniques to a real-world dataset. The exercise revolves around analyzing the relationship between student performance (grades) and alcohol consumption. It involves the use of Linear Regression and Ridge Regression models to predict student grades based on the amount of alcohol consumed.

The dataset used in this exercise is the Student Performance Data Set available on Kaggle. The dataset contains information about students' academic performance and various attributes including alcohol consumption, going out habits, study time, etc.

Goals

  1. Data Loading and Exploration: Load the dataset, explore its structure, and gain an understanding of the variables.

  2. Data Preprocessing: Clean the data by handling missing values, converting data types, and selecting relevant features for analysis.

  3. Exploratory Data Analysis: Analyze the relationships between different variables, particularly focusing on the correlation between alcohol consumption and grades.

  4. Model Building: Implement Linear Regression and Ridge Regression models to predict student grades based on relevant features, with a focus on alcohol consumption.

  5. Model Evaluation: Evaluate the models using appropriate evaluation metrics, such as R-squared.

  6. Visualization: Visualize the relationships between alcohol consumption, grades, and other relevant variables using scatter plots and other visualization techniques.

  7. Interpretation: Interpret the results obtained from the models and draw conclusions about the impact of alcohol consumption on student grades.

Getting Started

  1. Clone or download this repository to your local machine.

  2. Install the required Python libraries:

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import Ridge
    from sklearn.metrics import r2_score
    from sklearn.preprocessing import StandardScaler
  1. Download the Student Performance Data Set CSV file and place it in the same directory as the exercise files.

  2. Open the Jupyter Notebook or Python script provided in the repository and follow the step-by-step instructions.

Structure

The exercise is organized into the following files:

  • student_performance_analysis.ipynb or student_performance_analysis.py: The main Jupyter Notebook or Python script that guides you through the entire analysis process, from data loading to model evaluation.

  • data/student-mat.csv: The dataset file containing student performance data.

  • README.md: This README file providing an overview of the exercise and instructions for getting started.

Conclusion

This exercise offers a practical opportunity to apply Linear and Ridge Regression techniques to real data, gaining insights into the correlation between alcohol consumption and student grades. By completing this exercise, participants will gain valuable experience in data preprocessing, model building, evaluation, and interpretation within the context of educational data.

Feel free to modify and expand upon this exercise to further explore different aspects of the dataset or apply more advanced techniques. The overall result was a 77% accuracy, with an R-value of 0.7725960199457684

About

An exercise for Machine Learning/Data Analysis regarding on the topics of Linear + Ridge Regression in correlation to the grades predicted and alcohol consumption.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published