GitHub - n8mauer/LogarithmicAcademicSuccess: The study investigates whether an undergraduate student’s academic success can be predicted using log-log or power-log relationships. Using a dataset on higher education predictors, I preprocess data with logarithmic transformations and apply linear regression. Findings could show elasticities and interaction effects.

Predicting Undergraduate Academic Success

Executive Summary

This project aims to predict undergraduate student academic success from several factors using log-log and power-log relationships. Using the “Predict students' dropout and academic success” dataset from Kaggle, the analysis preprocesses the data with logarithmic transformations to capture non-linear and multiplicative effects. Taking logarithms of both sides of the power relationship $y = ax^b$ gives $\log(y) = \log(a) + b\log(x)$, which is linear in $\log(x)$ and can therefore be fitted with linear regression on the log-transformed variables.
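With several predictors, the same step applies term by term. As a sketch, assume a multiplicative model with two hypothetical factors $x_1$ and $x_2$ (the symbols $b_1$ and $b_2$ are illustrative and are not taken from the notebooks). Taking logarithms of both sides gives a model that is linear in the log-transformed variables:

$$
y = a\,x_1^{b_1}\,x_2^{b_2}
\quad\Longrightarrow\quad
\log(y) = \log(a) + b_1\log(x_1) + b_2\log(x_2)
$$

The transformation requires $y > 0$ and every $x_i > 0$, because the logarithm is undefined otherwise.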

Rationale

This project uses machine learning to build a more detailed understanding of the factors affecting academic success, so that effective and intentional educational strategies can be developed for populations that may need additional resources.

Over the past 20 years, the undergraduate dropout rate in the United States has shown notable trends and variations. On average, about 40% of college students do not complete their degree programs. The dropout rate tends to be highest in the first year, with approximately 20-30% of freshmen not returning for their sophomore year.

Economic factors play a significant role in dropout rates. Financial instability is a primary reason for 38% of students leaving college. Additionally, demographic differences are evident, with higher dropout rates among certain racial and ethnic groups. For example, Black and Native American students have higher dropout rates compared to Asian students, who tend to have the lowest dropout rates among all racial groups.

These trends highlight the complexity of the dropout issue, influenced by economic, institutional, and demographic factors. This data underscores the need for targeted interventions to support at-risk student populations and improve overall graduation rates.

Research Question

Can an undergraduate student’s academic success be predicted based on several factors using log-log or power-log relationships?

Data Source

The data is available on Kaggle as the “Predict students' dropout and academic success” dataset.

The dataset offers an overview of students enrolled in various undergraduate programs at a higher education institution. It encompasses demographic data, socioeconomic factors, and academic performance details, facilitating the analysis of potential predictors of student dropout and academic success. The dataset includes multiple separate databases with pertinent information available at enrollment, such as application mode, marital status, and chosen course. Moreover, it allows for the estimation of overall student performance at the end of each semester by evaluating credited, enrolled, assessed, and approved curricular units along with their respective grades. Additionally, regional economic indicators like unemployment rate, inflation rate, and GDP are included to explore how economic factors influence student dropout rates and academic success. This comprehensive analysis tool provides valuable insights into the factors that motivate students to either persist in their studies or withdraw, across a diverse array of disciplines including agronomy, design, education, nursing, journalism, management, social services, and technologies.

Methodology

Framework: For this project, I leveraged CRISP-DM (Cross-Industry Standard Process for Data Mining). The process provided a structured approach for this data mining project to systematically address the research question, ensuring a thorough analysis that leads to meaningful and actionable insights.

Data Preprocessing: Applying logarithmic transformations to both dependent and independent variables helps linearize multiplicative relationships. For example, because of diminishing returns, doubling study hours may not double academic performance; a log-log model expresses this as an elasticity below 1.
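The following is a minimal, standard-library sketch of this preprocessing step, not the code from the notebooks. The helper names `log_transform` and `response_multiplier`, the `offset` parameter, and the sample values are all hypothetical. It shows that each doubling of an input adds the same constant on the log scale, and that under $y = ax^b$ with $b = 0.5$, doubling $x$ multiplies $y$ by $2^{0.5}$ rather than by 2.

```python
import math

def log_transform(values, offset=0.0):
    """Natural log of each value after adding `offset` (hypothetical helper).

    Raises ValueError if any shifted value is not strictly positive,
    because the logarithm is undefined there.
    """
    shifted = [v + offset for v in values]
    if any(s <= 0 for s in shifted):
        raise ValueError("log is undefined for non-positive values; use a larger offset")
    return [math.log(s) for s in shifted]

def response_multiplier(input_multiplier, elasticity):
    """Under y = a * x**b, scaling x by k scales y by k**b."""
    return input_multiplier ** elasticity

# Illustrative values only: each entry doubles the previous one.
hours = [1.0, 2.0, 4.0, 8.0]
log_hours = log_transform(hours)

# On the log scale every doubling is the same step, log(2).
steps = [later - earlier for earlier, later in zip(log_hours, log_hours[1:])]

# With elasticity 0.5, doubling the input multiplies the outcome by sqrt(2).
doubling_effect = response_multiplier(2.0, 0.5)
```

Columns that contain zeros need a positive `offset` (or a different transformation) before taking logs. That choice changes how the coefficients are interpreted, so it is left as an explicit parameter here.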

Feature Identification and Preparation: Key factors influencing academic success, such as study hours and attendance, are identified and prepared. Interaction terms are created to capture the combined effects of multiple factors.
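One way to build an interaction term in a log-log setting is to multiply two log-transformed predictors. The sketch below assumes two hypothetical positive predictors `x1` and `x2` and the model $\log(y) = b_0 + b_1\log(x_1) + b_2\log(x_2) + b_3\log(x_1)\log(x_2)$. The function names and coefficient values are illustrative and are not taken from the notebooks.

```python
import math

def log_features(x1, x2):
    """Return [log(x1), log(x2), log(x1) * log(x2)] for one observation.

    The third entry is the interaction term between the two
    log-transformed predictors.
    """
    l1, l2 = math.log(x1), math.log(x2)
    return [l1, l2, l1 * l2]

def elasticity_wrt_x1(b1, b3, x2):
    """Elasticity of y with respect to x1 in the model
    log y = b0 + b1*log x1 + b2*log x2 + b3*log x1*log x2.

    Differentiating with respect to log x1 gives b1 + b3*log x2,
    so the elasticity depends on the level of x2.
    """
    return b1 + b3 * math.log(x2)

# Illustrative observation and coefficients.
row = log_features(10.0, 20.0)
at_low_x2 = elasticity_wrt_x1(0.4, 0.1, 1.0)    # log(1) = 0, so this equals b1
at_high_x2 = elasticity_wrt_x1(0.4, 0.1, 20.0)  # larger when b3 > 0 and x2 > 1
```

With an interaction term in the model, the effect of one factor is no longer a single elasticity: it varies with the other factor.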

Model Training: Linear regression models are trained on log-transformed data to understand the elasticities and interaction effects among variables.
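As a minimal illustration of fitting on log-transformed data, the sketch below implements single-predictor ordinary least squares with the standard library only. It is not the notebook code, which may use a regression library and several predictors. The function name `fit_log_log` and the synthetic data are assumptions made for the example.

```python
import math

def fit_log_log(x, y):
    """Ordinary least squares fit of log(y) = intercept + slope * log(x).

    Returns (intercept, slope). The slope estimates the elasticity b and
    exp(intercept) estimates the constant a in y = a * x**b.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Synthetic, noise-free data generated from y = 3 * x**0.5 (illustrative only).
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * v ** 0.5 for v in x]

intercept, slope = fit_log_log(x, y)
a_hat = math.exp(intercept)  # back-transform the intercept to the original scale
```

Because the data follow the power relationship exactly, the fit recovers the generating exponent and constant up to floating-point error. Real data will include noise.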

Results

In analyzing the coefficients, I interpret each one as an elasticity: a 1% increase in a factor is associated with approximately a $b$ percent change in academic success, where $b$ is that factor's coefficient. The intercept is the expected log of academic success when all log-transformed factors are zero, that is, when each untransformed factor equals 1; exponentiating it gives the multiplicative constant on the original, non-logarithmic scale. These insights help explain how various factors combine and contribute to academic success in a non-linear, multiplicative manner, which can inform targeted interventions and support strategies.
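To make the elasticity reading concrete, the sketch below converts a coefficient into the percent change in the outcome implied by $y = ax^b$, and back-transforms an intercept to the original scale. The function names and the elasticity value 0.5 are hypothetical and are not results from this project.

```python
import math

def percent_change_in_y(percent_change_in_x, elasticity):
    """Exact percent change in y implied by y = a * x**b when x changes
    by the given percentage. For small changes this is close to
    elasticity * percent_change_in_x."""
    factor = (1.0 + percent_change_in_x / 100.0) ** elasticity
    return (factor - 1.0) * 100.0

def multiplicative_constant(intercept):
    """Back-transform the log-scale intercept to the constant a."""
    return math.exp(intercept)

# Illustrative elasticity of 0.5.
small = percent_change_in_y(1.0, 0.5)    # 1% more x: close to 0.5% more y
large = percent_change_in_y(100.0, 0.5)  # doubling x: about 41.4% more y, not 50%
```

The "1% in, $b$% out" rule is an approximation that holds for small changes. For large changes such as a doubling, the exact multiplicative formula should be used.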

Outline of Project

Next Steps

Model Validation: Validate the model using cross-validation techniques to ensure robustness. Compare the performance of the log-log model with other models, such as polynomial regression or non-linear models.
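A minimal sketch of k-fold cross-validation for a single-predictor log-log model follows, using the standard library only. The helper names, the fold count, the random seeds, and the synthetic data are all assumptions for illustration. A real analysis would apply the same idea to the project dataset, likely through a library.

```python
import math
import random

def fit_log_log(x, y):
    """OLS fit of log(y) = intercept + slope * log(x); returns (intercept, slope)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_rmse(x, y, k=5, seed=0):
    """Mean out-of-fold RMSE, measured on the log scale."""
    scores = []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        intercept, slope = fit_log_log([x[i] for i in train], [y[i] for i in train])
        sq_errors = [
            (math.log(y[i]) - (intercept + slope * math.log(x[i]))) ** 2
            for i in held_out
        ]
        scores.append(math.sqrt(sum(sq_errors) / len(sq_errors)))
    return sum(scores) / len(scores)

# Synthetic data: y = 2 * x**0.7 with multiplicative log-normal noise.
rng = random.Random(42)
x = [1.0 + i for i in range(50)]
y = [2.0 * v ** 0.7 * math.exp(rng.gauss(0.0, 0.05)) for v in x]

rmse = cross_validated_rmse(x, y, k=5)
```

The score is an error on $\log(y)$. Comparing against polynomial or other non-linear models fairly requires evaluating all of them on the same scale and the same folds.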

Reporting and Visualization: Create more visualizations to illustrate the relationships between additional variables and academic success.

Implementation and Further Research: Prepare a comprehensive report summarizing the findings, including key insights and potential recommendations for interventions. Conduct further research to explore additional factors or to validate findings in different educational contexts or datasets.

Contact and Further Information

Name: Nate Mauer

Email: n8mauer@gmail.com

LinkedIn: https://www.linkedin.com/in/natemauer/
