This repository contains data and analysis of 4-year graduation rates at Texas public universities from 2020 to 2022.
The dataset provides insights into graduation rates across public universities in Texas, identifying trends, high-performing institutions, and areas for improvement. The analysis includes data cleaning, visualization, and database integration.
The dataset used in this analysis is located in the project folder and includes:
- graduation_rates_at_public_universities_2020-2022.csv: raw data on 4-year graduation rates at Texas public universities from 2020 to 2022, available at https://data.austintexas.gov/api/views/59bi-74ad/rows.csv?accessType=DOWNLOAD
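As a rough illustration of the reading and cleaning step, the sketch below loads the CSV with Pandas and tidies the column headers. The function name `load_graduation_rates`, the header normalization, and the column names in the stand-in CSV are assumptions made for this example; they are not taken from main.py or from the real file.

```python
import io

import pandas as pd

# URL taken from the dataset description above.
DATA_URL = "https://data.austintexas.gov/api/views/59bi-74ad/rows.csv?accessType=DOWNLOAD"


def load_graduation_rates(source=DATA_URL):
    """Read the CSV from a URL, local path, or file-like object (hypothetical helper)."""
    df = pd.read_csv(source)
    # Normalize headers: trim whitespace, lowercase, replace spaces with underscores.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Drop rows in which every field is empty.
    return df.dropna(how="all")


# Tiny stand-in CSV with made-up values so the example runs offline.
# The real file's column names may differ.
sample_csv = io.StringIO("Institution Name, Grad Rate \nExample University,50.0\n,\n")
demo = load_graduation_rates(sample_csv)
print(demo)
```

Calling `load_graduation_rates()` with no argument would fetch the file from the URL above; passing the path to the local CSV in the project folder should work the same way.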
This project:
- Reads the dataset and loads it into a local MySQL database (managed with phpMyAdmin) named 'bas final'.
- Performs data analysis and visualization using Python, Pandas, and Matplotlib.
- Showcases the university with the highest graduation rate.
- Identifies the top 3 universities with the highest graduation rates.
- Highlights the bottom 3 universities with the lowest graduation rates.
- Compares all institutions against the average graduation rate.
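The ranking and comparison steps listed above can be sketched with Pandas along these lines. The column names `institution` and `grad_rate`, the helper `summarize`, and the sample rows are all hypothetical placeholders; the actual analysis in main.py may be organized differently.

```python
import pandas as pd


def summarize(df, name_col="institution", rate_col="grad_rate", n=3):
    """Return the top n, the bottom n, the mean rate, and each row's gap from the mean."""
    average = df[rate_col].mean()
    top = df.nlargest(n, rate_col)[name_col].tolist()
    bottom = df.nsmallest(n, rate_col)[name_col].tolist()
    # Positive values are above the average, negative values below it.
    compared = df.assign(diff_from_avg=df[rate_col] - average)
    return {"top": top, "bottom": bottom, "average": average, "compared": compared}


# Made-up placeholder rows, not real graduation rates.
sample = pd.DataFrame(
    {
        "institution": ["Univ A", "Univ B", "Univ C", "Univ D", "Univ E"],
        "grad_rate": [60.0, 45.0, 30.0, 52.0, 38.0],
    }
)

result = summarize(sample)
print("Highest:", result["top"][0])
print("Top 3:", result["top"])
print("Bottom 3:", result["bottom"])
print("Average:", result["average"])
```

The `compared` frame carries a `diff_from_avg` column, which could be passed to a Matplotlib bar chart to show each institution relative to the average.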
To use this repository:
Ensure you have Python 3.11 installed (available from python.org).
Install the required Python packages:
pip install pandas matplotlib SQLAlchemy pymysql
Clone the repository:
git clone https://github.com/tcareer34/Team-BAS.git
Navigate to the cloned directory:
cd Team-BAS
Explore the dataset and analysis in the data and files, respectively.
- Set up a MySQL database using phpMyAdmin.
- Update the database connection settings in main.py:
hostname = 'localhost'
uname = 'root'
pwd = ''
dbname = 'bas final'
- Ensure that the table name in main.py matches the table in your MySQL database:
table_name = 'public_institutions_graduation_rate'
Run the script to process the data and generate visualizations:
python main.py
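For orientation, here is one way the connection settings shown above could be combined into an SQLAlchemy connection URL and used to write a DataFrame to MySQL. This is a minimal sketch under assumptions: the helpers `build_url` and `write_table` are hypothetical, and main.py may build its connection differently. `URL.create` is used so that the space in the database name does not need manual escaping.

```python
from sqlalchemy import create_engine
from sqlalchemy.engine import URL

# Settings as they appear in main.py.
hostname = "localhost"
uname = "root"
pwd = ""
dbname = "bas final"
table_name = "public_institutions_graduation_rate"


def build_url(host, user, password, database):
    """Assemble a MySQL connection URL that uses the PyMySQL driver (hypothetical helper)."""
    return URL.create(
        "mysql+pymysql",
        username=user,
        password=password,
        host=host,
        database=database,
    )


def write_table(df, url, table):
    """Write a DataFrame to the given table, replacing it if it already exists."""
    engine = create_engine(url)
    df.to_sql(table, engine, if_exists="replace", index=False)


url = build_url(hostname, uname, pwd, dbname)
print(url.drivername, url.host, url.database)
```

`write_table` is defined but not called here because it needs a running MySQL server; with the database set up, `write_table(df, url, table_name)` would load the cleaned data.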


