This project analyzes Sejong City's 2024 public bus usage, exploring ridership patterns by time and region using transit, demographic, weather, and geospatial data to support smarter transit planning.
How can Sejong City improve its public bus services?
This project analyzes 2024 bus usage data to identify patterns by time and region, and examines correlations with various regional indicators, with the goal of providing actionable recommendations for transit planning.
| # | Source | Description |
|---|---|---|
| 1 | Sejong City Big Data Platform | 2024 bus card transaction records and Sejong City bus routes |
| 2 | Public Data Portal | District (dong)-level information including demographics and household data |
| 3 | Open Meteorological Data Portal | Daily weather and fine dust information |
| 4 | GitHub | GeoJSON file for spatial mapping |
- Python: core programming language
- Pandas: data loading, cleaning, merging, and aggregation
- NumPy: numerical data type handling
- Matplotlib & Seaborn: static charts and heatmaps
- Plotly Express: interactive bubble map of top bus stops
- Folium: choropleth map for district-level analysis
- WordCloud: visualizing top bus routes by frequency
- Google Colab: development environment with Google Drive integration
Here's what you can do with this project:
- User Type Breakdown: View ridership split between general passengers, elderly/youth, and young users through a pie chart.
- Time-Based Analysis: Explore hourly, daily, and monthly usage trends. Compare weekday vs. weekend ridership patterns across all hours of the day.
- Top Bus Routes: See the top 10 routes ranked by total ridership, displayed as a bar chart and a word cloud.
- Top Bus Stops: Browse the top 20 stops through a styled table and an interactive bubble map showing boarding and alighting counts.
- District-Level Efficiency: Calculate bus users per station by district and visualize results on a choropleth map.
- Correlation Analysis: Examine how bus usage relates to regional indicators like population, households, and registered vehicles.
- Weather & Air Quality Impact: Discover how temperature, rainfall, and fine dust levels affect daily weekday ridership through scatter plots.
Step 1: Data Collection
Gathered data from four sources: bus card transaction records and route data from the Sejong City Big Data Platform, district-level demographic data from the Public Data Portal, daily weather and fine dust data from the Open Meteorological Data Portal, and a GeoJSON boundary file from GitHub.
Step 2: Data Cleaning & Preprocessing
Loaded six CSV files with Pandas, specifying column data types up front to avoid memory errors. Resolved Korean text issues by reading the CSVs with euc-kr encoding and rendering Korean chart labels with the NanumGothic font. Converted date and time columns to proper datetime formats, and added derived columns for month, weekday number, and weekday/weekend classification.
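A minimal sketch of the Step 2 loading pattern, assuming a transaction CSV with the hypothetical columns `date` (YYYYMMDD), `hour`, `route_id`, `station_id`, `station_name`, `boardings`, and `alightings`; the real Sejong files use their own headers. It reads euc-kr bytes with an explicit `dtype` mapping and derives the month, weekday number, and weekday/weekend columns. The two-row sample is made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical column names and dtypes; the real Sejong CSVs use their own headers.
DTYPES = {
    "date": "string",
    "hour": "int8",
    "route_id": "string",
    "station_id": "string",
    "station_name": "string",
    "boardings": "int32",
    "alightings": "int32",
}


def load_transactions(source) -> pd.DataFrame:
    """Read a bus-card CSV with explicit dtypes and euc-kr encoding, then add date features."""
    # Explicit dtypes skip type inference, which saves memory and avoids mixed-type columns.
    df = pd.read_csv(source, encoding="euc-kr", dtype=DTYPES)
    # Assumes dates arrive as YYYYMMDD strings.
    df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")
    df["month"] = df["date"].dt.month
    df["weekday_num"] = df["date"].dt.weekday  # Monday=0 ... Sunday=6
    df["day_type"] = np.where(df["weekday_num"] >= 5, "weekend", "weekday")
    return df


# Made-up two-row sample, encoded as euc-kr to mimic a Korean public-data file.
sample = (
    "date,hour,route_id,station_id,station_name,boardings,alightings\n"
    "20240105,8,R1,0101,세종시청,120,30\n"
    "20240106,18,R1,0102,정부세종청사,45,60\n"
).encode("euc-kr")

trips = load_transactions(io.BytesIO(sample))
print(trips[["date", "month", "weekday_num", "day_type"]])
```

Declaring the ID columns as strings keeps IDs with leading zeros (such as `0101`) intact, and small integer types reduce memory on a full year of records.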
Step 3: Time-Based Analysis
Grouped ridership data by hour, day of week, and month to identify peak usage times. Created line charts for hourly trends and bar/line charts for weekly and monthly patterns, comparing weekday and weekend behavior.
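The Step 3 aggregations can be sketched with `pivot_table` and `groupby`. The toy frame and its columns (`date`, `hour`, `boardings`) are hypothetical stand-ins for the preprocessed table. One design choice worth noting: the day-of-week averages first sum boardings per calendar day, so days with more individual records do not get extra weight.

```python
import pandas as pd

# Toy stand-in for the preprocessed transaction table (columns are assumptions).
trips = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-05", "2024-01-06", "2024-02-07"]),
    "hour": [8, 18, 8, 8],
    "boardings": [120, 90, 40, 100],
})
trips["month"] = trips["date"].dt.month
trips["weekday_num"] = trips["date"].dt.weekday  # Monday=0 ... Sunday=6
trips["day_type"] = trips["weekday_num"].ge(5).map({True: "weekend", False: "weekday"})

# Hourly profile with one column per day type, ready for a weekday vs. weekend line chart.
hourly = trips.pivot_table(index="hour", columns="day_type", values="boardings",
                           aggfunc="sum", fill_value=0)
peak_hour = hourly.sum(axis=1).idxmax()

# Average daily boardings per day of week: total each calendar day first, then average.
daily = trips.groupby(["date", "weekday_num"], as_index=False)["boardings"].sum()
by_weekday = daily.groupby("weekday_num")["boardings"].mean()

# Monthly totals.
monthly = trips.groupby("month")["boardings"].sum()

print(hourly, peak_hour, by_weekday, monthly, sep="\n\n")
```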
Step 4: Route & Stop Analysis
Merged transaction data with route and station reference files to identify the top 10 bus routes and top 20 bus stops by total ridership. Visualized results using horizontal bar charts, a word cloud, styled tables, and an interactive Plotly bubble map.
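A sketch of the Step 4 ranking logic, assuming route ridership is measured as total boardings and stop ridership as boardings plus alightings. All tables, IDs, names, and coordinates below are hypothetical placeholders for the Sejong reference files.

```python
import pandas as pd

# Hypothetical stand-ins for the transaction table and the route/station reference files.
trips = pd.DataFrame({
    "route_id": ["R1", "R2", "R1", "R3", "R2", "R1"],
    "station_id": ["S1", "S2", "S1", "S3", "S1", "S2"],
    "boardings": [50, 30, 20, 10, 25, 15],
    "alightings": [5, 10, 8, 2, 12, 20],
})
routes = pd.DataFrame({"route_id": ["R1", "R2", "R3"],
                       "route_name": ["Route A", "Route B", "Route C"]})
stations = pd.DataFrame({
    "station_id": ["S1", "S2", "S3"],
    "station_name": ["Stop A", "Stop B", "Stop C"],
    "lat": [36.48, 36.50, 36.52],
    "lon": [127.26, 127.28, 127.30],
})


def top_routes(trips: pd.DataFrame, routes: pd.DataFrame, n: int = 10) -> pd.Series:
    """Total boardings per route name, largest first."""
    # validate= raises if a route_id appears twice in the reference file.
    merged = trips.merge(routes, on="route_id", how="left", validate="many_to_one")
    return merged.groupby("route_name")["boardings"].sum().nlargest(n)


def top_stops(trips: pd.DataFrame, stations: pd.DataFrame, n: int = 20) -> pd.DataFrame:
    """Boardings plus alightings per stop, joined to stop names and coordinates."""
    per_stop = trips.groupby("station_id", as_index=False)[["boardings", "alightings"]].sum()
    per_stop["total"] = per_stop["boardings"] + per_stop["alightings"]
    enriched = per_stop.merge(stations, on="station_id", how="left", validate="one_to_one")
    return enriched.nlargest(n, "total")


print(top_routes(trips, routes, n=2))
print(top_stops(trips, stations, n=2)[["station_name", "boardings", "alightings", "total"]])
```

The `lat`, `lon`, and `total` columns of the stop table are the kind of inputs a bubble map uses for position and bubble size.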
Step 5: District-Level Analysis
Filtered data to Sejong City stations only, aggregated ridership by district (dong), and calculated bus users per stop as an efficiency metric. Displayed side-by-side styled tables and rendered a Folium choropleth map using GeoJSON boundary data.
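The Step 5 efficiency metric can be sketched as total riders in a dong divided by the number of distinct stops in that dong. The `city`, `dong`, and `riders` columns and the district labels are hypothetical; the function name `users_per_stop` is illustrative only.

```python
import pandas as pd

# Hypothetical per-stop ridership, tagged with city and district (dong).
stop_usage = pd.DataFrame({
    "station_id": ["S1", "S2", "S3", "S4", "S5"],
    "city": ["Sejong", "Sejong", "Sejong", "Sejong", "Other"],
    "dong": ["Dong A", "Dong A", "Dong B", "Dong B", "Dong X"],
    "riders": [300, 100, 90, 30, 500],
})


def users_per_stop(stop_usage: pd.DataFrame, city: str = "Sejong") -> pd.DataFrame:
    """Riders per stop for each dong, keeping only stations in the given city."""
    local = stop_usage[stop_usage["city"] == city]  # drop stations outside the city
    summary = local.groupby("dong").agg(
        riders=("riders", "sum"),
        stops=("station_id", "nunique"),  # count each stop once
    )
    summary["users_per_stop"] = summary["riders"] / summary["stops"]
    return summary.sort_values("users_per_stop", ascending=False)


print(users_per_stop(stop_usage))
```

For the map, the `dong` index would be matched to a district-name property in the GeoJSON through the `key_on` argument of `folium.Choropleth`; which property holds the dong name depends on the GeoJSON file used.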
Step 6: Correlation & Weather Analysis
Merged daily ridership data with demographic district data to compute a Pearson correlation matrix, visualized as a heatmap. Also merged with weather and fine dust data to produce scatter plots exploring environmental effects on weekday ridership.
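A sketch of the two Step 6 merges under stated assumptions: a district table with hypothetical `riders`, `population`, `households`, and `vehicles` columns, and daily ridership joined to a weather table whose dates arrive as strings. All numbers are invented. Converting the weather `date` column first matters, because pandas refuses to merge a datetime column with a string column.

```python
import pandas as pd

# Hypothetical district-level table: ridership plus demographic indicators per dong.
districts = pd.DataFrame({
    "dong": ["Dong A", "Dong B", "Dong C", "Dong D"],
    "riders": [400, 120, 250, 80],
    "population": [30000, 12000, 22000, 9000],
    "households": [11000, 5000, 8000, 4000],
    "vehicles": [15000, 7000, 9000, 6000],
})
# Pearson correlation across the numeric indicators (the key column is dropped first).
corr = districts.drop(columns="dong").corr(method="pearson")

# Hypothetical daily ridership and weather tables; weather dates arrive as strings.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-01-08", "2024-01-09"]),
    "riders": [5200, 2100, 5600, 4300],
})
weather = pd.DataFrame({
    "date": ["2024-01-05", "2024-01-06", "2024-01-08", "2024-01-09"],
    "avg_temp": [1.5, -2.0, -4.5, 3.0],
    "rain_mm": [0.0, 0.0, 0.0, 12.5],
    "pm10": [45, 60, 30, 80],
})


def weekday_weather(daily: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Join daily ridership to weather on date and keep weekdays only."""
    weather = weather.assign(date=pd.to_datetime(weather["date"]))  # align key dtypes
    merged = daily.merge(weather, on="date", how="inner", validate="one_to_one")
    return merged[merged["date"].dt.weekday < 5]


wx = weekday_weather(daily, weather)
print(corr.round(2))
print(wx[["riders", "avg_temp", "rain_mm", "pm10"]].corr()["riders"])
```

Passing `corr` to `seaborn.heatmap(corr, annot=True, vmin=-1, vmax=1)` draws a heatmap of the kind described, and the `wx` frame supplies the x/y pairs for the weather scatter plots.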
During this project, I picked up important data analysis skills and a better understanding of how to turn raw transit data into meaningful insights, which improved my analytical thinking.
- Data Type Management: Specifying data types while loading CSVs taught me how to prevent memory errors and mixed-type inference problems on large datasets.
- Merging & Mapping: I learned how to efficiently join multiple datasets using `merge()` and `map()`, which was essential for linking transaction records to route and station information.
- Choropleth Mapping: Using Folium with GeoJSON data taught me how to match boundary files with statistical data to produce district-level maps.
- Interactive Bubble Maps: I used Plotly Express to build interactive maps where bubble size and color encode different variables simultaneously.
- Pearson Correlation: I computed and visualized a correlation matrix using Seaborn heatmaps, learning how to interpret relationships between transit usage and demographic indicators.
- Multi-source Merging: Combining weather, fine dust, demographic, and transit data into one analysis taught me how to handle encoding issues (`euc-kr`), date formatting, and key mismatches across datasets.
- Font & Encoding: I resolved Korean character display issues in Matplotlib by installing and registering the NanumGothic font, and applied `euc-kr` encoding when reading Korean CSV files.
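The difference between `merge()` and `map()` mentioned above, and one way to catch key mismatches, can be sketched as follows. The tables and IDs are hypothetical: `map()` fills a single column from a lookup Series, while `merge(..., indicator=True)` labels rows that found no match.

```python
import pandas as pd

# Hypothetical transactions and station reference; "S9" has no reference entry.
trips = pd.DataFrame({"station_id": ["S2", "S1", "S9"], "boardings": [10, 20, 5]})
stations = pd.DataFrame({"station_id": ["S1", "S2"], "station_name": ["Stop A", "Stop B"]})

# map(): single-column lookup via a Series indexed by the key; unmatched keys become NaN.
name_lookup = stations.set_index("station_id")["station_name"]
trips["station_name"] = trips["station_id"].map(name_lookup)

# merge() with indicator=True adds a "_merge" column that flags rows with no match.
checked = trips[["station_id"]].merge(stations, on="station_id", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "station_id"].tolist()

print(trips)
print("Unmatched station IDs:", unmatched)
```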
Each part of this project helped me understand more about working with real-world messy data. It was more than just analyzing bus usage: it was about solving data problems, learning new tools, and building a replicable framework that could be applied to other cities.
- Open the `.ipynb` file in Google Colab
- Mount your Google Drive: `from google.colab import drive`, then `drive.mount('/content/drive/')`
- Place all CSV and GeoJSON files in `drive/MyDrive/Python/`
- Run all cells from top to bottom
- The final output can be exported as an HTML file for sharing
- Urban planning: helping city governments optimize bus route coverage and stop placement
- Transit efficiency: identifying underserved districts that need more bus infrastructure
- Weather-responsive scheduling: adjusting bus frequency based on weather patterns
- Academic research: a replicable framework for analyzing public transit in other cities
- Email: veronicak656@gmail.com
- LinkedIn: www.linkedin.com/in/veronica-kiteve
- GitHub: github.com/kiteve