This project showcases end-to-end handling of an airline dataset β from cleaning and normalization to writing analytical SQL queries β using MySQL. It simulates how real-world airline performance data can be modeled and analyzed using relational databases.
The original dataset was cleaned and contained the following key features:
| Column Name | Description |
|---|---|
iata_code |
Unique airline identifier code |
airline_name |
Name of the airline |
region |
Operating region (e.g., Asia, Europe) |
functional_currency |
Currency used in financial reporting |
ebit_usd |
Earnings Before Interest and Tax (in USD) |
load_factor |
Percentage of available seating capacity used |
low_cost_carrier |
Boolean indicating LCC status |
airline_age |
Age of the airline |
num_routes |
Number of routes operated |
passenger_yield |
Revenue earned per passenger |
ask |
Available seat kilometers |
avg_fleet_age |
Average age of aircraft fleet |
fleet_size |
Number of aircraft in operation |
aircraft_utilisation |
Average usage per aircraft |
To improve query efficiency and reduce redundancy, the cleaned dataset was normalized into the following relational tables:
-
Airlines
airline_id(PK)iata_codeairline_nameairline_ageavg_fleet_agefleet_sizelow_cost_carrier(FK fromlow_cost_types)region_id(FK fromregions)currency_id(FK fromcurrencies)
-
Regions
region_id(PK)region_name
-
Currencies
currency_id(PK)functional_currency
-
Low_Cost_Types
low_cost_id(PK)low_cost_label(e.g., Yes/No)
-
Metrics
airline_id(FK)load_factorebit_usdnum_routespassenger_yieldaskaircraft_utilisation
- Top 5 airlines by passenger yield
- Load factor comparison by region
- Fleet utilization trends across low-cost and legacy carriers
- Correlation between airline age and EBIT performance
- Import
cleaned_airline_data.csvinto MySQL - Use the schema: USE sql1;
- Run the CREATE TABLE and INSERT INTO ... SELECT commands from setup.sql (link your file here)
- Explore insights using analytical queries from analysis.sql
π Project Goals -Practice SQL data modeling and relational design -Apply normalization (1NF β 3NF) -Write insightful business intelligence queries -Learn how to work with Git and GitHub using SSH
π Folder Structure pgsql Copy Edit airline-sql-project/ β βββ cleaned_airline_data.csv # Preprocessed dataset βββ setup.sql # All CREATE TABLE + INSERT queries βββ analysis.sql # Business insights using SQL βββ ERD.png # Entity Relationship Diagram βββ README.md # Project documentation π§ Skills Demonstrated SQL (DDL, DML, Joins, Group By, Subqueries)
Data normalization (1NF, 2NF, 3NF)
Database design & ERD
Git & GitHub (SSH workflow)
π Author Punyashree Somashekara Aspiring Data Analyst and AI enthusiast LinkedIn: https://www.linkedin.com/in/punya-shree-s-624a40293/