(aus-health-sql-portfolio)
This project demonstrates practical SQL skills using Australian Institute of Health and Welfare (AIHW) survey data on hospital usage patterns across Australia's Primary Health Network (PHN) regions.
The goal is not database engineering, but to show clear, well-documented SQL transformations and analytical thinking using real public health data - the kind of work a data analyst does daily.
- Data loading & staging: Loading CSV data into PostgreSQL with deliberate design choices
- Data cleaning & transformation: Handling non-numeric values, type conversions, business rule application
- SQL techniques: CTEs, aggregations, views, JOINs across related tables
- Analytical thinking: Formulating business questions, interpreting results, drawing insights
- Documentation: Clear code comments, reproducible workflow, assumption documentation
- Source: Australian Institute of Health and Welfare (AIHW)
- Datasets:
- Percentage of adults who went to hospital ED in preceding 12 months, by PHN
- Percentage of adults admitted to hospital in preceding 12 months, by PHN
- License: Creative Commons Attribution 4.0 (CC BY 4.0)
- URL: aihw-hpf-52-data-tables
Raw CSV files are loaded into staging tables with minimal constraints.
- Raw CSV files are loaded manually into staging tables using pgAdmin.
- Load scripts intentionally exclude client-specific commands (e.g. \copy) to keep SQL portable.
- All fields are stored as text to avoid load failures due to formatting issues (e.g. “NP” values).
- No transformations are applied at this stage.
- Staging tables act as a reproducible snapshot of the source data.
Cleaned tables are created from the staging layer with explicit transformations:
- Percentage fields are converted to numeric values.
- Rows containing non-published values (NP) are excluded from analysis tables.
- Financial years (e.g. 2013–14) are converted to a single numeric year using the ending financial year as a business rule (e.g. 2014). This layer produces analysis-ready tables with consistent data types and documented assumptions.
Query: vw_admission_by_phn
Finding: Across 30 PHN areas, most (19/30) had admission rates above the national average of 12.8%.
- Highest: North Coast NSW (15.74%), Darling Downs QLD (15.64%)
- Lowest: Western Sydney (10.62%), Central & Eastern Sydney (10.64%)
Business insight: Regional variation in hospital demand - useful for resource allocation and policy targeting.
Query: vw_visits_vs_admissions_gap
Finding: The gap between ED visits and admissions ranged from -0.88 to +6.46 percentage points across PHNs, with most showing 1-3 points difference.
Business insight: ED visits generally exceed admissions, indicating pressure on emergency services varies significantly by region.
Query: vw_avg_admissions_by_year
Finding: Over five years, admission rates remained stable (12.7% to 13.7%) with no clear trend.
Business insight: National admission rates are stable - no major policy or population-driven shifts in this period.
- Staging tables matter: Loading raw data as text first prevents load failures and makes debugging easier
- Document your business rules: Converting "2013–14" to 2014 is a decision that needs to be explicit and traceable
- Views keep analysis clean: Building views on cleaned data rather than raw data makes queries simpler and more reliable
- Install PostgreSQL (tested on version 9.6)
- Create database:
CREATE DATABASE aihw_analysis; - Run scripts in order:
01_raw_load.sql- creates staging tables- Load CSV files via pgAdmin (instructions in script comments)
02_clean_transform.sql- creates cleaned tables03_analysis.sql- creates views and runs analysis queries