Perth sold-property analytics — a small Python data pipeline plus a static, interactive dashboard (no framework build step).
This repository turns a CSV of sold listings into JSON consumed by a single-page dashboard. You can explore prices over time, compare suburbs, inspect individual properties, and view listings on a map — including optional layers for schools and public transport when the supporting data is present.
| Area | What you get |
|---|---|
| Header | Branded banner, logo, and title; Montserrat for the main heading. |
| KPI strip | High-level stats driven by the current filter set (counts, median / mean / spread where applicable). |
| Filters | Suburb, bedrooms, bathrooms, year, and a dual-handle price range. Clear filters gains a soft highlight whenever any filter is active. |
| Yearly median price | Line or bar chart (toggle). Interactions can focus the chart on a suburb or a specific property’s sale history when the data allows. |
| Map | Leaflet map with suburb heat polygons, individual properties, optional parks, estimated school points, and (if generated) simplified PTA public transport overlays. |
| Suburbs table | Sortable Suburbs by sales volume (and related price / variation columns). |
| Properties table | Parallel view: Properties by sales volume at address level, same style of metrics, with pagination (20 rows per page) and its own sort state. |
| Distribution | Suburb-level price distribution visualisation (Chart.js), aligned with the rest of the filtering model. |
The UI is plain HTML, CSS, and vanilla JavaScript (dashboard/app.js), with Chart.js, Plotly, and Leaflet loaded from CDNs.
| Path | Role |
|---|---|
perth_property_data.csv |
Input sold-property dataset (place at repo root before running the pipeline; it may be gitignored or absent in clones). |
scripts/build_dashboard_data.py |
Reads the CSV with pandas, aggregates, and writes JSON under dashboard/data/. |
scripts/build_public_transport_data.py |
Optional: builds trimmed GeoJSON for the dashboard when PTA source files exist. |
scripts/fetch_parks_osm.py |
Optional helper: downloads metro-wide parks/green areas from OpenStreetMap (Overpass API) to refresh parks layer inputs. |
scripts/property_annual_returns.py |
Builds property_annual_return_*.json (sale-to-sale CAGR and per-property summaries) from the CSV. |
scripts/run_update.py |
Entry point: runs the dashboard build; if PTA stop/route GeoJSON folders are present, runs the public-transport build as well. |
dashboard/ |
Static front end: index.html, styles.css, app.js, assets/ (logo, banner, favicon). |
dashboard/data/ |
Generated JSON (and GeoJSON) consumed by the app — refresh via the scripts above. |
data_schema.md |
Notes on columns, types, and data quality where documented. |
| Location | Contents |
|---|---|
README.md (this file) |
Project purpose, dashboard features, data inventory, and how to run / refresh data. |
data_schema.md |
Schema notes for the listing dataset. May reference screenshots or diagrams under image/data_schema/. |
scripts/build_dashboard_data.py |
Inline comments; defines which CSV columns are exported into listings_core / listings_sample and how aggregates are built. |
scripts/property_annual_returns.py |
Docstring: sale dedupe rules, CAGR formula, and projection helpers. |
scripts/build_public_transport_data.py |
Module docstring: PTA source paths, Perth metro bounding box, and output file names. |
scripts/run_update.py |
Short entrypoint: order of builds and when public-transport generation runs. |
| PTA GeoJSON folders (if present) | Some downloads include metadata.txt beside the .geojson (supplier metadata). |
| Path | Role |
|---|---|
perth_property_data.csv |
Primary input. Sold listings (one row per sale / listing as defined in your extract). Required at the repository root. The pipeline drops rows with missing, non-numeric, or < AUD 100,000 Price (MIN_PRICE_AUD in scripts/build_dashboard_data.py). Regenerate all dashboard/data/*.json after changing the CSV or this rule (python scripts/run_update.py). |
Stops_PTA_001_WA_GDA2020_Public_GeoJSON/Stops_PTA_001_WA_GDA2020_Public.geojson |
Optional. WA public transport stops (GDA2020). Used only if this folder and file exist when you run run_update.py. |
Service_Routes_PTA_002_WA_GDA2020_Public_GeoJSON/Service_Routes_PTA_002_WA_GDA2020_Public.geojson |
Optional. WA service route geometry. Same condition as stops. |
dashboard/data/suburb_boundaries.geojson |
Required for suburb polygon rendering. WA locality boundaries filtered to dashboard suburbs (generated from Geoscape WA Localities). |
dashboard/data/parks.geojson |
Optional. Metro-wide parks/green-area polygons for map overlay. Current project build uses OpenStreetMap extraction via Overpass. |
School locations on the map are not a separate download: school_points_estimated.json is derived in the pipeline from listing fields (e.g. primary school name and coordinates aggregated from sales rows).
Other archives or layers you may keep in the repo (for example regional parks ZIPs or rail-corridor GeoJSON) are not read by the current dashboard scripts unless you extend the pipeline.
- WA Suburb/Locality Boundaries (Geoscape, CC BY 4.0): data.gov.au dataset page
- GeoJSON/WFS endpoint listed by data.gov.au: resource page
- OpenStreetMap (parks / green areas) via Overpass API: Overpass API, OSM Overpass docs
Created by scripts/build_dashboard_data.py unless noted otherwise. Every file below reflects the price-filtered dataset (numeric Price ≥ AUD 100,000 only).
| File | Description |
|---|---|
summary.json |
Dataset-wide rollups: row count, column count, sale date range, price median / mean / P75 / P95. |
listings_core.json |
Per-row listing attributes (geocoded rows only), including price, dates, suburb, address, beds/baths, land size, distances, school name fields — used as the main fact table in the browser. |
listings_sample.json |
Random subset of listings (up to 8,000 rows) with the same column slice; useful for lighter experiments (not loaded by the current app.js). |
yearly.json |
Median price and sale counts by calendar year (aggregate). |
yearly_by_suburb.json |
Median price and counts by suburb × year. |
property_type_stats.json |
Counts and median price by property type. |
suburb_stats.json |
Suburb-level counts, median / average price, average distance to CBD. |
suburb_map_stats.json |
Suburb centroids (mean lat/lon), counts, and average / median price for map circles. |
school_points_estimated.json |
One point per primary school name with estimated coordinates (mean of listing locations) and counts — used by the dashboard map layer. |
suburb_boundaries.geojson |
Suburb/locality polygons used by the map choropleth/heat rendering for suburb-based metrics. |
parks.geojson |
Parks/green-area polygons shown in the optional Parks map layer. |
Created by scripts/property_annual_returns.py (invoked from build_dashboard_data.py):
| File | Description |
|---|---|
property_annual_return_intervals.json |
One row per consecutive sale pair on the same house_key (aligned with dashboard houseKey logic). Fields: prev_date_sold, date_sold, prev_price, price, years (elapsed time in fractional years, days ÷ 365.25), annual_return. Pairs with years < 1 are omitted (unstable CAGR on short holds). Intervals with CAGR > 100%/yr are omitted. |
property_annual_return_summary.json |
One row per property with latest sale price/date, interval_count, and avg_annual_return (mean of interval CAGR values). Includes future_price_1y = current_price * (1 + avg_annual_return) when the average is defined. |
Deduplication before intervals: same calendar day + same price → one row; same day + different prices → keep max price only. Annualized return between sales: (price / prev_price) ** (1 / years) - 1, only when elapsed time is at least one year. For horizons of n years: future_price = current_price * (1 + avg_annual_return) ** n (not stored for every n in JSON; compute in the UI or extend the pipeline).
Created by scripts/build_public_transport_data.py when PTA inputs exist:
| File | Description |
|---|---|
public_transport_stops.geojson |
Stops clipped / simplified for Greater Perth for browser use. |
public_transport_routes.geojson |
Route linework clipped / simplified for the same area. |
parks_osm_raw.json |
Raw Overpass response used to produce parks.geojson (if you choose to refresh parks data via script). |
At startup, dashboard/app.js fetches:
| File | Purpose |
|---|---|
summary.json |
Baseline KPI metadata combined with live-filtered rows. |
listings_core.json |
All charts, tables, map markers, and filters. |
property_annual_return_intervals.json |
Optional precomputed resale intervals; if present and non-empty, the dashboard uses the same suburb metric as in-browser logic. Suburb “Avg resale growth (%)” = mean across calendar years of the per-year mean CAGR (each interval: later sale year, same house_key, hold ≥ 1 year, CAGR ≤ 100%/yr, same-day dedupe; not a sum of simple price deltas). |
school_points_estimated.json |
School overlay on the map. |
public_transport_stops.geojson |
Optional; if missing or invalid, transport layers are skipped gracefully. |
public_transport_routes.geojson |
Optional; same as above. |
suburb_boundaries.geojson |
Suburb polygon boundaries for heat/choropleth map rendering. |
parks.geojson |
Optional parks/green-area overlay layer. |
The other JSON files in dashboard/data/ are still produced by the pipeline for analysis, exports, or future UI — they are not requested by the current static page.
- Python 3.10+ (or compatible) with pandas installed (
pip install pandasif needed). - A modern browser for the dashboard.
From the repository root:
python scripts/run_update.pyThis expects perth_property_data.csv at the project root. Rows with missing Price, price < AUD 100,000, or non‑numeric price are dropped before any aggregates are built. Re-run this command after changing the CSV or the price rule so every JSON under dashboard/data/ stays in sync (KPIs, map, tables, CAGR). It refreshes summary.json, listings_core.json, yearly.json, property_annual_return_intervals.json, property_annual_return_summary.json, and related files. If the script exits with an error, no row satisfied the price filter — check the data or MIN_PRICE_AUD in scripts/build_dashboard_data.py.
Without the CSV, you can still drop Price < 100k from an existing dashboard/data/listings_core.json and refresh summary.json counts with:
node scripts/sanitize_dashboard_data_min_price.jsIf you have the PTA GeoJSON datasets in the expected folders (Stops_PTA_001_…, Service_Routes_PTA_002_…), the same command also rebuilds the simplified transport layers used by the map.
To refresh parks data from OpenStreetMap (metro Perth extract), run:
python scripts/fetch_parks_osm.pyThis creates/updates dashboard/data/parks_osm_raw.json and is used to regenerate dashboard/data/parks.geojson in this project workflow.
Bundled under dashboard/assets/:
perth-property-logo.png— header wordmark.header-banner.png— full-width header background.favicon.png— browser tab / bookmark icon (rel="icon"andapple-touch-iconinindex.html).
Listing data and third-party spatial extracts (PTA, parks, etc.) may be subject to their own terms. Use this project in line with those sources and applicable law.
Built for clear, filter-driven exploration of Perth property sales.