GitHub - Jfilhorv/perth_property_data: perth property data

Perth Property Data

Perth sold-property analytics — a small Python data pipeline plus a static, interactive dashboard (no framework build step).

Overview

This repository turns a CSV of sold listings into JSON consumed by a single-page dashboard. You can explore prices over time, compare suburbs, inspect individual properties, and view listings on a map — including optional layers for schools and public transport when the supporting data is present.

What is in the dashboard?

Area	What you get
Header	Branded banner, logo, and title; Montserrat for the main heading.
KPI strip	High-level stats driven by the current filter set (counts, median / mean / spread where applicable).
Filters	Suburb, bedrooms, bathrooms, year, and a dual-handle price range. Clear filters gains a soft highlight whenever any filter is active.
Yearly median price	Line or bar chart (toggle). Interactions can focus the chart on a suburb or a specific property’s sale history when the data allows.
Map	Leaflet map with suburb heat polygons, individual properties, optional parks, estimated school points, and (if generated) simplified PTA public transport overlays.
Suburbs table	Sortable Suburbs by sales volume (and related price / variation columns).
Properties table	Parallel view: Properties by sales volume at address level, same style of metrics, with pagination (20 rows per page) and its own sort state.
Distribution	Suburb-level price distribution visualisation (Chart.js), aligned with the rest of the filtering model.

The UI is plain HTML, CSS, and vanilla JavaScript (dashboard/app.js), with Chart.js, Plotly, and Leaflet loaded from CDNs.

Repository layout

Path	Role
`perth_property_data.csv`	Input sold-property dataset (place at repo root before running the pipeline; it may be gitignored or absent in clones).
`scripts/build_dashboard_data.py`	Reads the CSV with pandas, aggregates, and writes JSON under `dashboard/data/`.
`scripts/build_public_transport_data.py`	Optional: builds trimmed GeoJSON for the dashboard when PTA source files exist.
`scripts/fetch_parks_osm.py`	Optional helper: downloads metro-wide parks/green areas from OpenStreetMap (Overpass API) to refresh parks layer inputs.
`scripts/property_annual_returns.py`	Builds `property_annual_return_*.json` (sale-to-sale CAGR and per-property summaries) from the CSV.
`scripts/run_update.py`	Entry point: runs the dashboard build; if PTA stop/route GeoJSON folders are present, runs the public-transport build as well.
`dashboard/`	Static front end: `index.html`, `styles.css`, `app.js`, `assets/` (logo, banner, favicon).
`dashboard/data/`	Generated JSON (and GeoJSON) consumed by the app — refresh via the scripts above.
`data_schema.md`	Notes on columns, types, and data quality where documented.

Documentation

Location	Contents
`README.md` (this file)	Project purpose, dashboard features, data inventory, and how to run / refresh data.
`data_schema.md`	Schema notes for the listing dataset. May reference screenshots or diagrams under `image/data_schema/`.
`scripts/build_dashboard_data.py`	Inline comments; defines which CSV columns are exported into `listings_core` / `listings_sample` and how aggregates are built.
`scripts/property_annual_returns.py`	Docstring: sale dedupe rules, CAGR formula, and projection helpers.
`scripts/build_public_transport_data.py`	Module docstring: PTA source paths, Perth metro bounding box, and output file names.
`scripts/run_update.py`	Short entrypoint: order of builds and when public-transport generation runs.
PTA GeoJSON folders (if present)	Some downloads include `metadata.txt` beside the `.geojson` (supplier metadata).

Data: inputs (source files)

Path	Role
`perth_property_data.csv`	Primary input. Sold listings (one row per sale / listing as defined in your extract). Required at the repository root. The pipeline drops rows with missing, non-numeric, or < AUD 100,000 `Price` (`MIN_PRICE_AUD` in `scripts/build_dashboard_data.py`). Regenerate all `dashboard/data/*.json` after changing the CSV or this rule (`python scripts/run_update.py`).
`Stops_PTA_001_WA_GDA2020_Public_GeoJSON/Stops_PTA_001_WA_GDA2020_Public.geojson`	Optional. WA public transport stops (GDA2020). Used only if this folder and file exist when you run `run_update.py`.
`Service_Routes_PTA_002_WA_GDA2020_Public_GeoJSON/Service_Routes_PTA_002_WA_GDA2020_Public.geojson`	Optional. WA service route geometry. Same condition as stops.
`dashboard/data/suburb_boundaries.geojson`	Required for suburb polygon rendering. WA locality boundaries filtered to dashboard suburbs (generated from Geoscape WA Localities).
`dashboard/data/parks.geojson`	Optional. Metro-wide parks/green-area polygons for map overlay. Current project build uses OpenStreetMap extraction via Overpass.

School locations on the map are not a separate download: school_points_estimated.json is derived in the pipeline from listing fields (e.g. primary school name and coordinates aggregated from sales rows).

Other archives or layers you may keep in the repo (for example regional parks ZIPs or rail-corridor GeoJSON) are not read by the current dashboard scripts unless you extend the pipeline.

External source links currently used

WA Suburb/Locality Boundaries (Geoscape, CC BY 4.0): data.gov.au dataset page
GeoJSON/WFS endpoint listed by data.gov.au: resource page
OpenStreetMap (parks / green areas) via Overpass API: Overpass API, OSM Overpass docs

Data: generated outputs (`dashboard/data/`)

Created by scripts/build_dashboard_data.py unless noted otherwise. Every file below reflects the price-filtered dataset (numeric Price ≥ AUD 100,000 only).

File	Description
`summary.json`	Dataset-wide rollups: row count, column count, sale date range, price median / mean / P75 / P95.
`listings_core.json`	Per-row listing attributes (geocoded rows only), including price, dates, suburb, address, beds/baths, land size, distances, school name fields — used as the main fact table in the browser.
`listings_sample.json`	Random subset of listings (up to 8,000 rows) with the same column slice; useful for lighter experiments (not loaded by the current `app.js`).
`yearly.json`	Median price and sale counts by calendar year (aggregate).
`yearly_by_suburb.json`	Median price and counts by suburb × year.
`property_type_stats.json`	Counts and median price by property type.
`suburb_stats.json`	Suburb-level counts, median / average price, average distance to CBD.
`suburb_map_stats.json`	Suburb centroids (mean lat/lon), counts, and average / median price for map circles.
`school_points_estimated.json`	One point per primary school name with estimated coordinates (mean of listing locations) and counts — used by the dashboard map layer.
`suburb_boundaries.geojson`	Suburb/locality polygons used by the map choropleth/heat rendering for suburb-based metrics.
`parks.geojson`	Parks/green-area polygons shown in the optional `Parks` map layer.

Created by scripts/property_annual_returns.py (invoked from build_dashboard_data.py):

File	Description
`property_annual_return_intervals.json`	One row per consecutive sale pair on the same `house_key` (aligned with dashboard `houseKey` logic). Fields: `prev_date_sold`, `date_sold`, `prev_price`, `price`, `years` (elapsed time in fractional years, days ÷ 365.25), `annual_return`. Pairs with `years` < 1 are omitted (unstable CAGR on short holds). Intervals with CAGR > 100%/yr are omitted.
`property_annual_return_summary.json`	One row per property with latest sale price/date, `interval_count`, and `avg_annual_return` (mean of interval CAGR values). Includes `future_price_1y` = `current_price * (1 + avg_annual_return)` when the average is defined.

Deduplication before intervals: same calendar day + same price → one row; same day + different prices → keep max price only. Annualized return between sales: (price / prev_price) ** (1 / years) - 1, only when elapsed time is at least one year. For horizons of n years: future_price = current_price * (1 + avg_annual_return) ** n (not stored for every n in JSON; compute in the UI or extend the pipeline).

Created by scripts/build_public_transport_data.py when PTA inputs exist:

File	Description
`public_transport_stops.geojson`	Stops clipped / simplified for Greater Perth for browser use.
`public_transport_routes.geojson`	Route linework clipped / simplified for the same area.
`parks_osm_raw.json`	Raw Overpass response used to produce `parks.geojson` (if you choose to refresh parks data via script).

Data: what the dashboard loads in the browser

At startup, dashboard/app.js fetches:

File	Purpose
`summary.json`	Baseline KPI metadata combined with live-filtered rows.
`listings_core.json`	All charts, tables, map markers, and filters.
`property_annual_return_intervals.json`	Optional precomputed resale intervals; if present and non-empty, the dashboard uses the same suburb metric as in-browser logic. Suburb “Avg resale growth (%)” = mean across calendar years of the per-year mean CAGR (each interval: later sale year, same `house_key`, hold ≥ 1 year, CAGR ≤ 100%/yr, same-day dedupe; not a sum of simple price deltas).
`school_points_estimated.json`	School overlay on the map.
`public_transport_stops.geojson`	Optional; if missing or invalid, transport layers are skipped gracefully.
`public_transport_routes.geojson`	Optional; same as above.
`suburb_boundaries.geojson`	Suburb polygon boundaries for heat/choropleth map rendering.
`parks.geojson`	Optional parks/green-area overlay layer.

The other JSON files in dashboard/data/ are still produced by the pipeline for analysis, exports, or future UI — they are not requested by the current static page.

Requirements

Python 3.10+ (or compatible) with pandas installed (pip install pandas if needed).
A modern browser for the dashboard.

Regenerate dashboard data

From the repository root:

python scripts/run_update.py

This expects perth_property_data.csv at the project root. Rows with missing Price, price < AUD 100,000, or non‑numeric price are dropped before any aggregates are built. Re-run this command after changing the CSV or the price rule so every JSON under dashboard/data/ stays in sync (KPIs, map, tables, CAGR). It refreshes summary.json, listings_core.json, yearly.json, property_annual_return_intervals.json, property_annual_return_summary.json, and related files. If the script exits with an error, no row satisfied the price filter — check the data or MIN_PRICE_AUD in scripts/build_dashboard_data.py.

Without the CSV, you can still drop Price < 100k from an existing dashboard/data/listings_core.json and refresh summary.json counts with:

node scripts/sanitize_dashboard_data_min_price.js

If you have the PTA GeoJSON datasets in the expected folders (Stops_PTA_001_…, Service_Routes_PTA_002_…), the same command also rebuilds the simplified transport layers used by the map.

To refresh parks data from OpenStreetMap (metro Perth extract), run:

python scripts/fetch_parks_osm.py

This creates/updates dashboard/data/parks_osm_raw.json and is used to regenerate dashboard/data/parks.geojson in this project workflow.

Assets (branding)

Bundled under dashboard/assets/:

perth-property-logo.png — header wordmark.
header-banner.png — full-width header background.
favicon.png — browser tab / bookmark icon (rel="icon" and apple-touch-icon in index.html).

Licence and data

Listing data and third-party spatial extracts (PTA, parks, etc.) may be subject to their own terms. Use this project in line with those sources and applicable law.

_{Built for clear, filter-driven exploration of Perth property sales.}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Perth Property Data

Overview

What is in the dashboard?

Repository layout

Documentation

Data: inputs (source files)

External source links currently used

Data: generated outputs (`dashboard/data/`)

Data: what the dashboard loads in the browser

Requirements

Regenerate dashboard data

Assets (branding)

Licence and data

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 237 Commits
dashboard		dashboard
image/data_schema		image/data_schema
scripts		scripts
.gitignore		.gitignore
README.md		README.md
data_schema.md		data_schema.md
perth_property_data.csv		perth_property_data.csv

Folders and files

Latest commit

History

Repository files navigation

Perth Property Data

Overview

What is in the dashboard?

Repository layout

Documentation

Data: inputs (source files)

External source links currently used

Data: generated outputs (dashboard/data/)

Data: what the dashboard loads in the browser

Requirements

Regenerate dashboard data

Assets (branding)

Licence and data

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Data: generated outputs (`dashboard/data/`)

Packages