This repo is a small, reproducible workspace for discovering and profiling Planning Inspectorate (PINS) appeals data published via the Open Data Communities (ODC) SPARQL endpoint (https://opendatacommunities.org/sparql) and shaping it into notebook-friendly pandas tables.
- Connects to ODC’s SPARQL endpoint and runs a set of reusable queries against the PINS graph.
- Builds a compact data profile of the graph:
  - total rows (triples),
  - unique triples,
  - unique appeals (defined pragmatically as “subjects having `…/pins-appeals/CaseRef`”),
  - a candidate unique ID predicate (key-like; typically `CaseRef`),
  - date ranges across the whole graph and for domain dates (e.g., `DeciDate`),
  - sample triple fetches (paged).
- Exposes the results as pandas DataFrames and saves key summaries under `data/` for reuse.
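The profile counts above map onto a handful of short SPARQL queries. The sketch below builds them as plain strings. It is an illustration under assumptions, not the notebook's exact queries: the helper names are hypothetical, the example graph IRI is a made-up placeholder (the real value is the notebook's `GRAPH_IRI`), and the CaseRef predicate IRI is supplied by the caller because its full form is not spelled out here.

```python
# Hypothetical placeholder; substitute the notebook's real GRAPH_IRI.
EXAMPLE_GRAPH_IRI = "http://example.org/graph/pins-appeals"

XSD_PREFIX = "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>"


def in_graph(pattern, graph_iri):
    """Wrap a triple pattern so it only matches inside one named graph."""
    return f"GRAPH <{graph_iri}> {{ {pattern} }}"


def total_triples_query(graph_iri):
    """Count every ?s ?p ?o row in the graph."""
    return f"SELECT (COUNT(*) AS ?n) WHERE {{ {in_graph('?s ?p ?o', graph_iri)} }}"


def unique_triples_query(graph_iri):
    """Count distinct ?s ?p ?o combinations via a DISTINCT subquery."""
    inner = f"SELECT DISTINCT ?s ?p ?o WHERE {{ {in_graph('?s ?p ?o', graph_iri)} }}"
    return f"SELECT (COUNT(*) AS ?n) WHERE {{ {inner} }}"


def unique_appeals_query(graph_iri, caseref_iri):
    """Count distinct subjects that carry the CaseRef predicate."""
    pattern = f"?s <{caseref_iri}> ?ref"
    return f"SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE {{ {in_graph(pattern, graph_iri)} }}"


def date_range_query(graph_iri):
    """Min/max over every literal typed xsd:date or xsd:dateTime."""
    pattern = "?s ?p ?o . FILTER(datatype(?o) IN (xsd:date, xsd:dateTime))"
    return (
        f"{XSD_PREFIX}\n"
        f"SELECT (MIN(?o) AS ?min_date) (MAX(?o) AS ?max_date) "
        f"WHERE {{ {in_graph(pattern, graph_iri)} }}"
    )


def sample_triples_query(graph_iri, limit=100, offset=0):
    """One page of triples; ORDER BY gives every page the same ordering."""
    return (
        f"SELECT ?s ?p ?o WHERE {{ {in_graph('?s ?p ?o', graph_iri)} }} "
        f"ORDER BY ?s ?p ?o LIMIT {int(limit)} OFFSET {int(offset)}"
    )
```

Each string can be handed to `run_sparql()`. Paging is then a loop that advances `offset` by `limit` until a page comes back empty.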
├── notebooks/
│   ├── PINS-ML_SPARQL_to_Pandas.ipynb   # main analysis & queries
│   └── fallback_investigation.ipynb     # connectivity diagnostics & fixes
├── data/
│   ├── pins_predicate_counts.csv        # export: predicate → count (+pretty columns)
│   └── pins_predicate_counts.parquet    # same table as Parquet
└── README.md
- `notebooks/`: the working analysis. Everything runs through a single function, `run_sparql(query, endpoint=ENDPOINT, ...) -> pd.DataFrame`, and the predefined `PREFIXES` and `GRAPH_IRI`. No hidden helpers are required to reproduce the results.
- `data/`: lightweight derived artifacts (e.g., predicate counts) generated by the notebooks so you can view outputs between runs, share small tables, and avoid re-querying when not needed.
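The predicate-count export carries "pretty" columns derived from each predicate URI. A minimal sketch of how such columns could be derived follows. The helper names are hypothetical, and the split rule (cut at the last `#` or `/`) is an assumption about what "URI tail" means, not necessarily the notebook's exact logic.

```python
def split_predicate_uri(uri):
    """Split a predicate URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    # When neither separator is present, cut is -1: the namespace is empty
    # and the whole string is treated as the local name.
    return uri[: cut + 1], uri[cut + 1 :]


def pretty_rows(counts):
    """Turn (uri, count) pairs into dicts with URI, Namespace, Predicate, count."""
    rows = []
    for uri, count in counts:
        namespace, name = split_predicate_uri(uri)
        rows.append(
            {"URI": uri, "Namespace": namespace, "Predicate": name, "count": count}
        )
    return rows
```

The resulting list of dicts can be passed straight to `pd.DataFrame(...)` and written out with `to_csv` or `to_parquet`.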
End-to-end, notebook-ready queries that all go through your run_sparql():
- Graph profiling
  - Total rows (triples)
  - Unique triples (distinct `?s ?p ?o`)
  - Unique appeals via `CaseRef`
- Predicate discovery
  - Predicate frequency table (`?p`, `count`)
  - Pretty columns: URI, Namespace, Predicate (derived from URI tail)
- Key candidate (unique ID)
  - Finds predicates that are single-valued per subject and globally unique across appeals
- Date ranges
  - Whole-graph min/max for any `xsd:date` or `xsd:dateTime`
  - Domain date ranges (e.g., `…/pins-appeals/DeciDate`)
  - Per-predicate date coverage (how widely each date appears across appeals)
- Triple sampling
  - Fetch N triples (paged, stable ordering)
  - Fetch all triples for one appeal (deterministic pick or specific CaseRef)
- SPARQLWrapper vs requests fallback behaviour
- TLS/CA bundle fix (why `requests` worked while `SPARQLWrapper` failed, and how we aligned them)
- Lightweight probes (GET vs POST) to confirm transport and headers
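The key-candidate rule listed above (single-valued per subject, globally unique across appeals) can be illustrated in plain Python over an in-memory list of triples. This is only a sketch of the rule itself: the function name and the toy data are hypothetical, and the notebook evaluates its own query against the endpoint rather than this code.

```python
from collections import defaultdict


def key_candidates(triples):
    """Return predicates that behave like a unique ID.

    A predicate qualifies when every subject that uses it has exactly one
    value, and no two subjects share the same value.
    """
    # predicate -> subject -> set of distinct object values
    values = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        values[predicate][subject].add(obj)

    candidates = []
    for predicate, by_subject in values.items():
        # Rule 1: single-valued per subject.
        if any(len(objs) != 1 for objs in by_subject.values()):
            continue
        # Rule 2: globally unique, i.e. as many distinct values as subjects.
        distinct = {next(iter(objs)) for objs in by_subject.values()}
        if len(distinct) == len(by_subject):
            candidates.append(predicate)
    return sorted(candidates)
```

In the toy case below, a shared decision date fails the uniqueness rule and a repeated tag fails the single-value rule, so only the case reference survives:

```python
toy = [
    ("appeal1", "CaseRef", "100"), ("appeal2", "CaseRef", "101"),
    ("appeal1", "DeciDate", "2020-01-01"), ("appeal2", "DeciDate", "2020-01-01"),
    ("appeal1", "tag", "x"), ("appeal1", "tag", "y"),
]
print(key_candidates(toy))  # ['CaseRef']
```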
- `ENDPOINT` – ODC SPARQL endpoint
- `GRAPH_IRI` – PINS graph IRI
- `PREFIXES` – exact set used throughout:
  - `PREFIX pins: <http://opendatacommunities.org/def/ontology/planning/pins/>`
  - `PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>`
  - `PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>`
  - `PREFIX dct: <http://purl.org/dc/terms/>`
- `run_sparql(query, endpoint=ENDPOINT, timeout=60, verbose=True) -> pd.DataFrame` (always returns a DataFrame; tries SPARQLWrapper, falls back to requests)
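For orientation, here is a minimal sketch of what a function with this signature could look like. It is not the notebook's implementation: the body, the `bindings_to_rows` helper, the empty-DataFrame behaviour on total failure, and the choice of a form-encoded POST for the fallback are all assumptions. Only the signature, the endpoint URL, and the prefix set come from this README. Third-party imports are deferred into the function so the parsing helper can be used on its own.

```python
ENDPOINT = "https://opendatacommunities.org/sparql"

PREFIXES = """\
PREFIX pins: <http://opendatacommunities.org/def/ontology/planning/pins/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dct: <http://purl.org/dc/terms/>
"""


def bindings_to_rows(payload):
    """Flatten a SPARQL JSON results payload into a list of plain dicts."""
    columns = payload.get("head", {}).get("vars", [])
    rows = []
    for binding in payload.get("results", {}).get("bindings", []):
        # Unbound variables are absent from a binding; represent them as None.
        rows.append({col: binding.get(col, {}).get("value") for col in columns})
    return rows


def run_sparql(query, endpoint=ENDPOINT, timeout=60, verbose=True):
    """Try SPARQLWrapper first, then requests; always return a DataFrame."""
    import pandas as pd

    try:
        from SPARQLWrapper import JSON, SPARQLWrapper

        client = SPARQLWrapper(endpoint)
        client.setQuery(query)
        client.setReturnFormat(JSON)
        client.setTimeout(timeout)
        payload = client.query().convert()
    except Exception as wrapper_error:
        if verbose:
            print(f"SPARQLWrapper failed ({wrapper_error!r}); trying requests")
        try:
            import requests

            # requests verifies TLS against the certifi CA bundle by default.
            response = requests.post(
                endpoint,
                data={"query": query},
                headers={"Accept": "application/sparql-results+json"},
                timeout=timeout,
            )
            response.raise_for_status()
            payload = response.json()
        except Exception as requests_error:
            if verbose:
                print(f"requests fallback failed ({requests_error!r})")
            return pd.DataFrame()

    columns = payload.get("head", {}).get("vars", [])
    return pd.DataFrame(bindings_to_rows(payload), columns=columns)
```

A call would then look like `run_sparql(PREFIXES + "SELECT ...")`. Whether the real function prepends `PREFIXES` itself is not stated here, so the sketch leaves that to the caller.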
- Data originates from Open Data Communities (ODC) under their published terms.
- This repo contains queries and lightweight derived tables, not bulk data dumps.
- Thanks to the ODC team for making PINS data available via SPARQL.
This is the end of phase 1.