PINS-ML: Exploring PINS Appeals Data with SPARQL → pandas

This repo is a small, reproducible workspace for discovering and profiling Planning Inspectorate (PINS) appeals data published via the Open Data Communities (ODC) SPARQL endpoint (<https://opendatacommunities.org/sparql>), and shaping it into notebook-friendly pandas tables.

What this project does

  • Connects to ODC’s SPARQL endpoint and runs a set of reusable queries against the PINS graph.
  • Builds a compact data profile of the graph:
    • total rows (triples),
    • unique triples,
    • unique appeals (defined pragmatically as “subjects having …/pins-appeals/CaseRef”),
    • a candidate unique ID predicate (key-like; typically CaseRef),
    • date ranges across the whole graph and for domain dates (e.g., DeciDate),
    • sample triple fetches (paged).
  • Exposes the results as pandas DataFrames and saves key summaries under data/ for reuse.
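The SPARQL-to-pandas step can be sketched roughly as follows. This is an illustrative helper, not the repo's actual code (the function name `bindings_to_df` and the sample payload are invented here); it assumes the endpoint returns the standard SPARQL 1.1 JSON results format:

```python
import pandas as pd

def bindings_to_df(results: dict) -> pd.DataFrame:
    """Flatten a SPARQL 1.1 JSON results payload into a DataFrame.

    Each binding maps a variable to {"type": ..., "value": ...};
    we keep only the plain values, one column per variable.
    """
    cols = results["head"]["vars"]
    rows = [
        {var: b[var]["value"] for var in cols if var in b}
        for b in results["results"]["bindings"]
    ]
    return pd.DataFrame(rows, columns=cols)

# Minimal illustrative payload (shape per the SPARQL 1.1 JSON results spec)
sample = {
    "head": {"vars": ["p", "count"]},
    "results": {"bindings": [
        {"p": {"type": "uri", "value": "http://example.org/CaseRef"},
         "count": {"type": "literal", "value": "42"}},
    ]},
}
df = bindings_to_df(sample)
```

Note that all values come back as strings; numeric columns such as counts still need an explicit `pd.to_numeric` cast downstream.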

Repository structure

```
├── notebooks/
│   ├── PINS-ML_SPARQL_to_Pandas.ipynb   # main analysis & queries
│   └── fallback_investigation.ipynb     # connectivity diagnostics & fixes
├── data/
│   ├── pins_predicate_counts.csv        # export: predicate → count (+ pretty columns)
│   └── pins_predicate_counts.parquet    # same table as parquet
└── README.md
```

Why these folders exist

  • notebooks/ — the working analysis. Everything runs through a single function run_sparql(query, endpoint=ENDPOINT, ...) -> pd.DataFrame and the predefined PREFIXES and GRAPH_IRI. No hidden helpers are required to reproduce the results.
  • data/ — lightweight derived artifacts (e.g., predicate counts) generated by the notebooks so you can view outputs between runs, share small tables, and avoid re-querying when not needed.
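Persisting a derived table to `data/` might look like the snippet below (the `pred_counts` contents are illustrative placeholders; only the filenames match the repo):

```python
from pathlib import Path
import pandas as pd

# Illustrative predicate-count table; the notebook builds the real one via SPARQL.
pred_counts = pd.DataFrame({
    "URI": ["http://opendatacommunities.org/def/pins-appeals/CaseRef"],
    "count": [1234],
})

Path("data").mkdir(exist_ok=True)
pred_counts.to_csv("data/pins_predicate_counts.csv", index=False)
try:
    pred_counts.to_parquet("data/pins_predicate_counts.parquet", index=False)
except ImportError:
    pass  # parquet export needs pyarrow or fastparquet installed
```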

Notebooks

1) PINS-ML_SPARQL_to_Pandas.ipynb (core analysis)

End-to-end, notebook-ready queries that all go through the shared run_sparql() helper:

  • Graph profiling
    • Total rows (triples)
    • Unique triples (distinct ?s ?p ?o)
    • Unique appeals via CaseRef
  • Predicate discovery
    • Predicate frequency table (?p, count)
    • Pretty columns: URI, Namespace, Predicate (derived from URI tail)
  • Key candidate (unique ID)
    • Finds predicates that are single-valued per subject and globally unique across appeals
  • Date ranges
    • Whole-graph min/max for any xsd:date or xsd:dateTime
    • Domain date ranges (e.g., …/pins-appeals/DeciDate)
    • Per-predicate date coverage (how widely each date appears across appeals)
  • Triple sampling
    • Fetch N triples (paged, stable ordering)
    • Fetch all triples for one appeal (deterministic pick or specific CaseRef)
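Two of the steps above are easy to sketch. The predicate-frequency query groups triples by predicate, and the "pretty columns" step splits each predicate URI at its last `/` or `#`. Both snippets below are illustrative: the graph IRI is a placeholder (the notebook's `GRAPH_IRI` holds the real one), and the helper name `split_uri` is invented here:

```python
GRAPH_IRI = "http://example.org/pins-graph"  # placeholder; the notebook defines the real IRI

PREDICATE_COUNTS = f"""
SELECT ?p (COUNT(*) AS ?count)
WHERE {{ GRAPH <{GRAPH_IRI}> {{ ?s ?p ?o }} }}
GROUP BY ?p
ORDER BY DESC(?count)
"""

def split_uri(uri: str) -> tuple[str, str]:
    """Split a URI into (Namespace, Predicate) at its last '#' or '/'."""
    for sep in ("#", "/"):
        if sep in uri:
            head, _, tail = uri.rpartition(sep)
            return head + sep, tail
    return "", uri

# e.g. a PINS-style predicate URI
ns, pred = split_uri("http://opendatacommunities.org/def/pins-appeals/CaseRef")
```

Applied column-wise (`df["URI"].map(split_uri)`), this yields the Namespace/Predicate pretty columns.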

2) fallback_investigation.ipynb (diagnostics)

  • SPARQLWrapper vs requests fallback behaviour
  • TLS/CA bundle fix (why requests worked while SPARQLWrapper failed, and how we aligned them)
  • Lightweight probes (GET vs POST) to confirm transport and headers
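A lightweight GET-vs-POST probe can be built along these lines. The requests are prepared but not sent, so the sketch works offline; the `Accept` header value is one reasonable choice for JSON results, not necessarily the notebook's exact configuration:

```python
import requests

ENDPOINT = "https://opendatacommunities.org/sparql"
QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
HEADERS = {"Accept": "application/sparql-results+json"}

# Per the SPARQL 1.1 Protocol: GET puts the query in the URL,
# POST sends it as a form-encoded body.
get_req = requests.Request(
    "GET", ENDPOINT, params={"query": QUERY}, headers=HEADERS,
).prepare()

post_req = requests.Request(
    "POST", ENDPOINT, data={"query": QUERY}, headers=HEADERS,
).prepare()

# Actually sending is then e.g. requests.Session().send(get_req, timeout=60)
```

Comparing the two transports (and their TLS verification settings) is what surfaced the CA-bundle mismatch between SPARQLWrapper and requests.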

Notebook constants (already defined in the notebooks)

  • ENDPOINT – the ODC SPARQL endpoint
  • GRAPH_IRI – the PINS graph IRI
  • PREFIXES – the exact prefix set used throughout
  • run_sparql(query, endpoint=ENDPOINT, timeout=60, verbose=True) -> pd.DataFrame – always returns a DataFrame; tries SPARQLWrapper first, then falls back to requests
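The fallback behaviour can be sketched with stub transports (names below are illustrative; in the notebook the two transports are a SPARQLWrapper call and a plain requests call):

```python
import pandas as pd

def run_with_fallback(query, primary, fallback, verbose=True) -> pd.DataFrame:
    """Try `primary`; on any failure fall back to `fallback`.

    Always returns a DataFrame — empty if both transports fail.
    """
    for name, transport in (("primary", primary), ("fallback", fallback)):
        try:
            return transport(query)
        except Exception as exc:
            if verbose:
                print(f"{name} transport failed: {exc}")
    return pd.DataFrame()

# Demo with stubs: the primary raises (as SPARQLWrapper did on the
# TLS/CA issue), the fallback answers.
def broken(_q):
    raise ConnectionError("TLS verify failed")

def working(_q):
    return pd.DataFrame({"s": ["a"], "p": ["b"], "o": ["c"]})

df = run_with_fallback("SELECT ...", broken, working)
```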

Notes & acknowledgements

  • Data originates from Open Data Communities (ODC) under their published terms.

  • This repo contains queries and lightweight derived tables, not bulk data dumps.

  • Thanks to the ODC team for making PINS data available via SPARQL.

This is the end of phase 1.

About

Planning Inspectorate Data Playbook
