I use GDAL to get shapefiles into BigQuery in WKT format so I can make maps in Looker Studio. The uses for this library are as broad as your imagination. Here is what you'll need to do to get going.
We're going to use GDALwhich is a translator library for geospatial data formats. This will allow us to start with basically any type of geospatial file format, but I'll use shapefiles in this demo.
We need to create a virtual environment to isolate dependencies.
cmarikos@MacBookAir ~ % python3 -m venv geoenvThen we'll activate the virtual environment with the following command
cmarikos@MacBookAir ~ % source geoenv/bin/activateWe will see the name of our virtual environment in our terminal when this is successfully activated like so
(geoenv) cmarikos@MacBookAir ~ %Next
(geoenv) cmarikos@MacBookAir ~ % pip install GDALInstall GDAL in your virtual environment with the following command
(geoenv) cmarikos@MacBookAir ~ % pip install gdalActivate your virtual environment in your IDE with this command in the terminal:
geoenvcmarikos@MacBookAir geoenv % source bin/activateWhen it's been successfully activated your terminal will print the following:
(geoenv) geoenvcmarikos@MacBookAir geoenv % Make sure GDAL is installed correctly:
# print the following error if GDAL is not installed correctly
try:
from osgeo import gdal
except ImportError:
print("Error: GDAL library is not installed or not found in your PYTHONPATH.")
exit(1)
# Print the GDAL version information
print("GDAL Version:", gdal.VersionInfo())
# List available GDAL drivers
driver_count = gdal.GetDriverCount()
print("Number of GDAL Drivers installed:", driver_count)
for i in range(driver_count):
driver = gdal.GetDriver(i)
print(f"{i}: {driver.ShortName} - {driver.LongName}")Now that our virutal environment is setup and we have GDAL installed correctly, we can convert a shapefile. Add a shapefile to your project folder.
I'm adding a file called 2024 AZ Voting Precincts v2.shp (I have added the .cpg, .dbf, .prj, .qmd, .shp, and .shx files)
Now that we have our files in the correct folder we can employ the GDAL shapefile conversion function to create a WKT field in a CSV
Now in your terminal run the following:
ogr2ogr -f "CSV" az_precincts.csv "2024 AZ Voting Precincts v2.shp" -lco GEOMETRY=AS_WKTThis will create a CSV file called az_precincts.csv that contains column called WKT with geospatial data that we can upload to BigQuery.
Now that we have our az_precincts.csv we need to upload them to BigQuery. Review how to upload CSV files to BigQuery in this guide.
Now that our file is loaded to BigQuery we have to recategorize your WKT data type as Geospatial data. I do this by creating a view with the ST_GEOGFROMTEXT() function applied to my WKT column, there's probably also a way to do this when you are loading the CSV. Refer to the guide above if you want to dig up a solution.
Your column containing Geospatial data must be named "GEOMETRY" for Looker to accept it as Geospatial data
Here is a code sample of my view
CREATE OR REPLACE VIEW `prod-organize-arizon-4e1c0a83.geofiles.az_precincts_geo AS(
SELECT
ST_GEOGFROMTEXT(WKT) AS GEOMETRY -- you must name it "GEOMETRY" for Looker to categorize it correctly as Geospatial data
, COUNTY
, PCTNUM --this is a field I have created with a python script so I can have an exact match by precinct since AZ precinct names are inconsistent accross VAN/Voterfiles
, PRECINCTNA -- human readable precinct name
FROM `prod-organize-arizon-4e1c0a83.geofiles.az_precincts`
)Looker doesn't join map data well so we have to join columns we want within BigQuery. There is some weirdness to be aware of. Our GEOMETRY column data type can't be grouped. In order to create map features that utilize aggregation, you'll need to create a CTE then join to the table creating geometry. I'll show my code sample below (my columns are slightly different, since I am working with a different precinct file).
CREATE OR REPLACE VIEW `prod-organize-arizon-4e1c0a83.rich_christina_proj.sr_by_pctnum_c4_2024` AS (
WITH a AS(
SELECT
mc.pctnum
, sq.SurveyQuestionName
, sr.SurveyResponseName
, COUNT(DISTINCT cs.VanID) as response_count
FROM `prod-organize-arizon-4e1c0a83.ngpvan.CTARAA_ContactsSurveyResponses_VF` AS cs
LEFT JOIN `prod-organize-arizon-4e1c0a83.ngpvan.CTARAA_SurveyQuestions` AS sq
ON cs.SurveyQuestionID = sq.SurveyQuestionID
LEFT JOIN `prod-organize-arizon-4e1c0a83.ngpvan.CTARAA_SurveyResponses` AS sr
ON cs.SurveyResponseID = sr.SurveyResponseID
FULL OUTER JOIN `prod-organize-arizon-4e1c0a83.rich_christina_proj.modified_c4_precincts_2024` as mc
ON cs.VanID = mc.VanID
WHERE sq.Cycle ='2024'
AND mc.countyname IN ('COCONINO','COCHISE','MOHAVE','PINAL','PIMA','YAVAPAI','YUMA')
GROUP BY 1,2,3
ORDER BY 1,2,3
)
SELECT
pg.GEOMETRY
, pg.COUNTY
, pg.PRECINCTNA
, pg.LEGISLATIV
, pg.CONGRESSIO
, a.SurveyQuestionName
, a.SurveyResponseName
, a.response_count
FROM `prod-organize-arizon-4e1c0a83.geofiles.az_precincts_geo` AS pg
LEFT JOIN a
ON pg.PCTNUM = a.pctnum
WHERE a.response_count IS NOT NULL
)