This repository provides tools to discover, consolidate, and visualize metadata from the open data STAC catalogs of major commercial SAR providers: Capella Space, ICEYE, and Umbra.
The primary goal is to create a harmonized GeoDataFrame for each provider, which is then saved in GeoParquet format. The entire process is automated to run weekly via GitHub Actions, ensuring the datasets remain up-to-date.
NOTE: As of October 15, 2025, Synspective appears to provide open data only upon request: https://synspective.com/gallery/
Inspired by @scottyhq's stac2geojson
Optimized for browser-based visualization with stac-map:
- Datetime fields parsed to `pd.Timestamp` for temporal sliders
- Bbox stored as a nested dict for spatial queries
- Assets compacted to essential fields (href, type, roles)
- GeoJSON geometry serialized for JavaScript compatibility
- Links resolved to absolute URLs
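The VIZ transformations above can be sketched with plain pandas. The item below is fabricated for illustration, and the column names are assumptions rather than the repo's actual schema:

```python
import json
import pandas as pd

# A single fabricated STAC-like item (illustrative values, not real catalog data)
items = [
    {
        "id": "item-001",
        "datetime": "2024-06-01T12:30:00Z",
        "assets": {
            "thumbnail": {"href": "https://example.com/t.png", "type": "image/png",
                          "roles": ["thumbnail"], "title": "Thumbnail", "size": 1234},
        },
        "geometry": {"type": "Point", "coordinates": [10.0, 50.0]},
        "bbox": [10.0, 50.0, 10.0, 50.0],
    },
]

df = pd.DataFrame(items)

# Datetime parsed to pd.Timestamp so a temporal slider can filter numerically
df["datetime"] = pd.to_datetime(df["datetime"])

# Assets compacted to the essential fields only
def compact_assets(assets):
    keep = ("href", "type", "roles")
    return {name: {k: v for k, v in asset.items() if k in keep}
            for name, asset in assets.items()}

df["assets"] = df["assets"].apply(compact_assets)

# Bbox stored as a nested dict for spatial queries
df["bbox"] = df["bbox"].apply(
    lambda b: {"xmin": b[0], "ymin": b[1], "xmax": b[2], "ymax": b[3]})

# GeoJSON geometry serialized to a string for JavaScript consumers
df["geometry_json"] = df["geometry"].apply(json.dumps)
```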
Note that Capella already has a great interactive web map for its open data https://felt.com/map/Capella-Space-Open-Data-bB24xsH3SuiUlpMdDbVRaA?loc=0,-20.5,1.83z and users should refer to this while it's still maintained.
Development Seed provides a great open-source tool called stac-map for visualizing these derived GeoParquets -- all you need is the raw GitHub endpoint to the GeoParquet file of interest, which should match a structure similar to `https://raw.githubusercontent.com/Jack-Hayes/commerical-sar-stac/main/parquets/<format>/<provider>/<file>.parquet`.
Below are hyperlinks to access the respective parquets on this repo:
- ICEYE: All ICEYE open data samples
- Umbra: All Umbra open data samples
- Capella: CPHD | CSI | GEC | GEO | SICD | SIDD | SLC
Optimized for programmatic analysis:
- Asset hrefs expanded as individual columns (e.g., `asset_thumbnail`, `asset_overview`)
- Full STAC properties preserved
- Minimal transformations (e.g. serializing cols with mixed dtypes) for easier filtering/analysis
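The mixed-dtype serialization mentioned above can be sketched with pandas. The rows and the `sar:polarizations` values below are fabricated to show the idea; the repo's actual columns and serialization logic may differ:

```python
import json
import pandas as pd

# Hypothetical ARD-style rows: one column holds mixed container types
# (a list in one row, a dict in another), which Parquet cannot store
# without a uniform type
df = pd.DataFrame({
    "id": ["a", "b"],
    "sar:polarizations": [["HH"], {"tx": "H", "rx": "H"}],
})

def serialize_mixed(col):
    """JSON-encode container values so the column becomes uniformly string-typed."""
    return col.apply(lambda v: json.dumps(v) if isinstance(v, (list, dict)) else v)

df["sar:polarizations"] = serialize_mixed(df["sar:polarizations"])
```

After this, the column round-trips through Parquet cleanly, and a consumer can `json.loads` the values back when needed.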
You can load any of the published GeoParquet files directly into Python using GeoPandas without downloading them first. Simply pass the raw GitHub URL to `gpd.read_file()`:
```python
import geopandas as gpd

# Example: Load Capella CPHD ARD parquet directly from GitHub
url = "https://raw.githubusercontent.com/Jack-Hayes/commerical-sar-stac/main/parquets/ard/capella/capella_CPHD.parquet"
gdf = gpd.read_file(url)
```

This works for any of the Parquet files; just replace the URL with the desired dataset.
NOTE: It is important to use the 'ARD' Parquet files for Python streaming and local GIS software, as they are serialized specifically for programmatic use, as opposed to the 'VIZ' files.
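Once loaded, the ARD tables filter like any pandas/GeoPandas frame. The sketch below uses a fabricated stand-in DataFrame so it runs without network access; the `datetime` column name is an assumption about the schema:

```python
import pandas as pd

# Fabricated rows standing in for an ARD GeoDataFrame; the same pandas
# filtering applies to the real `gdf` loaded from GitHub
gdf = pd.DataFrame({
    "id": ["s1", "s2", "s3"],
    "datetime": pd.to_datetime(
        ["2024-01-15T00:00:00Z", "2024-06-01T00:00:00Z", "2025-02-10T00:00:00Z"]),
})

# Keep only acquisitions from calendar year 2024
start = pd.Timestamp("2024-01-01", tz="UTC")
end = pd.Timestamp("2025-01-01", tz="UTC")
recent = gdf[(gdf["datetime"] >= start) & (gdf["datetime"] < end)]
```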
You can download the latest generated Parquet files directly using command-line tools like curl (for Linux/macOS) or Invoke-WebRequest (for Windows PowerShell).

```sh
# Download 'ARD' format (for analysis)
curl -L -o iceye_ard.parquet "https://raw.githubusercontent.com/Jack-Hayes/commerical-sar-stac/main/parquets/ard/iceye/iceye.parquet"
curl -L -o umbra_ard.parquet "https://raw.githubusercontent.com/Jack-Hayes/commerical-sar-stac/main/parquets/ard/umbra/umbra.parquet"
curl -L -o capella_GEC_ard.parquet "https://raw.githubusercontent.com/Jack-Hayes/commerical-sar-stac/main/parquets/ard/capella/capella_GEC.parquet"
```

```powershell
# Download 'ARD' format (for analysis)
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/Jack-Hayes/commerical-sar-stac/main/parquets/ard/iceye/iceye.parquet" -OutFile "iceye_ard.parquet"
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/Jack-Hayes/commerical-sar-stac/main/parquets/ard/umbra/umbra.parquet" -OutFile "umbra_ard.parquet"
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/Jack-Hayes/commerical-sar-stac/main/parquets/ard/capella/capella_GEC.parquet" -OutFile "capella_GEC_ard.parquet"
```

This repository contains open-source code for accessing and processing sample datasets provided by commercial companies including Capella Space, Umbra, and ICEYE.
All datasets and APIs are governed by their respective providers' terms of use. This repository does not redistribute or claim ownership of any proprietary or commercial data.
Users are responsible for ensuring their use of data and APIs complies with the terms set by:
- Capella Space: https://www.capellaspace.com/legal/
- Umbra: https://umbra.space/legal/
- ICEYE: https://www.iceye.com/sar-data/api
The ingestion process follows cloud-optimized best practices:
- Discovery: For nested catalogs (Umbra, Capella), the script uses `s3fs` or recursive `aiohttp` calls to efficiently discover all STAC Item URLs. For flat catalogs (ICEYE), it directly parses the collection.
- Fetching: All STAC Item JSON files are fetched concurrently using `aiohttp` for high performance.
- Processing: The raw JSONs are parsed into a uniform, flattened structure in memory using Pandas. This includes extracting asset URLs and ensuring correct geometry representation with Shapely.
- Creation: A GeoDataFrame is created from the processed records.
- Storage: The final, cleaned GeoDataFrame for each provider (or product type) is saved as a GeoParquet file.
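The fetch-and-flatten steps above can be sketched with stdlib `asyncio` alone. The canned catalog and function names below are illustrative stand-ins (the real pipeline uses `aiohttp` for actual network I/O and different internal names):

```python
import asyncio
import json

# Canned STAC-like Items standing in for remote JSON files
FAKE_CATALOG = {
    "https://example.com/items/1.json": {
        "id": "item-1", "geometry": {"type": "Point", "coordinates": [0, 0]},
        "properties": {"datetime": "2024-01-01T00:00:00Z"},
        "assets": {"data": {"href": "https://example.com/data/1.tif"}},
    },
    "https://example.com/items/2.json": {
        "id": "item-2", "geometry": {"type": "Point", "coordinates": [1, 1]},
        "properties": {"datetime": "2024-02-01T00:00:00Z"},
        "assets": {"data": {"href": "https://example.com/data/2.tif"}},
    },
}

async def fetch_item(url):
    """Stand-in for an aiohttp GET; yields control like a real network call."""
    await asyncio.sleep(0)
    return json.dumps(FAKE_CATALOG[url])

def flatten(item):
    """Flatten one STAC Item into a uniform record (properties + asset hrefs)."""
    rec = {"id": item["id"], **item["properties"]}
    for name, asset in item["assets"].items():
        rec[f"asset_{name}"] = asset["href"]
    return rec

async def run(urls):
    # gather() fetches all items concurrently and preserves input order
    raw = await asyncio.gather(*(fetch_item(u) for u in urls))
    return [flatten(json.loads(r)) for r in raw]

records = asyncio.run(run(list(FAKE_CATALOG)))
```

From records like these, a GeoDataFrame is one `pd.DataFrame(records)` plus a Shapely geometry column away.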
- `.github/workflows/`: Contains GitHub Actions for CI (testing, linting) and weekly data updates.
- `parquets/`: Stores the output GeoParquet files, organized by format (`/viz` or `/ard`) and provider.
- `scripts/`: The main Python source code for data ingestion and processing.
- `tests/`: `pytest` tests to validate endpoints and data structures.
- `environment.yml`: The Conda environment file to ensure reproducibility.
1. Clone the repository:

   ```sh
   git clone https://github.com/Jack-Hayes/commerical-sar-stac.git
   cd commerical-sar-stac
   ```

2. Create and activate the Conda environment:

   ```sh
   mamba env create -f environment.yml
   mamba activate commercial-sar
   ```

3. Run the script. You can process specific providers by passing their names as command-line arguments:

   ```sh
   # Process all providers in both formats (default)
   python -m scripts.main capella iceye umbra
   # Process only VIZ format
   python -m scripts.main capella iceye umbra --format viz
   # Process only ARD format
   python -m scripts.main capella iceye umbra --format ard
   # Process specific providers
   python -m scripts.main capella iceye
   ```
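For reference, a CLI matching the invocations above could be wired up with argparse along these lines. This is a sketch; the actual argument handling in `scripts.main` may differ:

```python
import argparse

# Minimal sketch of the CLI surface shown above (names are assumptions)
parser = argparse.ArgumentParser(prog="scripts.main")
parser.add_argument("providers", nargs="+",
                    choices=["capella", "iceye", "umbra"],
                    help="providers to process")
parser.add_argument("--format", choices=["viz", "ard", "both"], default="both",
                    help="output format(s) to generate")

# Example: parse the 'VIZ only' invocation
args = parser.parse_args(["capella", "iceye", "--format", "viz"])
```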
Warning: This tool is under active development and currently supports Capella datasets only.
The tools/get_kmz.py CLI tool exports a KMZ that visualizes acquisition geometry for a single STAC item taken from the local repository Parquet files. Given a provider, item id, and product dtype (SLC, GEO, CPHD, etc.), the tool points to the matching local Parquet (/ard) file, reads extended metadata for the item, and emits a KMZ with:
- a satellite track built from state vectors,
- look vectors (rays) drawn from every Nth state vector to the image, using satellite attitude quaternions,
- a thumbnail overlaid (draped) on the ground so the acquisition footprint and look geometry can be inspected in Google Earth or Google Earth Engine, and
- a popup table showing basic STAC fields plus waveform / sampling / pointing metadata (e.g., sampling frequency, PRF, pulse bandwidth, pulse duration, beamwidths, range/azimuth/ground resolutions, NESZ, and other available image geometry fields).
- Currently, the tool determines the input Parquet by mapping the supplied inputs to a local ARD file path: `parquets/ard/capella/capella_<DTYPE>.parquet` (DTYPE is the `--dtype` argument).
- The CLI expects the ARD file to contain a row with `id` matching the supplied `--id`. The script reads the row, resolves `asset_metadata` (STAC item JSON) and `asset_thumbnail` (thumbnail) from the row, fetches metadata, and generates the KMZ.
- The KMZ archive contains `doc.kml` and, if available, `preview.png`.
- You must have the Parquet files from this repo downloaded locally.
- Dependencies to build the KMZ (this will hopefully be fixed soon with an env handler; note that the version pins aren't absolute):
  - `pyproj==3.7.2` (for ECEF->LLA transforms)
  - `simplekml==1.3.2` (for KML/KMZ creation)
  - `scipy==1.16.3`
- If these optional packages are not installed, the CLI will exit with an actionable message explaining how to install them.
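The satellite track is built by converting ECEF state vectors to geodetic coordinates; the tool delegates this to `pyproj`, but a stdlib-only sketch of the underlying WGS84 transform (iterative, adequate away from the poles) looks like:

```python
import math

# WGS84 constants
A = 6378137.0                 # semi-major axis (m)
F = 1 / 298.257223563         # flattening
E2 = F * (2 - F)              # first eccentricity squared

def ecef_to_lla(x, y, z, iterations=10):
    """Convert ECEF coordinates (m) to geodetic lat/lon (deg) and altitude (m).

    A stand-in for the pyproj ECEF->LLA transform the tool actually uses.
    """
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - E2))          # initial guess
    for _ in range(iterations):
        n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)   # prime vertical radius
        alt = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - E2 * n / (n + alt)))
    return math.degrees(lat), math.degrees(lon), alt

# A point 1 km above the equator on the x-axis
lat, lon, alt = ecef_to_lla(A + 1000.0, 0.0, 0.0)
```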
Run from the project root. This example writes the KMZ to /tmp.

```sh
python -m tools.get_kmz --provider capella \
    --id CAPELLA_C13_SP_SLC_HH_20251220124212_20251220124224 \
    --dtype SLC \
    --output-dir /tmp
```

You can view the full CLI options with:

```sh
python -m tools.get_kmz -h
```

Contributions are welcome!
This repository follows standard GitHub workflows with a protected main branch.
- Fork this repository to your own GitHub account.
- Create a feature branch from `main` in your fork (for example, `feature/my-improvement`).
- Commit your changes using clear, signed commits.
- Open a Pull Request (PR) against the `main` branch of this repository.
All pull requests:
- Must pass automated checks and code quality scans.
- Require at least one review approval (by a repository admin, me 😃).
- Cannot be force-pushed or merged directly into `main`.
Once reviewed and approved, your PR will be merged following a linear history (no merge commits).
This project is released under the MIT License.
By contributing, you agree that your contributions will be licensed under the same terms.