Large Scale Geospatial Benchmarks #1543

@jrbourbeau

Description

People love the Xarray/Dask/... software stack for geospatial workloads, but only up to about the terabyte scale. Beyond that scale this stack can struggle, requiring significant expertise to work well and frustrating users and developers alike.

To address this, we want to build a large-scale geospatial benchmark suite of end-to-end examples to ensure that these tools operate smoothly up to the 100-TB scale.

We want your help to build a catalog of large-scale, end-to-end, representative benchmarks. What does this help look like? We can use:

  • Ideas of what the most common workflows in this space look like, for example:
    "People often need to take netCDF files that are generated hourly, rechunk them to be arranged spatially, and then update that dataset every day." (A minimal sketch of this workflow appears after this list.)
  • Real datasets to work on. We want to work with real data, not fake data.
  • Real code that does it. We don’t know the space well enough to write code the way a user in this domain would.
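
To make the first item above concrete, here is a minimal sketch of that workflow using xarray, Dask, and Zarr. The file paths, coordinate names ("time", "lat", "lon"), and chunk sizes are all illustrative assumptions, not taken from any real dataset:

```python
import xarray as xr

# Open one day of hourly netCDF files as a single lazy, Dask-backed dataset.
# The glob pattern and coordinate names ("time", "lat", "lon") are assumptions.
ds = xr.open_mfdataset(
    "hourly/2024-01-01/*.nc", combine="by_coords", parallel=True
)

# Rechunk from "one chunk per hourly file" (time-contiguous) to a spatially
# arranged layout: the whole time axis in each chunk, with modest spatial tiles.
ds = ds.chunk({"time": -1, "lat": 512, "lon": 512})

# Initial write of the rechunked dataset to a Zarr store.
ds.to_zarr("rechunked.zarr", mode="w")

# Each day thereafter, append the newly generated hours along the time axis.
ds_new = xr.open_mfdataset(
    "hourly/2024-01-02/*.nc", combine="by_coords", parallel=True
)
ds_new.chunk({"time": -1, "lat": 512, "lon": 512}).to_zarr(
    "rechunked.zarr", append_dim="time"
)
```

At scale, the rechunk step is where this stack tends to struggle, since it shuffles data between time-contiguous and space-contiguous layouts. Real versions of workflows like this, with actual datasets and chunk shapes, are exactly what we're after.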

This is a big ask, we know, but we hope that if a few people can contribute something meaningful then we’ll be able to contribute code changes that accelerate those workflows (and others) considerably.

We’d welcome contributions as comments on this issue.
