Large Scale Geospatial Benchmarks #1543

@jrbourbeau

Description

People love the Xarray/Dask/... software stack for geospatial workloads, but only up to about the terabyte scale. Beyond that scale this stack can struggle, requiring significant expertise to work well and frustrating users and developers alike.

To address this, we want to build a large-scale geospatial benchmark suite of end-to-end examples to ensure that these tools operate smoothly up to the 100-TB scale.

We want your help to build a catalog of large-scale, end-to-end, representative benchmarks. What does this help look like? We can use:

  • Ideas of what the most common workflows in this space look like, for example:
    "People often need to take netCDF files that are generated hourly, rechunk them to be arranged spatially, and then update that dataset every day." (A minimal sketch of this workflow appears after this list.)
  • Real datasets to work on. We want to work with real data, not fake data.
  • Real code that does it. We don’t know the space well enough to write code the way a user in this domain would.
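
To make the first item above concrete, here is a minimal sketch of that workflow using xarray, Dask, and Zarr. The file paths, coordinate names ("time", "lat", "lon"), and chunk sizes are all illustrative assumptions, not taken from any real dataset:

```python
import xarray as xr

# Open one day of hourly netCDF files as a single lazy, Dask-backed dataset.
# The glob pattern and coordinate names ("time", "lat", "lon") are assumptions.
ds = xr.open_mfdataset(
    "hourly/2024-01-01/*.nc", combine="by_coords", parallel=True
)

# Rechunk from "one chunk per hourly file" (time-contiguous) to a spatially
# arranged layout: the whole time axis in each chunk, with modest spatial tiles.
ds = ds.chunk({"time": -1, "lat": 512, "lon": 512})

# Initial write of the rechunked dataset to a Zarr store.
ds.to_zarr("rechunked.zarr", mode="w")

# Each day thereafter, append the newly generated hours along the time axis.
ds_new = xr.open_mfdataset(
    "hourly/2024-01-02/*.nc", combine="by_coords", parallel=True
)
ds_new.chunk({"time": -1, "lat": 512, "lon": 512}).to_zarr(
    "rechunked.zarr", append_dim="time"
)
```

At scale, the rechunk step is where this stack tends to struggle, since it shuffles data between time-contiguous and space-contiguous layouts. Real versions of workflows like this, with actual datasets and chunk shapes, are exactly what we're after.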

This is a big ask, we know, but we hope that if a few people can contribute something meaningful then we’ll be able to contribute code changes that accelerate those workflows (and others) considerably.

We’d welcome contributions as comments on this issue.
