Description
The implementation of get_catchment_characteristics opens and collects the s3 data source once for each column. This means the number of open_dataset/collect calls on the s3 data source grows linearly with the number of variables/characteristics, even when multiple variables could be pulled from the same s3 source in a single call.
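For illustration, here is a minimal sketch of the per-variable pattern described above, assuming an arrow/dplyr workflow; `s3_url_for()`, `varnames`, and `comids` are hypothetical placeholders, not the package's actual internals:

```r
library(arrow)
library(dplyr)

# One open_dataset()/collect() round trip per requested characteristic,
# even when several characteristics live at the same s3 url.
results <- lapply(varnames, function(var) {
  open_dataset(s3_url_for(var)) |>   # one open per variable
    filter(COMID %in% comids) |>
    select(COMID, all_of(var)) |>
    collect()                        # one collect per variable
})
```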
Below is a performance comparison for pulls of 1, 5, and 9 variables, all for the same 20 catchment IDs. My implementation batches the pulls by s3 source, retrieving multiple variables in a single call where possible rather than one at a time.
The top pane (single url) shows the performance difference when all variables can be pulled from the same s3 url: the batched execution time stays roughly constant as the number of variables grows, while the current function's runtime increases from ~2 s to ~20 s.
The bottom pane (multi url) shows that the two functions perform the same when each variable comes from a unique s3 bucket (no batching possible), confirming that the performance difference in the top pane is due to the batching alone.
Batching variable pulls decreases the number of open and collect calls made against the s3 data source, which reduces the s3 request count and the user's wait time.
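To make the batching idea concrete, here is a rough sketch under the same assumptions as above (hypothetical `s3_url_for()`, `varnames`, and `comids`), grouping the requested characteristics by source url so each unique url is opened and collected only once:

```r
library(arrow)
library(dplyr)

# One open_dataset()/collect() per unique s3 url, not per variable.
vars_by_url <- split(varnames, s3_url_for(varnames))

results <- lapply(names(vars_by_url), function(url) {
  open_dataset(url) |>
    filter(COMID %in% comids) |>
    select(COMID, all_of(vars_by_url[[url]])) |>  # all variables from this url at once
    collect()
})
```

With this grouping, the number of s3 requests scales with the number of distinct source urls rather than with the number of requested variables.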
I already have my own implementation that I can share, and I am happy to submit a PR for this. I would, however, need more information on how the percent missing column is meant to be applied: across the >300 variables I have pulled, I have yet to see that column come out of the s3 storage (and it looks like it is overwritten in the original function anyway?).
Thanks!