
ls performance is poor (though what we probably really care about is a du version) #4

@bnlawrence

Description


Here's an example using NetCDF; we'll be right screwed when/if we support Zarr2 ...

```
You have entered a lightweight management tool for organising "files" inside an S3 object store
Your available minio locations are: local play s3 hpos hrs3 cedadev
Choose one with "loc x"
s3> loc hrs3
Buckets:  hrcm
hrs3> cb hrcm
Bucket: hrcm contains 73.2TiB in 11889 files/objects.
hrs3> ls
Location:  contains 73.2TiB in 11889 files/objects.
This directory contains 11889 files and 0 directories.
```

at which point you might as well go and get a coffee, have a snooze etc.

What's actually happening behind this is that we have to do roughly 11,000 queries of the database to get object properties. This is of course a "feature" of object stores, and we don't really have any mechanism to avoid it other than caching the information in the client, so that at least subsequent queries are faster when the storage is not changing. That assumption is probably fair in an archive environment (realistically we should be encouraging workflows where S3/object store is used for finished products for sharing, not interim products).
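A minimal sketch of the caching idea, assuming the object properties can be (re)derived from S3 list calls; boto3 is used purely for illustration, and `cached_bucket_stats`/`CACHE` are not names from this tool. ListObjectsV2 returns keys and sizes in pages of up to 1000 objects, so even the first pass avoids per-object round trips, and repeat `ls`/`du` calls are served from the cache:

```python
# Illustrative sketch only: cache object properties per bucket so that repeated
# ls/du calls don't re-query the store. Assumes bucket contents are static.
import boto3

CACHE = {}  # bucket name -> list of (key, size) tuples

def cached_bucket_stats(bucket, client=None):
    """Return (total_bytes, object_count), listing the bucket at most once."""
    if bucket not in CACHE:
        client = client or boto3.client("s3")
        paginator = client.get_paginator("list_objects_v2")
        objects = []
        # Each page carries up to 1000 keys with their sizes, so this is
        # ~12 requests for 11889 objects rather than ~12000 per-object queries.
        for page in paginator.paginate(Bucket=bucket):
            for obj in page.get("Contents", []):
                objects.append((obj["Key"], obj["Size"]))
        CACHE[bucket] = objects
    objects = CACHE[bucket]
    return sum(size for _, size in objects), len(objects)
```

The cache would of course need invalidating if the bucket is ever written to, which is why this only really makes sense for archive-style, write-once workflows.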

On the other hand, we can get properties very (OK, relatively) quickly, but that depends on adding the metadata at creation time. We could put volume properties into Zarr indices!
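One way the creation-time idea could look (a hedged sketch, not this tool's API: the sidecar key `.volume_index.json`, the JSON layout, and the boto3 client are all assumptions) is to write a tiny aggregate index next to the data when it is uploaded, so a later `du`/`ls` needs a single GET instead of thousands of per-object queries:

```python
# Sketch of recording volume properties at creation time. The sidecar object
# name and JSON layout are illustrative assumptions, not part of the tool.
import json
import boto3

def write_volume_index(bucket, prefix, objects, client=None):
    """Store total size and object count for `prefix` as a small sidecar object.

    `objects` is an iterable of (key, size) pairs known at upload time.
    """
    client = client or boto3.client("s3")
    sizes = [size for _, size in objects]
    index = {"prefix": prefix, "n_objects": len(sizes), "total_bytes": sum(sizes)}
    client.put_object(
        Bucket=bucket,
        Key=f"{prefix.rstrip('/')}/.volume_index.json",
        Body=json.dumps(index).encode("utf-8"),
        ContentType="application/json",
    )
```

For Zarr, the same numbers could presumably live alongside the consolidated metadata, which is what the "zarr indices" remark above is getting at.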
