
Hash Large Files Efficiently [Nice to have] #2

@katelynsills

Description


Some of the files that Starling archives are videos larger than 65 GB. Computing a CID for files this large through the usual mechanism (SHA256) is likely to be very slow. We may want to change the CID settings on the UWAZI side to use Blake3 instead of SHA256. Blake3 is considerably faster than SHA256 in software and can spread the work for a single file across CPU cores, so the end user would get a CID more quickly and the user experience would improve.
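Whichever hash function is used, a file of this size has to be read in full, so it should be hashed as a stream rather than loaded into memory. The sketch below is a minimal illustration in Python (chosen only for illustration; nothing here assumes Starling or UWAZI use Python). The 8 MiB chunk size and the helper names `sha256_stream` and `sha256_file` are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO

# 8 MiB per read: an arbitrary buffer size chosen for this sketch, not a tuned value.
CHUNK_SIZE = 8 * 1024 * 1024


def sha256_stream(f: BinaryIO, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Hash a binary stream incrementally so the whole file never sits in memory."""
    h = hashlib.sha256()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        h.update(chunk)
    return h.digest()


def sha256_file(path: str) -> bytes:
    """Open a file (for example a 65 GB video) and hash it chunk by chunk."""
    with open(path, "rb") as f:
        return sha256_stream(f)


if __name__ == "__main__":
    # Demo on an in-memory stream; a tiny chunk size forces several update() calls.
    digest = sha256_stream(io.BytesIO(b"abc"), chunk_size=1)
    print(digest.hex())
```

Streaming only fixes memory use; the time is still dominated by I/O and by the hash function's throughput. A Blake3 hasher (for example from the third-party `blake3` package on PyPI, which is not in the standard library) exposes the same `update`/`digest` shape, so the loop stays the same and the speed-up comes from the hash function itself.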

On the Starling ingestion side, Starling would then need to produce both the usual SHA256 CIDv1 and the Blake3 version, and store both in the Starling Hyperbee. The entry would look something like the following, where the bracketed parts are replaced with the actual CIDs:

key: [Blake3CID]/SHA256CID
value: [SHA256CID]

The attribute values for the entity will remain keyed by the usual SHA256CID.

This would require two lookups to get the usual metadata (first the Blake3 key to find the SHA256 CID, then the metadata keyed by that CID). It also requires the user to trust that Starling associated the Blake3 hash with the SHA256 hash correctly. However, for a casual user who simply wants to view the metadata, this is likely the most efficient approach.
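The two-lookup flow can be sketched with a plain Python dict standing in for the Hyperbee (a real Hyperbee is a JavaScript append-only B-tree, so this only models the key layout). The `<Blake3CID>/SHA256CID` pointer key follows the layout proposed above; the `<SHA256CID>/metadata` key for the attribute values and all helper names are hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for the Starling Hyperbee: a plain in-memory dict.
Store = Dict[str, Any]


def pointer_key(blake3_cid: str) -> str:
    """Key layout from the proposal: '<Blake3CID>/SHA256CID' -> '<SHA256CID>'."""
    return f"{blake3_cid}/SHA256CID"


def metadata_key(sha256_cid: str) -> str:
    """Hypothetical key for the entity's attribute values, keyed by the SHA256 CID."""
    return f"{sha256_cid}/metadata"


def index_file(db: Store, blake3_cid: str, sha256_cid: str, attributes: dict) -> None:
    """Ingestion side: write the Blake3 -> SHA256 pointer and the usual metadata."""
    db[pointer_key(blake3_cid)] = sha256_cid
    db[metadata_key(sha256_cid)] = attributes


def lookup_by_blake3(db: Store, blake3_cid: str) -> Optional[dict]:
    """Viewer side: two lookups, trusting Starling's Blake3 -> SHA256 association."""
    sha256_cid = db.get(pointer_key(blake3_cid))  # lookup 1: find the SHA256 CID
    if sha256_cid is None:
        return None
    return db.get(metadata_key(sha256_cid))  # lookup 2: fetch the metadata
```

Nothing in `lookup_by_blake3` checks that the pointer is correct; a user who does not want to rely on Starling's association would have to hash the file with SHA256 themselves, which is the slow path this proposal is meant to avoid.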

Methods

Both IPFS Kubo and the js-multiformats library support hash functions other than SHA256.
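A CIDv1 records which hash function produced it in its multihash prefix, which is why the Blake3 and SHA256 CIDs of the same file differ. The sketch below builds a CIDv1 by hand with only the Python standard library, using codes from the multicodec table (0x12 for sha2-256, 0x1e for blake3, 0x55 for raw). It assumes the CID addresses the whole file as a single raw block; that is an assumption, since Kubo's default `ipfs add` chunks large files into a UnixFS DAG and would produce different CIDs. Blake3 digests would have to come from a third-party package.

```python
import base64
import hashlib

# Codes from the multiformats multicodec table.
CIDV1 = 0x01
RAW_CODEC = 0x55  # "raw": the CID addresses the bytes as a single block
SHA2_256 = 0x12
BLAKE3 = 0x1E


def uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used throughout multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def multihash(code: int, digest: bytes) -> bytes:
    """<hash function code><digest length><digest>."""
    return uvarint(code) + uvarint(len(digest)) + digest


def raw_cid_v1(code: int, digest: bytes) -> str:
    """CIDv1 for a raw block, as multibase base32 ('b' prefix, lowercase, no padding)."""
    cid_bytes = uvarint(CIDV1) + uvarint(RAW_CODEC) + multihash(code, digest)
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


def sha256_raw_cid(data: bytes) -> str:
    """Hypothetical helper: SHA256 CIDv1 of a small in-memory blob."""
    return raw_cid_v1(SHA2_256, hashlib.sha256(data).digest())
```

For a Blake3 CID, the same `raw_cid_v1` would be called with `BLAKE3` and a 32-byte Blake3 digest; the resulting strings begin with `bafkr4i`, whereas SHA256 raw CIDs begin with `bafkrei`, so the two are easy to tell apart in the Hyperbee.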

Labels: enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions