Discussion: Versions should use hashes #54

@martinheidegger


Currently in hyperdrive, hypercore, beaker browser (and probably a few other tools), versions are specified as the length of the append-only log (a number). However, that is not a safe way to specify a version.

Problem case: a researcher wants to specify exactly which version of a DAT was used and writes it as dat://ab...ef+234. The researcher then notices that the data set doesn't fit the output, reverts to version 1, and creates a new DAT with exactly 234 versions that does fit the output. The reference dat://ab...ef+234 now points to different data, so the researcher has managed to back a false claim.
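A minimal sketch of the problem, assuming a toy append-only log whose version hash is a plain SHA-256 chain over its entries. Hypercore's real hashing uses a Merkle tree, so `chain_hash` and the sample entries here are hypothetical stand-ins:

```python
import hashlib


def chain_hash(entries):
    """Fold all entries into one hex digest; changing any entry changes the result."""
    digest = b"\x00" * 32  # fixed starting value for the empty log
    for entry in entries:
        digest = hashlib.sha256(digest + entry).digest()
    return digest.hex()


# The originally published log and a rewritten log of the same length.
# Both keep the first entry, mirroring "reverts to version 1".
original = [b"measurement-%d" % i for i in range(234)]
rewritten = [b"measurement-0"] + [b"adjusted-%d" % i for i in range(1, 234)]

# A length-based version cannot tell the two histories apart...
same_length = len(original) == len(rewritten)
# ...while a hash-based version can.
same_hash = chain_hash(original) == chain_hash(rewritten)
```

Here `same_length` is true while `same_hash` is false, which is the gap a hash-based version identifier is meant to close.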

How can we make sure this never happens? Each version of a hypercore produces a hash, which makes one version of a hyperdrive a combination of several hypercore versions.

Specifying a dat version by those hashes directly, though, would look like this:

dat://<channel:64-hex-chars>+<metadata:64-hex-chars>+<content:64-hex-chars>

... for a single-writer dat, which is unwieldy and would become even more of a hassle with a multi-writer dat (1 key for the channel + 2 hashes per writer). Note: I know that it could be okay to use only the first 8 characters as version identification, but that would probably not be good enough for a researcher.
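To make the size of such an identifier concrete, the single-writer form above could be parsed and validated roughly like this. This is only a sketch: `parse_versioned_url` and the sample hex values are hypothetical, and no existing dat tool is claimed to accept this URL shape.

```python
import re

HEX64 = r"[0-9a-f]{64}"
VERSIONED_URL = re.compile(
    rf"^dat://(?P<channel>{HEX64})\+(?P<metadata>{HEX64})\+(?P<content>{HEX64})$"
)


def parse_versioned_url(url):
    """Split a hash-versioned dat URL into its three 64-hex-char parts."""
    match = VERSIONED_URL.match(url.lower())
    if match is None:
        raise ValueError("not a hash-versioned single-writer dat URL: %r" % url)
    return match.groupdict()


# Placeholder key and hashes, chosen only so the example is well-formed.
example = "dat://" + "+".join(["ab" * 32, "cd" * 32, "ef" * 32])
parts = parse_versioned_url(example)
```

Even in the single-writer case the URL is 200 characters long, and a length-based form such as dat://ab...ef+234 is rejected by this parser because its version part is not a hash.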

After thinking about this for a while, I came up with the following solution, which might be a good idea for a new DEP:

(Single-writer for the sake of simplicity)

We could add another hypercore to a hyperdrive, a version hypercore, that keeps an index of the versions and their hashes:

message Version { // message name is a placeholder
  string hash = 1; // Hash of the version (calculated by hashing all hashes in here)
  repeated string tags = 2; // Names to find this version by
  int32 metadataLength = 3; // Length of the metadata-core
  string metadataHash = 4; // Hash for the version on the metadata-core
  int32 contentLength = 5; // Length of the content-core
  string contentHash = 6; // Hash for the version of the content-core
}

This way a version checkout could download all entries of the version hypercore, build a lookup table from them, and select the version based on that lookup table.
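The checkout step could be sketched as follows. The assumptions are mine, not part of the proposal: "hashing all hashes" is taken to mean SHA-256 over the concatenated metadata and content hashes, and `VersionEntry`, `build_lookup`, and the sample values are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class VersionEntry:
    """One entry of the proposed version hypercore (single-writer case)."""
    metadata_length: int
    metadata_hash: str  # 64 hex chars
    content_length: int
    content_hash: str  # 64 hex chars
    tags: List[str] = field(default_factory=list)

    @property
    def version_hash(self) -> str:
        # Assumption: the version hash is SHA-256 over both core hashes.
        raw = bytes.fromhex(self.metadata_hash) + bytes.fromhex(self.content_hash)
        return hashlib.sha256(raw).hexdigest()


def build_lookup(
    entries: List[VersionEntry],
) -> Tuple[Dict[str, VersionEntry], Dict[str, VersionEntry]]:
    """Index all entries by version hash and by tag."""
    by_hash: Dict[str, VersionEntry] = {}
    by_tag: Dict[str, VersionEntry] = {}
    for entry in entries:
        by_hash[entry.version_hash] = entry
        for tag in entry.tags:
            by_tag[tag] = entry  # a reused tag points at the latest entry
    return by_hash, by_tag


# Placeholder entries standing in for a downloaded version hypercore.
entries = [
    VersionEntry(3, "11" * 32, 5, "22" * 32, tags=["draft"]),
    VersionEntry(7, "33" * 32, 9, "44" * 32, tags=["published"]),
]
by_hash, by_tag = build_lookup(entries)
selected = by_tag["published"]
```

A checkout would then use `selected.metadata_length` and `selected.content_length` to pick the core versions, and compare the recorded hashes against the hashes of the downloaded cores before trusting the data.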

My questions now are:

  • Is this a reasonable approach? Do you know a better way to get that done?
  • What could a multi-writer version look like?
  • Should this be turned into a DEP?
