Discussion: Versions should use hashes

Currently, in hyperdrive, hypercore, Beaker Browser _(and probably in a few other tools)_, a version is specified as the `length of the append-log` _(a number)_. However, that is not a safe way to specify a version.

Problem case: a researcher wants to specify exactly which version of a DAT was used, and specifies it like `dat://ab...ef+234`. The researcher then notices that the dataset doesn't fit the output, reverts to version 1, and creates a new DAT with exactly 234 versions that does fit the output. The version number `234` cannot tell the two histories apart, so the researcher has just managed to back false claims with it.
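To make the problem case concrete, here is a rough sketch in Python. It is not hypercore's actual hashing scheme: `log_hash` is a hypothetical helper that simply chains BLAKE2b over the entries, and the row contents are made up. It only shows that two logs can share a length while a hash over their entries differs.

```python
import hashlib


def log_hash(entries):
    """Chain-hash an append-only log (illustrative, not hypercore's Merkle tree)."""
    digest = bytes(32)  # fixed starting value for the empty log
    for entry in entries:
        h = hashlib.blake2b(digest_size=32)
        h.update(digest)                          # commit to everything before
        h.update(len(entry).to_bytes(8, "big"))   # length prefix avoids ambiguity
        h.update(entry)
        digest = h.digest()
    return digest.hex()


# The original history with 234 entries.
original = [f"row {i}".encode() for i in range(234)]

# The rewritten history: same first entry, everything after it replaced.
rewritten = [original[0]] + [f"fixed row {i}".encode() for i in range(1, 234)]

same_length = len(original) == len(rewritten)          # the "+234" version matches
same_hash = log_hash(original) == log_hash(rewritten)  # the hash does not
print(same_length, same_hash)
```

Both histories answer to `+234`, but only one of them matches the hash that was recorded at the time.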

How can we make sure this never happens? Each version of a hyper**core** produces a hash,
which makes one version of a hyper**drive** a combination of several hypercore versions.

Specifying a dat version by hash would look like this, though:

```
dat://<channel:64-hex-chars>+<metadata:64-hex-chars>+<content:64-hex-chars>
```

... and that is only for a single-writer dat. It would become even more of a hassle with a
multi-writer dat _(1 key for the channel + 2 hashes per writer)_. _Note: I know that it could be okay to use only the first 8 characters of each hash as version identification, but that would probably not be good enough for a researcher._
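To get a feel for how unwieldy that format is, here is a small Python sketch that parses it. The URL layout is only the hypothetical one written above, not an existing dat URL scheme, and `parse_hashed_version_url` is a made-up name.

```python
import re

HEX64 = r"[0-9a-f]{64}"
# Hypothetical layout from this issue: dat://<channel>+<metadata>+<content>
PATTERN = re.compile(rf"^dat://({HEX64})\+({HEX64})\+({HEX64})$")


def parse_hashed_version_url(url):
    """Split a single-writer URL into channel key, metadata hash and content hash."""
    match = PATTERN.match(url)
    if match is None:
        raise ValueError(f"not a hashed dat version URL: {url!r}")
    channel, metadata, content = match.groups()
    return {"channel": channel, "metadata": metadata, "content": content}


# Placeholder hex strings, not real keys or hashes.
url = "dat://" + "+".join(["ab" * 32, "cd" * 32, "ef" * 32])
parsed = parse_hashed_version_url(url)
print(len(url), parsed["channel"][:8])
```

Even in the single-writer case the URL is 200 characters long, and every additional writer would add two more 64-character hashes.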

After thinking about this for a bit, I came up with the following solution, which might be a good idea for a new DEP:

_(Single-writer for the sake of simplicity)_

We could add another `version` hypercore to a hyperdrive that keeps an index of the versions and their hashes:

```protobuf
syntax = "proto3";

// One entry in the `version` hypercore. Field numbers start at 1 (0 is not allowed).
message Version {
  string hash = 1; // Hash of the version (calculated by hashing all hashes in here)
  repeated string tags = 2; // Names to find this version by
  uint64 metadataLength = 3; // Length of the metadata-core (never negative)
  string metadataHash = 4; // Hash for the version on the metadata-core
  uint64 contentLength = 5; // Length of the content-core (never negative)
  string contentHash = 6; // Hash for the version of the content-core
}
```

This way, a version checkout could download all entries of the `version` hypercore, build a lookup table from them, and select the version based on that lookup table.
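A minimal sketch of that checkout step in Python, assuming the entries of the `version` hypercore have already been downloaded and decoded. All names here (`VersionEntry`, `build_lookup`, `resolve`) are hypothetical, the hex values are placeholders, and hashing just the two core hashes with BLAKE2b is only one possible reading of "hashing all hashes in here".

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionEntry:
    metadata_length: int
    metadata_hash: str   # hex
    content_length: int
    content_hash: str    # hex
    tags: tuple = ()

    def version_hash(self):
        """Hash of the version: here, a hash over the two core hashes."""
        h = hashlib.blake2b(digest_size=32)
        h.update(bytes.fromhex(self.metadata_hash))
        h.update(bytes.fromhex(self.content_hash))
        return h.hexdigest()


def build_lookup(entries):
    """Index all entries by version hash and by tag (later tags overwrite earlier)."""
    by_hash, by_tag = {}, {}
    for entry in entries:
        by_hash[entry.version_hash()] = entry
        for tag in entry.tags:
            by_tag[tag] = entry
    return by_hash, by_tag


def resolve(ref, by_hash, by_tag):
    """Select a version by its hash first, then by tag."""
    if ref in by_hash:
        return by_hash[ref]
    if ref in by_tag:
        return by_tag[ref]
    raise KeyError(f"unknown version: {ref}")


entries = [
    VersionEntry(1, "aa" * 32, 0, "bb" * 32, tags=("initial",)),
    VersionEntry(234, "cc" * 32, 512, "dd" * 32, tags=("paper-v1",)),
]
by_hash, by_tag = build_lookup(entries)
selected = resolve("paper-v1", by_hash, by_tag)
print(selected.metadata_length, selected.content_length)
```

The selected entry gives the lengths to check out on the metadata and content cores, plus the hashes to verify them against. Keeping tags and hashes in separate tables avoids a tag ever shadowing a version hash.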

My questions now are:

- Is this a reasonable approach? Do you know a better way to get that done?
- What could a multi-writer version look like?
- Should this be turned into a DEP?

