Currently, hyperdrive, hypercore, Beaker Browser (and probably a few other tools) specify a version as the length of the append-only log (a single number). However, a length alone does not safely identify a version.
Problem case: a researcher wants to state exactly which version of a DAT was used and cites it as dat://ab...ef+234. Later, the researcher notices that the dataset doesn't fit the published output, reverts to version 1, and recreates the DAT with exactly 234 versions so that it fits the output. The citation still resolves to a version 234, but that version now contains different data, so the researcher has just made a false claim that the identifier cannot reveal.
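To make the problem concrete, here is a minimal sketch in which two different histories of the same length are compared once by length and once by a digest. The function `log_digest` is hypothetical, and the SHA-256 hash chain is only a stand-in for hypercore's own Merkle-tree hashing, which this sketch does not reproduce.

```python
import hashlib

def log_digest(entries: list[bytes]) -> str:
    """Digest of an append-only log as a simple hash chain.

    Stand-in only: hypercore uses its own Merkle-tree hashing, not this chain.
    """
    digest = bytes(32)  # start value for an empty log
    for entry in entries:
        digest = hashlib.sha256(digest + hashlib.sha256(entry).digest()).digest()
    return digest.hex()

# The history that was originally cited as "+234": 234 entries.
original = [f"row-{i}".encode() for i in range(234)]
# Revert to version 1, then append 233 different entries: again 234 in total.
rewritten = original[:1] + [f"fitted-row-{i}".encode() for i in range(1, 234)]

print(len(original) == len(rewritten))                # True: "+234" matches both
print(log_digest(original) == log_digest(rewritten))  # False: the digest does not
```

The shared first entry still produces the same digest in both histories; only the divergent suffix changes it.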
How can we make sure this never happens? Each version of a hypercore has a hash.
A version of a hyperdrive, in turn, is a combination of versions of several hypercores (the metadata core and the content core).
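A minimal sketch of that combination, assuming a drive version is pinned as one (length, hash) pair per core. All names here (`core_hash`, `CoreVersion`, `DriveVersion`, `matches`) are hypothetical, and `core_hash` is a SHA-256 stand-in rather than hypercore's real hashing.

```python
import hashlib
from dataclasses import dataclass

def core_hash(entries: list[bytes], length: int) -> str:
    """Stand-in for a hypercore's hash at `length`: SHA-256 over the first
    `length` entries, each length-prefixed (not hypercore's real Merkle tree)."""
    h = hashlib.sha256()
    for entry in entries[:length]:
        h.update(len(entry).to_bytes(8, "big"))
        h.update(entry)
    return h.hexdigest()

@dataclass(frozen=True)
class CoreVersion:
    length: int
    hash: str

@dataclass(frozen=True)
class DriveVersion:
    """One hyperdrive version pins one version of each underlying hypercore."""
    metadata: CoreVersion
    content: CoreVersion

def matches(version: DriveVersion, metadata_log: list[bytes],
            content_log: list[bytes]) -> bool:
    """True only if both logs reach the pinned lengths and hash to the pinned values."""
    for pinned, log in ((version.metadata, metadata_log),
                        (version.content, content_log)):
        if len(log) < pinned.length or core_hash(log, pinned.length) != pinned.hash:
            return False
    return True

metadata = [b"put /data.csv", b"put /README.md"]
content = [b"a,b\n1,2\n", b"# Dataset"]
pinned = DriveVersion(CoreVersion(2, core_hash(metadata, 2)),
                      CoreVersion(2, core_hash(content, 2)))

print(matches(pinned, metadata, content))                       # True
print(matches(pinned, metadata, [b"a,b\n9,9\n", b"# Dataset"]))  # False
```

Because only the first `length` entries are hashed, a pinned version keeps verifying after further entries are appended, while any change to the pinned prefix of either core is detected.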
Specifying a DAT version with all of these hashes, however, looks like this:
dat://<channel:64-hex-chars>+<metadata:64-hex-chars>+<content:64-hex-chars>
That is already a hassle for a single-writer DAT, and it would become even more of one with a
multi-writer DAT (1 key for the channel + 2 hashes per writer). Note: I know it might be acceptable to use only the first 8 characters as a version identifier, but that would probably not be good enough for a researcher.
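A rough sketch of the arithmetic behind that concern, assuming 32-byte keys and hashes encoded as 64 lowercase hex characters. The helpers `version_url`, `url_length`, and `prefix_collision_probability` are hypothetical; the last one uses the standard birthday-bound approximation. An 8-character prefix carries only 32 bits, so accidental collisions become likely at tens of thousands of versions, and a deliberately matching prefix can be found by brute force in roughly 2**32 attempts.

```python
import math
import secrets

HEX_LEN = 64  # a 32-byte key or hash, hex-encoded
HEX_DIGITS = set("0123456789abcdef")

def version_url(channel_key: str, core_hashes: list[str]) -> str:
    """Build the long dat://<channel>+<hash>+... form discussed in the text."""
    for part in [channel_key, *core_hashes]:
        if len(part) != HEX_LEN or not set(part) <= HEX_DIGITS:
            raise ValueError(f"expected {HEX_LEN} lowercase hex chars, got {part!r}")
    return "dat://" + "+".join([channel_key, *core_hashes])

def url_length(writers: int) -> int:
    """URL length: 1 channel key + 2 hashes (metadata, content) per writer."""
    parts = 1 + 2 * writers
    return len("dat://") + parts * HEX_LEN + (parts - 1)  # (parts - 1) "+" separators

def prefix_collision_probability(versions: int, hex_chars: int = 8) -> float:
    """Birthday-bound estimate that two of `versions` hashes share a prefix."""
    space = 16 ** hex_chars  # 8 hex chars = 32 bits = 2**32 possible prefixes
    return 1 - math.exp(-versions * (versions - 1) / (2 * space))

url = version_url(secrets.token_hex(32), [secrets.token_hex(32), secrets.token_hex(32)])
print(len(url), url_length(1))  # 200 200
print(url_length(5))            # 720
print(round(prefix_collision_probability(77_163), 2))  # 0.5
```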
After thinking about this for a while, I came up with the following solution, which might be a good idea for a new DEP (single-writer only, for the sake of simplicity):
We could add another hypercore to a hyperdrive, a version hypercore, that keeps an index of the versions and their hashes:
message Version {
  string hash = 1;            // Hash of this version (calculated by hashing all the hashes below)
  repeated string tags = 2;   // Names to find this version by
  int32 metadataLength = 3;   // Length of the metadata core
  string metadataHash = 4;    // Hash of this version of the metadata core
  int32 contentLength = 5;    // Length of the content core
  string contentHash = 6;     // Hash of this version of the content core
}
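The schema leaves open exactly how `hash` is derived from the other fields. Here is a minimal sketch, assuming SHA-256 over the lengths and core hashes joined by "+", with tags deliberately excluded so that naming a version later does not change its identity. `VersionRecord`, `version_hash`, and `verify` are hypothetical names, not part of any existing Dat API.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class VersionRecord:
    """Python mirror of the proposed Version message (hash is derived, not stored)."""
    metadata_length: int
    metadata_hash: str
    content_length: int
    content_hash: str
    tags: tuple[str, ...] = ()

def version_hash(record: VersionRecord) -> str:
    # Assumption: canonical input = lengths and core hashes joined by "+".
    # Tags are left out, so naming a version later does not change its hash.
    canonical = "+".join([str(record.metadata_length), record.metadata_hash,
                          str(record.content_length), record.content_hash])
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()

def verify(record: VersionRecord, claimed_hash: str) -> bool:
    """Check a stored hash field against the fields it is supposed to cover."""
    return version_hash(record) == claimed_hash

v1 = VersionRecord(234, "aa" * 32, 120, "bb" * 32)
v1_named = VersionRecord(234, "aa" * 32, 120, "bb" * 32, tags=("paper-2018",))
forged = VersionRecord(234, "aa" * 32, 120, "cc" * 32)  # same lengths, other content

print(version_hash(v1) == version_hash(v1_named))  # True: tags do not matter
print(verify(forged, version_hash(v1)))            # False: content hash differs
```

Including the lengths alongside the hashes is a conservative choice for this stand-in; if the core hashes already commit to their lengths, they could be dropped from the canonical input.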
With such a version hypercore, a version checkout could download all of its entries, build a lookup table from them, and select the requested version from that table.
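A minimal sketch of that lookup step, assuming the version core has already been read into a list of entries in append order. `VersionEntry` and `VersionLookup` are hypothetical names, and the rule that a tag may be assigned only once is an added assumption, meant to stop a name from silently moving to different data the way a bare length can.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VersionEntry:
    """One entry of the proposed version hypercore (subset of its fields)."""
    hash: str
    metadata_length: int
    content_length: int
    tags: tuple[str, ...] = ()

class VersionLookup:
    """Lookup table built by reading every entry of the version core in order."""

    def __init__(self, entries: list[VersionEntry]):
        self.by_hash: dict[str, VersionEntry] = {}
        self.by_tag: dict[str, VersionEntry] = {}
        for entry in entries:
            self.by_hash[entry.hash] = entry
            for tag in entry.tags:
                # Assumption: a tag can be assigned only once, so a name can never
                # be moved to different data later (the weakness of plain lengths).
                if tag in self.by_tag:
                    raise ValueError(f"tag {tag!r} already used by {self.by_tag[tag].hash}")
                self.by_tag[tag] = entry

    def select(self, selector: str) -> VersionEntry:
        """Resolve a full version hash or a tag to the core lengths to check out."""
        if selector in self.by_hash:
            return self.by_hash[selector]
        if selector in self.by_tag:
            return self.by_tag[selector]
        raise KeyError(f"unknown version or tag: {selector!r}")

lookup = VersionLookup([
    VersionEntry("a" * 64, metadata_length=3, content_length=2, tags=("raw",)),
    VersionEntry("b" * 64, metadata_length=7, content_length=5, tags=("paper-2018",)),
])
chosen = lookup.select("paper-2018")
print(chosen.metadata_length, chosen.content_length)  # 7 5
```

The selected entry only yields the lengths at which to check out each core; a client would still need to compare the cores' hashes at those lengths against the entry before trusting the data.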
My questions now are:
- Is this a reasonable approach? Do you know a better way to get that done?
- What could a multi-writer version look like?
- Should this be turned into a DEP?