Jakob Borg edited this page Jul 12, 2016 · 1 revision

Purpose

Faster and more lightweight connection establishment, achieved by not sending index information that is already present on the remote device.

Mechanism

Each index item (FileInfo) has a field int64 local_version. This field holds the value of a global or per-folder counter as it was when the item was last changed. The counter is strictly increasing, meaning that every time an index item is changed it is assigned a local version higher than that of any other index item in the same folder. Index entries from remote devices are never changed.

Each folder gains a new field uint64 index_id. This is set to a random non-zero value when the index is initialized (i.e., when the folder is added or the database is wiped). The zero value is reserved for compatibility with older versions.

It's important that the index_id has the same life cycle as the index data. That is, if the index is wiped a new index ID must be generated. Hence the index ID should be stored in the index database, not in the configuration or another place.
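As a minimal sketch of the generation step, assuming the ID is drawn from the system's cryptographic random source: the `IndexID` type and `newIndexID` function are hypothetical names, not existing code. Persisting the result in the index database, next to the index data, is left out.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
)

// IndexID identifies one incarnation of a folder's index. Zero is reserved
// for devices that do not implement delta indexes. (Hypothetical type name.)
type IndexID uint64

// newIndexID returns a random, non-zero index ID. It would be called when
// the index is initialized, i.e. folder added or database wiped.
func newIndexID() (IndexID, error) {
	var buf [8]byte
	for {
		if _, err := rand.Read(buf[:]); err != nil {
			return 0, err
		}
		// Retry in the (very unlikely) case that we drew the reserved zero.
		if id := IndexID(binary.BigEndian.Uint64(buf[:])); id != 0 {
			return id, nil
		}
	}
}
```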

The combination of index ID and highest (max) local version uniquely identifies the index contents. The index contents can't change without either the max local version increasing (an item was altered) or the index ID changing (the database was reset).

The last seen index ID and max local version for each folder are sent as part of the Cluster Config exchange.

If the index ID sent by the remote device matches our index ID for a given folder, it is sufficient to only send index entries with higher local version numbers than the other device's max local version. A full index is sent whenever the index ID doesn't match.
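The sender-side decision could be sketched roughly as follows. `FileInfo` is reduced to two fields, and `indexToSend` and its parameter names are made up for illustration. The returned flag says whether the transmission would start with an IndexUpdate (delta) or an Index (full) message.

```go
package main

// FileInfo is a pared-down stand-in for the real index item.
type FileInfo struct {
	Name         string
	LocalVersion int64
}

// indexToSend returns the entries to transmit to a remote device, and
// whether this is a delta (true) or a full index (false).
//
// ourIndexID is our current index ID for the folder. remoteSeenID and
// remoteSeenMax are the index ID and max local version that the remote
// device announced, in its Cluster Config, as last seen from us.
func indexToSend(local []FileInfo, ourIndexID, remoteSeenID uint64, remoteSeenMax int64) (entries []FileInfo, delta bool) {
	if remoteSeenID == 0 || remoteSeenID != ourIndexID {
		// Zero (older device) or a stale ID: the remote's copy cannot be
		// trusted, so send everything.
		return local, false
	}
	for _, f := range local {
		if f.LocalVersion > remoteSeenMax {
			entries = append(entries, f)
		}
	}
	return entries, true
}
```

With entries at local versions 1, 2 and 3 and a matching index ID, a remote announcing a max local version of 2 would be sent only the entry at version 3.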

To ensure index consistency in case of a disconnection during an index exchange, index entries must be sent in order of increasing local version. Receiving an index update containing an entry whose local version number is lower than or equal to an existing local version number indicates a protocol error, and the connection should be closed.

We should have a panic("bug: ...") on this, but must relax the requirement for devices not implementing delta indexes as they will be sending unsorted entries. We can key that on the index ID being zero.
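A rough sketch of the receiving-side check, including the relaxation for devices that announce a zero index ID. The name `checkIncoming` and the choice to return an error (leaving it to the caller to close the connection) are assumptions made for illustration.

```go
package main

import "fmt"

// FileInfo is a pared-down stand-in for the real index item.
type FileInfo struct {
	Name         string
	LocalVersion int64
}

// checkIncoming validates one received batch against prevMax, the highest
// local version already stored for the sending device, and returns the new
// maximum. A zero remoteIndexID means the sender does not implement delta
// indexes and may send unsorted entries, so ordering is not enforced.
func checkIncoming(prevMax int64, remoteIndexID uint64, batch []FileInfo) (int64, error) {
	highest := prevMax
	for _, f := range batch {
		if remoteIndexID != 0 && f.LocalVersion <= highest {
			return prevMax, fmt.Errorf("protocol error: local version %d for %q is not above %d",
				f.LocalVersion, f.Name, highest)
		}
		if f.LocalVersion > highest {
			highest = f.LocalVersion
		}
	}
	return highest, nil
}
```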

Protocol Changes

The distinction between a full index transmission and a delta index transmission lies only in the message type of the first index-related message. An Index message means "drop all old index data and replace with the following", while an IndexUpdate means "add the following information to the index". A delta index will simply begin with an IndexUpdate message instead of an Index message. This is already accepted today, although the results are unpredictable...

In the ClusterConfig message, the index_id field is added, with field number 8, to the Device sub-message:

message ClusterConfig {
    repeated Folder folders = 1;
}

message Folder {
    string id                   = 1;
    string label                = 2;
    bool   read_only            = 3;
    bool   ignore_permissions   = 4;
    bool   ignore_delete        = 5;
    bool   disable_temp_indexes = 6;

    repeated Device devices = 16;
}

message Device {
    bytes           id                = 1;
    string          name              = 2;
    repeated string addresses         = 3;
    Compression     compression       = 4;
    string          cert_name         = 5;
    int64           max_local_version = 6;
    bool            introducer        = 7;
    uint64          index_id          = 8;  // added
}

For the local device ID, index_id indicates our current index ID for the given folder. For another device ID, it indicates the last index ID we saw from that device. The device must therefore store the index ID received from each remote device in addition to the actual index data.
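To illustrate how the per-folder Device list might be filled in, here is a sketch under a few assumptions: `Device` carries only the fields relevant here, `remoteState` and `folderDevices` are hypothetical names, device IDs are plain strings, and the local entry is assumed to carry our own current max local version.

```go
package main

import "sort"

// Device mirrors only the fields of the Device sub-message used here.
type Device struct {
	ID              string // bytes on the wire; a string here for brevity
	MaxLocalVersion int64
	IndexID         uint64
}

// remoteState is what we remember about one remote device for a folder.
type remoteState struct {
	IndexID         uint64 // last index ID received from that device
	MaxLocalVersion int64  // highest local version we hold from that device
}

// folderDevices builds the Device list for one folder. The local device
// comes first, followed by the remote devices sorted by ID so that the
// output is deterministic.
func folderDevices(localID string, localIndexID uint64, localMax int64, remotes map[string]remoteState) []Device {
	devs := []Device{{ID: localID, MaxLocalVersion: localMax, IndexID: localIndexID}}
	ids := make([]string, 0, len(remotes))
	for id := range remotes {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		st := remotes[id]
		devs = append(devs, Device{ID: id, MaxLocalVersion: st.MaxLocalVersion, IndexID: st.IndexID})
	}
	return devs
}
```

A remote device from which we have never received an index ID simply keeps the zero value, which the other side treats as a mismatch.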

Compatibility

Devices not implementing this proposal will not send the index_id field. The other side will therefore always detect it as a mismatch (zero not being a valid index ID) and send the full index information, thus retaining compatibility.

Likewise, in the other direction, the newer device will receive a full index set when it may have expected an incremental update only. This is allowed, however, and will be handled just like it is today.

Implementation Notes

Local Version Counter

We can implement this as a globally increasing counter, a per-folder increasing counter, or a counter based on the current time in UNIX nanoseconds. As long as it's always increasing, it doesn't matter which.

Regardless of which we choose, we must scan the index on startup to figure out the current highest value to start from (as even clocks can move backwards), or perhaps store that value separately in the database to avoid the scan. The easiest from there would be to just keep an incrementing counter per folder, i.e., what we do currently. That also gives more compact integers after serialization than the larger values we'd get by using time.UnixNano(), for example.
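A minimal sketch of the per-folder counter, assuming the startup scan variant. The `localVersionCounter` type is a hypothetical name, and the existing index is passed in as a plain slice where a real implementation would iterate over the database.

```go
package main

import "sync/atomic"

// FileInfo is a pared-down stand-in for the real index item.
type FileInfo struct {
	Name         string
	LocalVersion int64
}

// localVersionCounter hands out strictly increasing local versions for
// one folder.
type localVersionCounter struct {
	cur int64
}

// newLocalVersionCounter scans the existing index entries and starts the
// counter at the highest local version found (zero for an empty index).
func newLocalVersionCounter(existing []FileInfo) *localVersionCounter {
	c := &localVersionCounter{}
	for _, f := range existing {
		if f.LocalVersion > c.cur {
			c.cur = f.LocalVersion
		}
	}
	return c
}

// Next returns a local version higher than any handed out or seen before.
// It is safe for concurrent use.
func (c *localVersionCounter) Next() int64 {
	return atomic.AddInt64(&c.cur, 1)
}
```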

Index Sorting

Index entries must be sent sorted by local version. That's not how we get them from the database, though, so a separate sorting step must be performed. In normal operation, on an established connection, index updates are small and can trivially be sorted in memory as just a slice of FileInfos. For the initial index exchange the amount of data may be large and sorting may need to happen on disk. We would typically do this by creating a temporary index database keyed by local version, then iterating over that. Technically we'll need an indexSorter type that starts out in memory for efficiency and spills to a temporary database at some set threshold.
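One possible shape for such an indexSorter is sketched below. Only the name indexSorter comes from the text above. The `spillStore` interface, the `Append` and `Sorted` methods, and the threshold handling are assumptions. `mapStore` is an in-memory stand-in for the temporary database, included only so the sketch can be exercised.

```go
package main

import "sort"

// FileInfo is a pared-down stand-in for the real index item.
type FileInfo struct {
	Name         string
	LocalVersion int64
}

// spillStore stands in for a temporary database keyed by local version.
type spillStore interface {
	Put(f FileInfo)
	// Iterate calls fn for each entry in increasing local version order.
	Iterate(fn func(FileInfo))
}

// indexSorter collects entries in memory and moves them to a spillStore
// once more than threshold entries have been appended.
type indexSorter struct {
	threshold int
	newSpill  func() spillStore
	mem       []FileInfo
	spill     spillStore
}

func (s *indexSorter) Append(f FileInfo) {
	if s.spill != nil {
		s.spill.Put(f)
		return
	}
	s.mem = append(s.mem, f)
	if len(s.mem) > s.threshold {
		// Too large for memory: move everything to the spill store.
		s.spill = s.newSpill()
		for _, m := range s.mem {
			s.spill.Put(m)
		}
		s.mem = nil
	}
}

// Sorted calls fn for every appended entry in increasing local version order.
func (s *indexSorter) Sorted(fn func(FileInfo)) {
	if s.spill != nil {
		s.spill.Iterate(fn)
		return
	}
	sort.Slice(s.mem, func(a, b int) bool {
		return s.mem[a].LocalVersion < s.mem[b].LocalVersion
	})
	for _, f := range s.mem {
		fn(f)
	}
}

// mapStore is an in-memory stand-in for the temporary database.
type mapStore struct {
	m map[int64]FileInfo
}

func (m *mapStore) Put(f FileInfo) { m.m[f.LocalVersion] = f }

func (m *mapStore) Iterate(fn func(FileInfo)) {
	keys := make([]int64, 0, len(m.m))
	for k := range m.m {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(a, b int) bool { return keys[a] < keys[b] })
	for _, k := range keys {
		fn(m.m[k])
	}
}
```

Keeping the store behind an interface means the caller that sends the index does not need to know whether the entries were sorted in memory or on disk.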
