Description
I have an issue whereby successive (full) syncs cause the same chat object to be downloaded multiple times. The compressed files all differ, yet their decompressed contents are identical:
$ cd gmvault-db/db/chats/
$ md5sum */1437017132845624821.eml.gz
93db2cdf56f65c3393f48a7ac3822a89 subchats-2/1437017132845624821.eml.gz
7d891df7672a2346741de39773dc9810 subchats-3/1437017132845624821.eml.gz
cc2dac40e35bdfbcd06db0e6785d1f77 subchats-4/1437017132845624821.eml.gz
$ cp subchats-2/1437017132845624821.eml.gz /tmp/sc2.gz
$ cp subchats-3/1437017132845624821.eml.gz /tmp/sc3.gz
$ cp subchats-4/1437017132845624821.eml.gz /tmp/sc4.gz
$ gunzip /tmp/sc2.gz
$ gunzip /tmp/sc3.gz
$ gunzip /tmp/sc4.gz
$ md5sum /tmp/sc*
8f96d8ec223ea64c13a028cc9038a694 /tmp/sc2
8f96d8ec223ea64c13a028cc9038a694 /tmp/sc3
8f96d8ec223ea64c13a028cc9038a694 /tmp/sc4
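The manual check above could be automated for the whole chat store. The following is a minimal sketch, assuming the layout shown in the commands (subchats-N directories containing .eml or .eml.gz files); the function name find_duplicate_chats is hypothetical and not part of gmvault.

```python
import gzip
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_chats(chats_dir):
    """Return {chat file name: {subchats dir: md5 of decompressed content}}
    for every chat stored in more than one subchats-* directory."""
    by_name = defaultdict(dict)
    for path in sorted(Path(chats_dir).glob("subchats-*/*.eml*")):
        if path.suffix == ".gz":
            # Hash the decompressed bytes so gzip header differences are ignored.
            data = gzip.decompress(path.read_bytes())
            name = path.name[:-3]  # drop the trailing ".gz"
        elif path.suffix == ".eml":
            data = path.read_bytes()
            name = path.name
        else:
            continue
        by_name[name][path.parent.name] = hashlib.md5(data).hexdigest()
    # Keep only chats that appear in at least two directories.
    return {name: dirs for name, dirs in by_name.items() if len(dirs) > 1}


if __name__ == "__main__":
    for name, dirs in find_duplicate_chats("gmvault-db/db/chats").items():
        identical = len(set(dirs.values())) == 1
        print(name, sorted(dirs), "identical" if identical else "DIFFERENT")
```

A chat reported as "identical" is a byte-for-byte duplicate once decompressed; one reported as "DIFFERENT" deserves a closer look.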
These duplicates are not created every time. Generally when there is nothing to update (no new emails or chats) it does not happen, but when there is a new chat recorded, I usually get a duplicate. As far as I know this only affects chat objects, not mail objects.
To localize the problem better, I disabled compression and did a series of --chats-only syncs.
- Initial sync: gmvault sync --no-compression --chats-only. 1267 chats stored in subchats-1
- Force an update, i.e. send a chat message (I use a 3rd-party Jabber client, not the native Google app)
- gmvault sync --no-compression --chats-only. This time 1268 chats stored in both subchats-1 and subchats-2
- gmvault sync --no-compression --chats-only. This time no change.
- Force an update
- gmvault sync --no-compression --chats-only. This time 1268 chats stored in subchats-1 and subchats-2, 1269 chats stored in subchats-3
- gmvault sync --no-compression --chats-only (so no update). This time 1268 in subchats-1 and -2, 1269 in -3 and 538 (huh?) in subchats-4
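The per-directory counts in the steps above can be collected after each sync with a few lines of Python. This is only a sketch: the function name count_chats is hypothetical, and it assumes every chat is stored as a .eml or .eml.gz file directly inside a subchats-N directory.

```python
from collections import Counter
from pathlib import Path


def count_chats(chats_dir):
    """Count chat files (.eml or .eml.gz) in each subchats-* directory."""
    counts = Counter()
    for path in Path(chats_dir).glob("subchats-*/*"):
        # Skip .meta and any other companion files.
        if path.name.endswith((".eml", ".eml.gz")):
            counts[path.parent.name] += 1
    return dict(counts)


if __name__ == "__main__":
    for directory, count in sorted(count_chats("gmvault-db/db/chats").items()):
        print(directory, count)
```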
So you see the behavior is not very predictable. Another observation: the differing md5sums of the .gz duplicates are only a side effect of gzip storing the modification time of the .eml in the .gz header.
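The timestamp effect is easy to reproduce in isolation. The sketch below uses Python's standard gzip module (the mtime argument of gzip.compress requires Python 3.8 or later) with arbitrary example timestamps. It makes no claim about how gmvault itself writes its .gz files.

```python
import gzip
import hashlib

payload = b"identical chat content\n"

# Compress the same bytes twice with different header timestamps.
# The two mtime values are arbitrary examples, one minute apart.
first = gzip.compress(payload, mtime=1_400_000_000)
second = gzip.compress(payload, mtime=1_400_000_060)

# The compressed files hash differently...
print(hashlib.md5(first).hexdigest() == hashlib.md5(second).hexdigest())  # False

# ...because bytes 4-7 of the gzip header hold the MTIME field...
print(first[4:8] != second[4:8])  # True

# ...while the decompressed contents are the same.
print(gzip.decompress(first) == gzip.decompress(second))  # True
```

For the same reason, comparing duplicates is more reliable on the decompressed data than on the .gz files themselves.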
As to the duplicates, after accumulating these four subchats- folders, I discovered they are not always identical: if they are Content-Type: multipart/alternative, then the "boundary" string differs between duplicates. The .meta files are always identical.
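To check whether two duplicates differ only in the MIME boundary, the messages can be parsed and compared part by part rather than byte by byte. A minimal sketch using Python's standard email package follows; the function name same_apart_from_boundary is hypothetical, and the comparison (all headers except the top-level Content-Type, plus the type and decoded payload of each leaf part) is one reasonable definition of "same", not the only one.

```python
from email import message_from_bytes


def same_apart_from_boundary(raw_a, raw_b):
    """Compare two raw messages while ignoring the multipart boundary string."""
    msg_a = message_from_bytes(raw_a)
    msg_b = message_from_bytes(raw_b)

    def headers(msg):
        # The top-level Content-Type header carries the boundary, so skip it
        # here and compare only the bare content type further down.
        return [(k.lower(), v) for k, v in msg.items() if k.lower() != "content-type"]

    def parts(msg):
        # Content type and decoded payload of every non-multipart part.
        return [
            (part.get_content_type(), part.get_payload(decode=True))
            for part in msg.walk()
            if not part.is_multipart()
        ]

    return (
        msg_a.get_content_type() == msg_b.get_content_type()
        and headers(msg_a) == headers(msg_b)
        and parts(msg_a) == parts(msg_b)
    )
```

Called with the bytes of two uncompressed .eml files, it returns True when only the boundary differs and False when a header or body part differs.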
I suppose my main question is: what is the logic behind creating new subchats- directories?