Conversation
What kind of modification would this undergo?
I don't think we need a modified timestamp here, because any modification timestamp is going to be per-user rather than global.
Again, I'm curious about what kind of modifications are being tracked here. I realize this is likely due to something the Go/Python version tracks, but it'd be nice to document either inline or near here why this field exists.
This should be the last-modified timestamp of the collection as a whole, and should change whenever an item is added, modified or deleted in the collection. It will be greater-than-or-equal-to the MAX() of the modified column for that collection in the bso table, with the greater-than case being because an item was deleted.
FWIW, the python server has a bug here in its handling of deleted collections:
mozilla-services/server-syncstorage#62
Basically, it tries to calculate the last-modified time of the storage as a whole by doing SELECT MAX(modified) FROM user_collections WHERE uid = X. That's incorrect in the case of a deleted collection, which should cause the last-modified time of the storage as a whole to increase, but won't affect the last-modified time of any remaining collections.
It's clearly an edge case, because we haven't bothered to actually fix it in the python version. But for greenfield code it's probably worth doing it right the first time. I suggest adding a deleted_at timestamp field to the user_collections table, so that we can explicitly track the deletion of collections. But that can be a follow-up issue if necessary.
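A minimal sketch of how the suggested deleted_at column could feed into the storage-level last-modified calculation, so that deleting a collection still bumps the storage timestamp. All names here (UserCollection, storage_last_modified) are illustrative, not the actual schema or API:

```rust
// Sketch: computing the storage-level last-modified timestamp so that
// collection deletions are accounted for. Assumes each collection row
// carries its last-modified time plus an optional deleted_at marker
// (the suggested new column).

#[derive(Debug)]
struct UserCollection {
    modified: i64,           // last-modified version number for the collection
    deleted_at: Option<i64>, // set when the collection was deleted
}

/// Storage last-modified is the max over both live modifications and
/// deletions; a plain MAX(modified) over live rows would miss deletions.
fn storage_last_modified(collections: &[UserCollection]) -> i64 {
    collections
        .iter()
        .map(|c| c.deleted_at.unwrap_or(c.modified).max(c.modified))
        .max()
        .unwrap_or(0)
}

fn main() {
    let collections = vec![
        UserCollection { modified: 100, deleted_at: None },
        // Deleted at 250: SELECT MAX(modified) alone would report 200.
        UserCollection { modified: 200, deleted_at: Some(250) },
    ];
    assert_eq!(storage_last_modified(&collections), 250);
}
```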
src/db/error.rs
Outdated
rfk left a comment
This looks like a great start!
Architecturally, I'm a little worried about the way that the DB is taking transactions and querying the current time from behind its API boundary. This could lead to subtle edge-cases under high concurrency, such as two concurrent PUTs inserting items with the same timestamp into the same collection, which could cause the two clients to never sync down each other's changes.
Ideally, each HTTP request would be processed in a single logical transaction and at a single logical instant in time. (It might help to think of the modified integers here not as timestamps, but as opaque version numbers, with a new version number being generated for each change to the collection).
I wonder if there's a way to help enforce that at the DB trait API level here.
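One rough way to encode that at the trait level: have the Db hand out a request-scoped transaction that fixes the timestamp once at begin time, so every write in the request reuses the same logical instant. Trait and method names here are hypothetical, not the actual syncstorage API:

```rust
// Sketch: pin each request to a single logical instant. The version
// number ("timestamp") is chosen once when the transaction begins and
// cannot be re-queried from the clock mid-request.

struct Transaction {
    timestamp: i64, // one version number for the whole request
}

impl Transaction {
    fn timestamp(&self) -> i64 {
        self.timestamp
    }
}

trait Db {
    /// Begin a request-scoped transaction at a single logical instant.
    fn begin(&self, now: i64) -> Transaction;
}

struct MockDb;

impl Db for MockDb {
    fn begin(&self, now: i64) -> Transaction {
        Transaction { timestamp: now }
    }
}

fn main() {
    let db = MockDb;
    let txn = db.begin(1_234);
    // Every write in this request would stamp items with the same value.
    assert_eq!(txn.timestamp(), 1_234);
}
```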
The BSO payload can in theory be arbitrary unicode, since it's JSON. Should we set an explicit encoding like utf8mb4 to guard against any weirdness in unicode handling?
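For illustration, the sort of DDL that would pin the payload column to full Unicode in MySQL (utf8mb4 covers all of Unicode, including astral-plane characters that the 3-byte "utf8" charset cannot store). Table, column type, and collation here are assumptions, not the actual migration:

```sql
-- Illustrative only: force full-Unicode storage for the BSO payload.
ALTER TABLE bso
  MODIFY payload MEDIUMTEXT
  CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
```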
This payload_size is a separate column to (in theory) make it quicker to calculate total size of items stored in a collection. IIRC we don't ever do that in practice, so it may be worth considering whether that optimization still makes sense for us here.
payload_size does appear to be utilized in the info/quota and info/collection_usage calls
It might be interesting to compare SELECT SUM(payload_size) vs SELECT SUM(LENGTH(payload)) for this purpose to see whether having it as a separate column really provides much value in practice.
To be clear, I don't have any particular objection to keeping it, just wondering whether the extra complexity pays for itself or not.
The "id" here is the batch id, right? It may be worth naming it "batch_id" or similar to avoid confusion.
src/db/mysql/models.rs
Outdated
It's not obvious to me why some of these are suffixed with _sync and some are not; what's the significance?
The higher-level db interface (trait) supplies async calls, so the MysqlDb impl of this trait will call the sync (_sync-suffixed) methods via tokio's blocking wrapper.
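A self-contained sketch of that naming convention: the _sync method is the blocking MySQL implementation, and the suffix-less method wraps it so it can run on a blocking thread pool. std::thread stands in here for tokio's blocking pool (e.g. tokio::task::spawn_blocking in real code), and all names are illustrative:

```rust
use std::thread;

struct MysqlDb;

impl MysqlDb {
    /// Blocking implementation: would talk to MySQL directly.
    fn get_bso_sync(&self, id: u32) -> String {
        format!("bso-{}", id)
    }

    /// Non-blocking facade: hands the sync call to another thread, the
    /// way the async trait impl would hand it to tokio's blocking pool.
    fn get_bso(&'static self, id: u32) -> thread::JoinHandle<String> {
        thread::spawn(move || self.get_bso_sync(id))
    }
}

static DB: MysqlDb = MysqlDb;

fn main() {
    let handle = DB.get_bso(7);
    assert_eq!(handle.join().unwrap(), "bso-7");
}
```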
src/db/mysql/models.rs
Outdated
It's not obvious to me whether this defaults sortindex to NULL or to 0; I believe NULL is the correct behaviour.
Good catch, this is leftover from the port from Go, which instead defaulted to 0.
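A minimal sketch of the fix being discussed: model sortindex as an Option so an omitted value stays NULL in the database rather than silently becoming 0 (the Go port's behaviour). Struct and field names are illustrative:

```rust
// None => SQL NULL; an unset sortindex must never be coerced to 0.
#[derive(Debug, Default)]
struct PutBso {
    sortindex: Option<i32>,
}

fn main() {
    let bso = PutBso::default();
    // The derived Default yields None, i.e. NULL, not Some(0).
    assert_eq!(bso.sortindex, None);
}
```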
src/db/mysql/models.rs
Outdated
The docs aren't clear on what happens when limit < 0, does it default to "no limit"?
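One way to make the limit semantics explicit instead of leaning on underdocumented behaviour: translate the signed limit into an Option at the API edge, where negative means "no limit". The helper name is hypothetical:

```rust
/// Map a signed limit to an explicit Option: negative => no limit.
fn effective_limit(limit: i64) -> Option<u64> {
    if limit < 0 {
        None // no limit
    } else {
        Some(limit as u64) // safe: limit is non-negative here
    }
}

fn main() {
    assert_eq!(effective_limit(-1), None);
    assert_eq!(effective_limit(0), Some(0));
    assert_eq!(effective_limit(100), Some(100));
}
```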
src/db/mysql/test.rs
Outdated
FWIW I wouldn't expect the ttl to be updated unless you explicitly sent a new TTL in the update.
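A sketch of those update semantics: the stored ttl is only rewritten when the caller explicitly supplies one. Types and names are illustrative, not the real models:

```rust
struct StoredBso {
    ttl: u32,
}

struct BsoUpdate {
    ttl: Option<u32>, // None => leave the stored ttl untouched
}

fn apply_update(stored: &mut StoredBso, update: &BsoUpdate) {
    if let Some(ttl) = update.ttl {
        stored.ttl = ttl;
    }
}

fn main() {
    let mut bso = StoredBso { ttl: 3600 };
    apply_update(&mut bso, &BsoUpdate { ttl: None });
    assert_eq!(bso.ttl, 3600); // unchanged: no TTL sent
    apply_update(&mut bso, &BsoUpdate { ttl: Some(60) });
    assert_eq!(bso.ttl, 60); // explicitly updated
}
```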
src/db/mysql/models.rs
Outdated
I assume the // XXX markers are for switching to a checked convert operation with ?, so we can ensure we catch casting errors?
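What that would presumably look like: replacing `as` casts with checked conversions so out-of-range values surface as errors via `?` rather than silently truncating. The function name is illustrative:

```rust
use std::convert::TryFrom;

/// Checked conversion: fails (instead of wrapping) if raw exceeds i64::MAX.
fn to_offset(raw: u64) -> Result<i64, std::num::TryFromIntError> {
    let offset = i64::try_from(raw)?;
    Ok(offset)
}

fn main() {
    assert_eq!(to_offset(42).unwrap(), 42);
    assert!(to_offset(u64::MAX).is_err());
}
```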
Force-pushed 6f456b2 to a8cc797
Will address further things later (likely switching the get_bsos limit to u64, table encoding; the batches table/architecture is still up in the air). I'm already intending to have the transactions work like you suggest @rfk -- a lot like the python version: one transaction started per handler request (TODO: a transaction() call added to the Db trait), with all db calls taking place within it. The modified timestamp will likely follow the same pattern.
Force-pushed 5722817 to 64d8a48
w/ some initial calls and a test suite migrated from the sqlite version
prefers raw DQL (note: not DML) queries vs diesel's query builder for
potential reuse for other backends (spanner)
TODO: further fleshing out of the types, likely wanting i64 or wrappers
everywhere (as all spanner has is INT64) -- nor should the db layer be
responsible for conversions from unsigned
Issue #18
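The "i64 or wrappers everywhere" idea could look roughly like this: a newtype over i64 (Spanner's only integer type) with checked construction from unsigned input, keeping the conversion at the API edge rather than inside the db layer. The type name is hypothetical:

```rust
use std::convert::TryFrom;

/// Newtype over i64, matching Spanner's INT64.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SyncTimestamp(i64);

impl TryFrom<u64> for SyncTimestamp {
    type Error = std::num::TryFromIntError;

    /// Checked construction: callers convert at the boundary, so the
    /// db layer never deals with unsigned values.
    fn try_from(raw: u64) -> Result<Self, Self::Error> {
        Ok(SyncTimestamp(i64::try_from(raw)?))
    }
}

fn main() {
    let ts = SyncTimestamp::try_from(1_546_300_800_000u64).unwrap();
    assert_eq!(ts, SyncTimestamp(1_546_300_800_000));
    // Values that don't fit in INT64 are rejected, not truncated.
    assert!(SyncTimestamp::try_from(u64::MAX).is_err());
}
```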