Appendix
The runtime executes transactions on any key-value store that supports `get`, which
retrieves the values of a set of keys, and `cput`, which conditionally updates a set of
keys if and only if a set of dependent keys remain unchanged.
Internally, the runtime tracks changes to keys using Multiversion Concurrency Control. Each key is associated with a monotonically increasing version number that is incremented each time the key is modified. The runtime executes transactions with snapshot isolation and validates each transaction's dependent keys on commit, so transactions are guaranteed to be serializable: they have the same effect when executed concurrently as they would if they were executed sequentially.
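The key-value contract described above can be sketched in a few lines. The class and method names below are illustrative assumptions, not the runtime's actual interface: `get` returns versioned values, and `cput` applies its changes only if every dependent key's version is unchanged.

```python
# Hypothetical sketch of a versioned key-value store with get/cput.
# Names and signatures are assumptions for illustration only.

class VersionedStore:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def get(self, keys):
        """Retrieve the (version, value) of a set of keys."""
        return {k: self.data[k] for k in keys if k in self.data}

    def cput(self, depends, changes):
        """Apply changes iff every dependent key's version is unchanged."""
        for key, version in depends.items():
            current = self.data.get(key, (0, None))[0]
            if current != version:
                return False  # conflict: a dependent key was modified
        for key, value in changes.items():
            version = self.data.get(key, (0, None))[0]
            # Versions increase monotonically on every modification.
            self.data[key] = (version + 1, value)
        return True
```

A transaction that reads a stale version of a dependent key fails its `cput`, which is what makes optimistic validation on commit possible.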
Implementation of the cput operation is particularly challenging in a distributed database,
because it requires some form of coordination between the various nodes involved in the execution of
a transaction. Distributed databases generally use the following four techniques to implement
conditional put. Each technique has its own advantages and disadvantages. For example, time-based
methods enable strongly consistent snapshots (a view of the data at a specific point in time,
which prevents write-skew anomalies); however, they
require extremely precise clocks or some other form of time synchronization (NTP,
Marzullo's Algorithm, etc.) to mitigate the impact of clock drift across the database cluster.
- Two-Phase Commit (2PC): MySQL, Oracle
- Time: Spanner, Percolator
- Shared Log: Tango, vCorfu
- Paxos: Calvin, Cassandra
A conditional put operation U conflicts with V if either operation depends on a key that the other changes, or if they both change the same key. Concurrent application of conflicting put operations can lead to data inconsistencies and race conditions. Given a set of potentially conflicting put operations, a correct implementation will find a schedule in which no conflicting operations are ever performed simultaneously, and an optimal implementation will find a correct schedule with minimal makespan. We show below, by a reduction to graph coloring, that finding an optimal schedule is NP-Hard. Because the optimal schedule problem is intractable, most systems are concerned with solving the much weaker correct schedule problem. Even then, designing an efficient, fault-tolerant, and decentralized solution can be extremely challenging.
Proof: Construct a disjunctive graph G in which vertices represent put operations and undirected edges connect conflicting operations. The schedule of put operations with minimal makespan can be found by finding an acyclic orientation of G with the minimal longest path. Because all edges of G are initially undirected by construction, finding the acyclic orientation with the minimal longest path is equivalent to finding an optimal graph coloring of G by the Gallai–Hasse–Roy–Vitaver Theorem. Because graph coloring is NP-Complete, finding the schedule of operations with minimal makespan is NP-Hard.
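Although the optimal schedule is NP-Hard, a correct schedule can be produced by greedy coloring of the conflict graph: each color class becomes a round of mutually non-conflicting operations. The sketch below is illustrative; the operation encoding as (read set, write set) pairs and the function names are assumptions, not part of the runtime.

```python
# Illustrative sketch: derive a correct (not optimal) schedule of put
# operations via greedy coloring of the conflict graph.

def build_conflict_graph(ops):
    """ops: list of (reads, writes) set pairs. Ops conflict if one
    writes a key the other reads or writes."""
    edges = {i: set() for i in range(len(ops))}
    for i, (r1, w1) in enumerate(ops):
        for j, (r2, w2) in enumerate(ops):
            if i < j and (w1 & w2 or w1 & r2 or w2 & r1):
                edges[i].add(j)
                edges[j].add(i)
    return edges

def greedy_schedule(ops):
    """Color vertices greedily; each color is one round of the schedule."""
    edges = build_conflict_graph(ops)
    color = {}
    for v in range(len(ops)):
        used = {color[u] for u in edges[v] if u in color}
        c = 0
        while c in used:  # smallest color unused by any neighbor
            c += 1
        color[v] = c
    rounds = {}
    for v, c in color.items():
        rounds.setdefault(c, []).append(v)
    return [rounds[c] for c in sorted(rounds)]
```

Operations in the same round never conflict, so they may be applied simultaneously; the number of rounds is merely an upper bound on the optimal makespan.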
Transactions are executed through repeated partial evaluation, according to the following procedure.
- Fetch: Call `get` on all keys that are read and written by the transaction, and add the returned revisions to a local snapshot. In order to guarantee snapshot isolation, `get` is only called when a key is read or written for the first time. By batching reads and writes and avoiding duplicate fetches, the runtime is guaranteed to perform a minimal number of roundtrips to and from the database.
- Evaluate: Recursively replace all expressions that have literal operands with their corresponding literal result. For example, `add(real(1), sub(real(0), real(2)))` returns `real(-1)`. The result of all `write` expressions is saved to a local buffer, and all `read` expressions return the latest value of the key in either the local buffer or the snapshot.
- Repeat: Re-evaluate the transaction until it reduces to a single literal value. Because all expressions with literal operands return a literal value, all transactions eventually reduce to a literal.
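The evaluate/repeat loop above can be illustrated with a toy expression reducer. The constructors below mirror the `add`, `sub`, and `real` expressions in the example, but are hypothetical tuple encodings; the fetch step and write buffer are omitted.

```python
# Toy sketch of repeated partial evaluation: expressions with literal
# operands reduce to literals until a single literal remains.

def real(x):
    return ("real", x)

def add(a, b):
    return ("add", a, b)

def sub(a, b):
    return ("sub", a, b)

def is_literal(e):
    return e[0] == "real"

def step(e):
    """One evaluation pass: replace subexpressions whose operands
    are literals with their corresponding literal result."""
    if is_literal(e):
        return e
    op, a, b = e
    a, b = step(a), step(b)
    if is_literal(a) and is_literal(b):
        return real(a[1] + b[1]) if op == "add" else real(a[1] - b[1])
    return (op, a, b)

def evaluate(e):
    """Repeat: re-evaluate until the expression is a single literal."""
    while not is_literal(e):
        e = step(e)
    return e
```

Under this encoding, `evaluate(add(real(1), sub(real(0), real(2))))` reduces to `real(-1)`, matching the example above. Because every pass strictly shrinks the set of non-literal subexpressions, termination is guaranteed.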
One of the many benefits of Multiversion Concurrency Control is that it allows the runtime to cache
incoherently without sacrificing data integrity. The runtime may speculate about the value of a key
by reading a potentially stale version from cache, and then validate the cached version on commit.
Multilevel, write-through caching is directly integrated into the runtime. Caustic implements
caching on any key-value store that supports `fetch`, which retrieves the values of a set of
keys, `update`, which updates the values of a set of keys, and `invalidate`, which removes a
set of keys.
Cache coherency is maintained according to the following protocol.
- Keys are updated on cache fetch miss.
- Keys are automatically updated after successful transactions.
- Keys are automatically invalidated after conflicting transactions.
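The three coherency rules above might be sketched as follows; the class and method names are assumptions for illustration, and the underlying store is reduced to a plain dictionary.

```python
# Illustrative sketch of the coherency protocol: update on fetch miss,
# update after successful transactions, invalidate after conflicts.

class DictStore:
    """Stand-in for the underlying key-value store."""
    def __init__(self, data):
        self.data = dict(data)

    def get(self, keys):
        return {k: self.data[k] for k in keys if k in self.data}

class WriteThroughCache:
    def __init__(self, store):
        self.store = store
        self.cache = {}

    def fetch(self, keys):
        missing = [k for k in keys if k not in self.cache]
        if missing:
            # Rule 1: keys are updated on cache fetch miss.
            self.cache.update(self.store.get(missing))
        return {k: self.cache[k] for k in keys if k in self.cache}

    def commit(self, changes, success):
        if success:
            # Rule 2: keys are updated after successful transactions.
            self.cache.update(changes)
        else:
            # Rule 3: keys are invalidated after conflicting transactions.
            for k in changes:
                self.cache.pop(k, None)
```

Invalidation on conflict keeps the cache safe even when a writer crashes before refreshing it, at the cost of repeated misses for heavily contended keys.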
While the implementation of caching is fault tolerant, it is particularly vulnerable to thrashing
under contention. Consider if runtime A and runtime B both read a key K from cache.
A updates the value of K, and then writes the new value of K to cache.
When B tries to update the value of K, it will cause a conflict, and so it will
subsequently invalidate K in cache. A potential solution might be to remove the invalidation
step from the protocol; however, the implementation would no longer be fault tolerant. Consider if
A updates K but fails to write the new value to cache. The value in cache would then
remain stale until it is evicted, but a key under high contention may never be evicted from cache.