From 15198aa30fa658b0acfe24ca9940804cab408765 Mon Sep 17 00:00:00 2001
From: Gabriel Scherer <gabriel.scherer@gmail.com>
Date: Tue, 18 Jun 2024 16:16:48 +0200
Subject: [PATCH] mixed-atomicity memory model: only relaxed reads

---
 rfcs/mixed-atomicity-memory-model.md | 154 +++++++++++++++++++++++++++
 1 file changed, 154 insertions(+)
 create mode 100644 rfcs/mixed-atomicity-memory-model.md

diff --git a/rfcs/mixed-atomicity-memory-model.md b/rfcs/mixed-atomicity-memory-model.md
new file mode 100644
index 0000000..5869e2d
--- /dev/null
+++ b/rfcs/mixed-atomicity-memory-model.md
@@ -0,0 +1,154 @@
+# Adding relaxed reads to the OCaml memory model
+
+The current OCaml memory model is defined in 
+
+> Bounding Data Races in Space and Time  
+> Stephen Dolan, KC Sivaramakrishnan, Anil Madhavapeddy, 2018  
+> https://kcsrk.info/papers/pldi18-memory.pdf
+
+The model is described by an operational semantics (with non-deterministic memory operations) and an equivalent axiomatic semantics.
+
+The model distinguishes atomic locations and non-atomic locations. Read/write operations on atomic locations are atomic, they induce a synchronization between a reader and the corresponding writer. Read/write operations on non-atomic locations are non-atomic, they induce no synchronization. This separation corresponds to the current OCaml programming model, where `'a Atomic.t` values are atomic locations and all other mutable data is non-atomic.
+
+@polytypic expressed interest in adding "fenceless" operations to the `Atomic.t` type, and his Multicore-magic library exports `fenceless_{get,set}` operations that are just non-atomic reads and writes. There are many uses of `fenceless_get` in particular in the kcas library; a common pattern is to use a `fenceless_get` in read-compute-cas retry loops: we don't need an atomic read as the following compare-and-set will synchronize with past writers anyway, and reading a stale value does not endanger correctness but just provoke a retry.
+
+One question that came up in discussions with @OlivierNicole, @fabbing, @fpottier and @clef-men is whether we know how to describe the resulting model, where both atomic and non-atomic operations may be performed on a given location.
+
+In #41 I proposed a general mixed-atomicity memory model that supports both relaxed reads and writes on atomic locations, in essence erasing the distinction between atomic and non-atomic locations, based on discussions with @clef-men. @stedolan [explained](https://github.com/ocaml/RFCs/pull/41#issuecomment-2178274266) that relaxed writes are a dangerous addition to the model, as they break desirable properties and cannot be implemented with just the current OCaml compilation scheme.
+
+The current RFC proposes a specification for a simpler extension where only relaxed *reads* are added, but not relaxed writes: only non-atomic accesses are used on non-atomic locations, and atomic locations allow either atomic access or a non-atomic read.
+
+(Warning: we are not memory-model experts, so we are just making stuff up.
+cc @stedolan, @kayceesrk, @avsm, the authors of the initial model,
+and @gmevel, @jhjourdan and @fpottier as the authors of the Cosmo separation logic)
+
+## A detailed reminder on the current operational memory model
+
+(This section is unchanged from the previous RFC.)
+
+In the current operational semantics for the memory-model, there are separate categories for atomic variables `A` and non-atomic variables `a`. The global store `S` stores a value `x` for each atomic location `A`, but for non-atomic locations `a` it stores a *history* `H`, a map from locations to values such that `H(t) = x` when a write `a := x` happened at timestamp `t`.
+
+A given thread has observed some writes to each non-atomic location `a`, but not necessarily all writes. When the thread reads from `a`, it reads non-determinstically the last-write value that it has observed, or any value more recent than that (written at a later timestamp). We call a *frontier* `F` a mapping from locations to timestamps that represents a time-of-last-observed-write for each location. Each thread has a frontier.
+
+Finally, the global store maps each atomic location `A` to a value `x`, but also to a fronter `Fᴬ`, that represents information propagated by threads that have written into this location. When a thread reads from a location `A`, it updates its own frontier `F` with the frontier of the location `Fᴬ` -- its frontier becomes the maximum of the timestamps, `F ⊔ Fᴬ`. When a thread *writes* to an atomic location, it updates its own frontier to become `F ⊔ Fᴬ`, but it also updates the atomic location's frontier in the same way.
+
+In LaTeX-turned-Unicode, this gives:
+
+```
+location              l
+atomic location       A
+non-atomic location   a
+value                 x
+timestamp             t ∈ Q (rational number)
+
+Machine     M := <S, P>
+Store       S := a ↦ H ⊎ A ↦ (F, x)
+History     H := t ↦ x
+Frontier    F := a ↦ t
+Program     P := i ↦ (F, e)
+transition  ϕ := read x | write x
+
+e ~<l:ϕ>→ e'    S(l); F —<l:ϕ>→ C'; F'
+———————————————————————————————————————————————— Step
+<S, P || (F, e)> —→ <S[l ↦ C'], P || (F', e')>
+```
+
+This first judgment above is a machine-reduction rule, it is the rule for memory operation. (There is another rule for silent reductions which I omitted, see the paper.) If a thread has a frontier `F` and its code is `e`, and `e` steps into `e'` by performing a memory operation `ϕ`, the global store is transformed by the same memory operation `ϕ`. The update of the global store captures the memory model.
+
+It is described by the judgment `S(l); F -<l:ϕ>→ C'; F'`. This judgment takes the current data `S(l)` for the location `l` in the store, and the frontier `F` of the current thread, and returns new data `C'` to put in the store and an updated frontier `F'` for the thread.  `S(l)` and `C'` are elements in the store, that is either a history `H` for non-atomic locations `a` or a frontier-and-value pair `(F, x)` for atomic locations `A`.
+
+```
+if F(a) ≤ t, t ∈ dom(H)
+———————————————————————————— Read-NA
+H; F  ——<a:read H(t)>—→ H; F
+```
+
+Read-NA is the rule for a non-atomic read on `a`, which may non-deterministically read any value in the history `H` of `a` that is at least as recent as `F(a)`, the last write on `a` observed by the current thread.
+
+
+```
+if F(a) < t, t ∉ dom(H)
+—————————————————————————————————————————— Write-NA
+H; F  ——<a:write x>—→ H[t ↦ x]; F[a ↦ t]
+```
+
+Write-NA on a non-atomic location `a` creates a new timestamp that is fresh, and more recent than the time-of-last-observed-write `F(a)` – but there may still be more recent timestamps in the history. The frontier of the current thread is updated, this new timestamp `t` becomes the time of the last observed write.
+
+```
+————————————————————————————————————————— Read-AT
+(Fᴬ, x); F ——<A:read x>—→ (Fᴬ, x); Fᴬ ⊔ F
+```
+
+A read on an atomic location `A` returns the value read from the location (and does not change the frontier `Fᴬ` stored in the store), but it updates the frontier of the current thread to become `Fᴬ ⊔ F`: for each location `a`, the new time-of-last-known-write is the maximum between the previous observed time `F(a)` and the time `Fᴬ(a)` known to past writers of `A`.
+
+```
+—————————————————————————————————————————————— Write-AT
+(Fᴬ, y); F ——<A:write x>—→ (Fᴬ ⊔ F, x); Fᴬ ⊔ F
+```
+
+A write on an atomic location `A` updates the value in the store, *and* the frontier in the store, *and* the frontier of the current thread. This is where synchronization happens.
+
+One higher-level way to think of this is that, for an atomic location `A`, the store tracks the maximum frontier of all threads that wrote to `A`. Reading from an atomic location transfers this synchronization information from all previous writers.
+
+## New proposed version
+
+We propose to keep all definitions and rules in the current model unchanged, in particular the rule for atomic reads which we recall for reference:
+
+```
+————————————————————————————————————————— Read-AT
+(Fᴬ, x); F ——<A:read x>—→ (Fᴬ, x); Fᴬ ⊔ F
+```
+
+
+and to add a new rule for relaxed/non-atomic reads to atomic locations.
+
+```
+——————————————————————————————————————— Read-relaxed
+(Fˡ, x); F ——<A:read-at x>—→ (Fˡ, x); F
+```
+
+The only difference with atomic reads is that the thread frontier is not updated. Such a relaxed read gets the "current" value of the store (as defined by the sequential ordering between atomic writes), but without any synchronization with the atomic writers.
+
+## Implementation
+
+We *assume* that the current compilation strategy is enough to realize this memory model, where relaxed reads are compiled exactly like non-atomic reads.
+
+## Reasoning about the model
+
+### Reasoning about programs
+
+One easy way to reason about relaxed reads is when there is an atomic access on the same location right before or right after the relaxed read:
+
+- If there is an atomic access after the relaxed read, the thread frontier gets updated at this point,
+  and then the execution behaves as if the read had been atomic.
+
+- If there is an atomic access before the relaxed read, then the thread frontier has already been updated,
+  and some executions of the version where the read is atomic would have resulted in the same frontier
+  after the read.
+
+This is an example from `kcas` with a relaxed read after an atomic access:
+
+https://github.com/ocaml-multicore/kcas/blob/240981e0ef9e9d5de6801aa8a15b62f73b7a37af/src/kcas/kcas.ml#L307-L312
+
+This is an example from `saturn_benchmarks` with a relaxed read before an atomic access:
+
+https://github.com/lyrm/saturn_benchmarks/blob/23755624cb35eabb8cd341008ebcb8582cc24d82/datastructures/two_stack_queue/two_stack_queue.ml#L83-L89
+
+(reasoning about this requires checking that `push` always eventually performs an atomic access on the same location)
+
+We also found plenty of examples in the wild where `fenceless_get` is used in more complex situations, for example cases where the atomic update only happens conditionally, based on the result of the relaxed read. Those will require more complex reasoning.
+
+### Meta-theory
+
+Our non-expert intuition is that there are no races between atomic writes and relaxed reads on the same location: either the read reads-from the write, and it comes after, or it reads-from an older write and it comes before.
+
+In particular, unlike the richer proposal in #41, there is no new notion of race / conflict. If this intuition holds, it should be easier to establish that the "Local DRF" theorem of the memory-model paper is preserved.
+
+## Remaining questions
+
+1. Is there a nice axiomatic formulation of this model?
+
+2. Are people reassured by this description enough to consider adding `Atomic.get_relaxed` to OCaml?
+
+3. Do we indeed still get a "Local DRF" theorem in this setting?
+