From fe59b3922970edf647cd5967b872a52d669d78e9 Mon Sep 17 00:00:00 2001 From: Richard Eisenberg Date: Thu, 27 Oct 2022 15:47:05 -0400 Subject: [PATCH 1/2] Add unboxed types proposal --- rfcs/unboxed-types.md | 1989 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1989 insertions(+) create mode 100644 rfcs/unboxed-types.md diff --git a/rfcs/unboxed-types.md b/rfcs/unboxed-types.md new file mode 100644 index 0000000..57839af --- /dev/null +++ b/rfcs/unboxed-types.md @@ -0,0 +1,1989 @@ +# Unboxed types in OCaml + +## Motivation + +Suppose you define this type in OCaml: + +```ocaml +type t = { x : Int32.t; y : Int32.t } +``` + +Values of this type carry 8 bytes of actual data. However, in OCaml, +values of this type are represented as: + + - an 8-byte pointer + + - to a 24-byte block + (8B header + 2x 8B fields) + + - pointing at two more 24-byte blocks + (8B header + 8B `caml_int32_ops` + 4B data + 4B padding) + +There are some efficiency gains to be had here. + +## Principles + +In this design, we are guided by the following high-level principles. +These principles are not incontrovertible, but any design that fails +to uphold these principles should be considered carefully. + +**Backward compatibility**: The extension does not change the +correctness or performance of existing programs. + +**Availability**: Unboxed types should be easy to use. That is, +it should not take much change to code to use unboxed types and +values. + +## Unboxed primitives + +This is a proposed extension to OCaml to allow the layout of data +types to be controlled by the programmer. This proposal won't affect +the type defined above, but will allow you to define better layouts. +With unboxed types, we can write the following: + +```ocaml +type t = { x : #int32; y : #int32 } +``` + +Here, the `x` and `y` fields have the unboxed type `#int32`, which +consumes only 4 bytes. Values of type `t` are still boxed: they are +represented as 8-byte pointers to a 16-byte block, consisting of an +8-byte header and two 4-byte fields. Despite containing unboxed +fields, the type t is an ordinary OCaml type: it can be passed to +normal functions, stored in normal data structures, and so on. + +On the other hand, the new type `#int32` is unboxed. This means that +it can't be used to instantiate ordinary type parameters. For +example, `fun (x : #int32) -> Fn.id x` will not work: `Fn.id` doesn't +work on types that are not represented by exactly one word. Similarly, +you can't have a `#int32 list`, because a list expects its contents to +be represented by exactly one word. Getting this right is important +for, e.g., the garbage collector, which expects values in memory to +have a certain format. + +Unboxed types, thus, require a trade-off: they are more compact, +but do not work as smoothly as normal boxed types. + +The following types will be available in the initial environment, given with +example literals of that type: + +```ocaml +#() : #unit +#42b : #int8 +#42s : #int16 +#42l : #int32 +#42L : #int64 +#42n : #nativeint +#42. : #float +``` + +## Layouts + +So, unboxed types mean that polymorphic definitions need to be +polymorphic over only a subset of the possible types (say, the boxed +ones). We do this by introducing *layouts*, which classify types in +the way that types classify values. (The technical term for such +things is *kinds*, but we're avoiding that word because (a) it's +unfamiliar to programmers who aren't type theorists and (b) it doesn't +describe what we're using kinds for.) The layout of a type determines +how many bytes of memory values of that type consume, how many and +what kind of registers are needed to pass such values to functions, +and whether locations holding such values need to be registered with +the GC. + +Once we have layouts, the type of, say, `diag : 'a -> 'a * 'a` is a short form +of: + +```ocaml +val diag : ('a : value) . 'a -> 'a * 'a +``` + +The layout `value` is the layout of ordinary OCaml values (boxed or +immediate). This means that trying to use diag at type `#int32 -> +#int32 * #int32` causes a compilation error, as `#int32` is not of +layout `value`. + +The following layouts are available, given with example types of that layout: + +```ocaml +(* concrete ones *) +string : value +int : immediate +#unit : void +#int8 : bits8 +#int16 : bits16 +#int32 : bits32 +#int64 : bits64 +#nativeint : word +#float : float64 + +(* indeterminate ones *) +any +gc_friendly +gc_ignorable +``` + +(On 64-bit machines, `word` has the same representation as `bits64`, but +we don't want to bake this coincidence into the type system.) +As well as the above primitive layouts, layouts may be combined using +the `*` and `+` operators. Unlike in types, `*` and `+` are associative +in layouts. That is, the +following are all equivalent at runtime: + +``` +(value * value) * value +value * (value * value) +value * value * value +``` + +There is a (transitive) subtyping relation between these layouts, as pictured +here: + +``` + ----any----- + / \ + gc_friendly gc_ignorable------------- + | \ / | \ \ + value void word float64 bits_8..64 + | + immediate +``` + +(It would be safe to add `immediate <= word`, as well, but we don't for now. See +an Extension toward the end of this document.) The subtyping relationship +between product and sum layouts is pointwise, and only between products/sums of +the same length. A product/sum is a subtype of `gc_friendly` +(resp. `gc_ignorable`) iff its components are subtypes `gc_friendly` +(resp. `gc_ignorable`). All products/sums are subtypes of `any`. + +A `gc_friendly` layout is one that the garbage collector knows how to treat: +these are pointers to gc-managed memory and tagged immediates. A `gc_ignorable` +layout is one that the garbage collector is free to ignore: these values are not +pointers to gc-managed memory (and contain no such pointers). + +We might imagine that we could make, e.g. `(value * value) * value` +a subtype of `value * (value * value)` (and vice versa), but this +would play poorly with type inference. For example, if we had +`'l1 * 'l2 = value * value * value` (where `'l1` and `'l2` are unification +variables), how could we know how to proceed? Perhaps we could make +a subtyping relationship here, but only when there is an explicit check; +we leave this off the design for now. + +Names used in layouts (both the alphanumeric names listed above and the infix +binary operators) live in a new layout namespace (distinct from all existing +namespaces) and are in scope in the initial environment. Though this proposal +does not introduce syntax to do so, we can imagine constructs defining new names +in the layout namespace that could shadow these existing names. + +## Unboxed types and records with unboxed fields + +All of the currently-existing types in OCaml have layout `value`. Some +also have layout `immediate`, such as `int` or `unit`. We're planning +to add unboxed versions of a few builtin types, such as `#int32` and +`#float`. This gives: + +```ocaml +type float : value +type int32 : value +type int : immediate +type #float : float64 +type #int32 : bits32 +``` + +All of these can be passed and returned from functions, but only the +`value` types (including `immediate` ones) can be stored in, say, +lists. Note that the existing `+.`, etc. primitives won't work on +`#float`, because they have the wrong type. New ones for `#float` will +be made available. (Same for `#int32`, etc.) + +Unboxed types will be allowed in record fields: + +``` +type t = { + foo : int; + bar : #float; + ... +} +``` + +The OCaml runtime forbids mixing safe-for-GC fields and unsafe-for-GC +fields in a single block. See the mixed-block restriction, below. + +## Restrictions around known representations + +A key aspect of the design of unboxed types is that we must know +the representation of a type before we can ever manipulate a value +of that type. For example, check out `wrong` here: + +```ocaml +let wrong : ('a : any). 'a -> 'a = fun x -> x +``` + +This function is impossible to define; it would get rejected at +compile time. The reason is that the compiled `wrong` function must +take an argument somehow and then return it somehow. Yet, if we don't +know the representation of that argument, we cannot do this! Does the +argument fit in one register or two? Is it a pointer (that the GC would +have to be aware of) or not? We cannot know: the argument's layout is +`any`. + +Now consider this function: + +```ocaml +let strange : ('a : any). string -> 'a = fun msg -> raise (Msg msg) +``` + +This one is OK. This function never has to manipulate the value +of type `'a`: that value is produced by a tail call, and so the code for +`strange` never has to move it. (`raise` is magical, allowed to compile +by virtue of the fact that it doesn't actually return to its caller.) In +practice, a function like `strange` would always be called with a known +layout (such as `value` or `bits32 * bits32`), and so the caller would know +where and how to expect the result. The key aspect of this, though, is that +we can compile `strange` to executable code without knowing the representation +of `'a`. + +Here's one last example: + +```ocaml +let rec silly : ('a : any). unit -> 'a -> 'a = + fun () -> silly () +``` + +(The `unit` argument is just to prevent +the definition from falling into a loop right away.) This definition +is also accepted. Note that, even though the type of `silly` has a +type of layout `any` to the left of an arrow, it does not +manipulate a value of that type. Thus, this is fine. + +Abstracting from these examples, there are precisely two places we +require a concrete layout -- no `any` here. + +1. When binding a term variable of type `t`, `t` must have a concrete +layout. +2. When passing a term of type `t` to a function, `t` must have a +concrete layout. + +There is no restriction on returning a value from a function. +See "Kinds are calling conventions" for more exploration of these +restrictions. + +There is also no restriction on knowing the representation of type variables in +type declarations. For example, the following declaration is fine: + +```ocaml +type ('a : any) pair = { first : 'a; second : 'a } +``` + +The above restrictions imply that any construction of `pair` or pattern-match on +`pair` would work only for instantiations that have known representations, but +we can suffice with just one, flexible type declaration. + +## Adding a box + +Because of the restrictions around the usage of unboxed types +(both that they cannot work with standard polymorphism and the +restriction just above about mixing representations in record +types), we sometimes want to box an unboxed type. The types +we have seen so far have obvious boxed cousins, but we will soon +see the ability to make custom unboxed types. We thus introduce + +``` +type ('a : any) box + +val box : ('a : any). 'a -> 'a box +val unbox : ('a : any). 'a box -> 'a +``` + +These definitions are magical in a number of ways: + +* It is best to think of the type `box` as describing an infinite +family of types, where the member of the family is chosen +by the layout of the boxed type. (The family is infinite because of +the `*` and `+` layouts, which can be used to create new layouts.) + +* It would be impossible to define the `box` function without magic, because +it runs afoul of the restriction around function arguments. +A consequence of this design is that the `box` function cannot +be abstracted over: a definition like `let mybox (x : ('a : any)) = box x` +runs into trouble, because it binds a variable of a type whose layout +is unknown. + +* Because the function `box` is really a family of functions, keyed +by the layout of its argument, it is not allowed to use `box` without +choosing a layout. The simplest way to explain this is to say that `box` +must always appear fully applied. In this way, it might be helpful to +think of `box` more as a keyword than a function. But we could, in theory, +allow something like `let box_float : ('a : float64). 'a -> 'a box = box`, +where `box` is unapplied (but specialized). However, there is little +advantage to doing so, and it's harder to explain to users, so let's +not: just say `box` must always appear applied. (Note: GHC has a similar +restriction around similar constructs.) + +* There is additional magic around various forms of unboxed types, +as described below. + +## Arrays + +Going beyond storing just one unboxed value in a memory box, we also +want to be able to store many unboxed values in an array. Accordingly, +we extend the existing `array` to take a type argument of `any`, +as if `array` were declared with + +```ocaml +type ('a : any) array +``` + +Just like we need magical value-level primitives `box` and `unbox` to deal +with `box`, we will also need similar primitives to deal with arrays +of unboxed values. This proposal does not spell out the entire API for this +feature, but we will work it out during the implementation. Regardless +of the API details, the `array` type must have a similar layout-indexed +magic as `box`, though `array` could conceivably use a different memory +layout as `box` does. In particular, the memory format of a boxed variant +(as described below in the section on "Unboxed variants") has a variable length, +making it impossible to pack into an `array`. Thus `array` may choose to use +a different memory format as `box` in order to allow for indexing. + +Note that extraction of an element from an array of unboxed values (e.g. +with `get`) requires +copying the element. There are two ways a user might want to get an +unboxed value, via the following hypothetical API: + + get : 'a array -> int -> 'a + get_boxed : 'a array -> int -> 'a box + +If the type `'a` is unboxed, the `get` function copies the value from +the memory allocated as part of the array into registers (or other +destination). To avoid this copying, one might imagine `get_boxed`, intended +to return a pointer to the existing allocated memory within the array. +However, `get_boxed` cannot do this: an `'a box` is expected to be a pointer +to a block with a header, and an element within an array lacks this header. +Instead, we might imagine + +```ocaml +get_element : 'a array -> int -> 'a element +``` + +where `element` is a new type that represented a 1-element slice of an +allocated array. The `get_element` function would indeed just return +a pointer, but it requires yet another magical type `element`, represented +by a pointer to the middle of a block. (This would pose a challenge +for garbage collection, though presumably not an insurmountable one.) + +We do not explore this aspect of the design further in this document. +Instead, this consideration of `get_boxed` and `get_element` serves to +explain why there is no loss of run-time efficiency if the memory format used +by `box` and the memory format used by `array` are completely different. + +## Unboxed tuples + +In addition to new primitive types, like `#int32`, this proposal also +includes user-defined unboxed +types (by contrast to what's above: user-defined boxed types +containing unboxed fields). The simplest of these is unboxed tuples: + +```ocaml +#( int * float * string ) +``` + +This is an unboxed tuple, and is passed around as three separate +values. Passing it to a function requires three registers. +The layout of an unboxed tuple is the product of the layout of its +components. In particular, the layout of `#( string * string )` is +`value * value`, not `value`, and so it cannot be stored in, say, a +list. + +Constructing and pattern-matching against an unboxed tuple requires +a `#`: `let ubx : #(int * #float) = #(5, #4.)`. Note that parentheses +are required for construction and pattern-matching unboxed tuples. + +Naturally, boxing an unboxed tuple yields a boxed tuple. We thus add +the following rules: + +* For all unboxed tuples `#(ty1 * ty2 * ...)`, +`#(ty1 * ty2 * ...) box = (ty1 * ty2 * ...)`. The `...` is meant to +denote that this works for any arity, including 0. + +* The syntax `(e1, e2, ...)` continues to work for boxed tuples. + +### The mixed-block restriction + +The current OCaml runtime has the following restriction: + +* For every block of memory, one of the following must hold: + + 1. All words in the block (other than the header) may be inspected + by the garbage collector. That is, every word in the block is either + a pointer to GC-managed memory or has its bottom bit tagged. This is a + gc-friendly block. + + 2. No words in the block are pointers to GC-managed memory. This is a + gc-ignorable block. + +This is also described at in [Real World OCaml](https://dev.realworldocaml.org/runtime-memory-layout.html). + +We call this rule the *mixed-block restriction*, and we call types +whose memory layout is mixed as *mixed-block types*. We imagine that +we will be able to lift this restriction in a future release of the +runtime, but we do not expect to do this as part of our initial rollout +of unboxed types. We thus refer to this restriction in several places +in this specification so that readers might imagine a future improvement. + +Note that a mixed-block type has a layout that is neither a subtype of +`gc_friendly` nor a subtype of `gc_ignorable`. + +The first consequence of the mixed-block restriction is already apparent: +we cannot store, say, an unboxed tuple `#(string * #float)` in a memory +block, as that block would be mixed. + +We thus have the following restriction on mixed-block types: `box` may +never be called on a mixed-block type. Equivalently, the layout of the argument +to `box` must be either a subtype of `gc_friendly` or `gc_ignorable`. +(This is indeed the essense of +the mixed-block restriction.) However, because `box` is implicitly +used by constructors of boxed types -- such as a boxed tuple -- this +restriction applies more widely than code literally calling `box`. For +example, the following would be rejected: + +```ocaml +let bad : (string * #float) = "hi", #4. +``` + +The problem is the implicit call to `box` in the definition, which would +expand to + +```ocaml +box #("hi", #4.) +``` + +Note that the mixed-block restriction never requires looking directly at +*types*; instead, a mixed-block type can be identified by looking only at +its *layout*. This is in keeping with our understanding of `box` as a +family of functions, indexed by layout. The members of the family corresponding +to mixed-block types simply do not exist. + +## Unboxed records + +Unboxed records are in theory just as straightforward as unboxed tuples, but +have some extra complexity just to handle the various ways that types are +defined and used. Suppose you write: + +```ocaml +type t = { x : #int32; y : #int32 } +``` + +Sometimes we want to refer to `t` as a boxed type, to store it in +lists and other standard data structures. Sometimes we want to refer +to `t` as an unboxed type, to avoid extra allocations (in particular, +to inline it into other records). The language never infers which is +which (boxing and allocation is always explicit in this design), so we +need different syntax for both. + +To make this all work, we consider a declaration like the one for +`t` above to actually declare *two* types. To wit, we desugar the +above syntax to these declarations: + +```ocaml +type #t = #{ x : #int32; y : #int32 } +type t = #t box +``` + +That is, we have a primitive notion of an unboxed record, called +`#t`, and then we define `t` just as `#t box`. (The `#{` syntax is +also available to users, if they want to just directly declare an +unboxed record.) + +To make this backward compatible (and at all pleasant to program in), +we add a little more magic to `box`: + +* In the syntax `e.x` (where `e` is an arbitrary expression and `x` +is a path), if `e` has type `ty box` (for some `ty`), then we treat +the expression as if it were projecting from `ty` after unboxing. + +* Along similar lines, we allow record-creation and pattern-matching +syntax `{ ... }` to create and match against `box`. + +We might imagine an alternative design without automatically creating +unboxed record definitions in this way, but that design does not uphold +the **Availability** principle as well as the current design. + +### Construction + +We use a `#` in the syntax for creating unboxed records: + +```ocaml +type q = #{ x : int; y : #int32 } +let g : q -> q = fun #{ x; y } as r -> #{ x = x + r.x; y } +``` + +(If we didn't have the special treatment for `box`, we wouldn't need to +differentiate construction syntax, as we can infer that the record is unboxed +from the field labels, as we do for other record types. We could also +release a first version that requires `#{` and then see if users want to +be able to avoid the `#`. If we drop the `#` requirement, we could simply +default to boxed, unless the type system disambiguation mechanism specifically +selects for unboxed.) + +Type aliases do not automatically bind an unboxed version; rather +it is a property of the record type declaration that creates the +extra unboxed definition. However, if you create a transparent alias +like + +```ocaml +type u = t = { x : int32; y : int32 } +``` + +then we translate this to become + +```ocaml +type #u = #t = #{ x : int32; y : int32 } +type u = #u box (* equivalent to type u = #t box *) +``` + +Naturally, you can also directly alias an unboxed type: + +```ocaml +type u = #t +``` + +Similarly, an abstract type abstracts only the boxed type: + +```ocaml +module M : sig + type t +end = struct + type t = { x : int32; y : int32 } +end +``` + +defines `M.t` but not `M.#t`. If you want to export both +an unboxed and boxed version of an abstract type, you can with + +```ocaml +module M : sig + type unboxed_t + type t = box unboxed_t +end = struct + type t = { x : int32; y : int32 } + type unboxed_t = #t +end +``` + +We might imagine allowing module signatures to include `type #t` +when they also include `type t`, but as we see here, this feature +is not strictly necessary. + +### Field names and disambiguation + +If we have a + +```ocaml +type t = { x : int; y : int } +``` + +and write a function + +```ocaml +let f t = t.x + t.y +``` + +what type do we infer for `f`? Of course, we want to infer `t -> int`, but in +the presence of unboxed types, we might imagine inferring `#t -> int`, as `t.x` +is a valid way of projecting out the `x` field of both `t` and `#t`. Yet +inferring `#t -> int` would be terribly non-backward-compatible. + +We thus effectively have the boxed projection shadow the unboxed one. That is, +when we spot `t.x` (for a `t` of as-yet-unknown type), we assume that the +projection is from the boxed type `t`, not the unboxed type `#t`. If a user +wants to project from the unboxed type, they can disambiguate using a prefix +`#`, like `t.#x`. Thus, if we have + +```ocaml +let g t = t.#x + t.y +``` + +we infer `g : #t -> int`, though inference is non-principal. Users could also +naturally write + +```ocaml +let g t = t.#x + t.#y +``` + +which supports principled type inference. + +To be clear, the `#` mark is just used for disambiguation. The following also +works: + +```ocaml +let g (t : #t) = t.x + t.y +``` + +There is no ambiguity here, and thus the `#` mark is not needed on record +projections. + +Although this section discusses only record projections, the same idea applies +to record construction and pattern matches: the field of the boxed record +shadows the field of the unboxed record, though the latter can be written with a +`#` prefix (or can be discovered by type-directed disambiguation). + +### Mutation + +There are several concerns that arise when thinking about mutable fields and +unboxed types; this section lays out the scenarios and how they are treated. +We assume the following declarations: + +```ocaml +type point = { x : #int32; y : #int32 } +type mut_point = { mutable x : #int32; mutable y : #int32; identity : int } +``` + +1. **Mutable unboxed field in a boxed record**. Example: + + ```ocaml + type t = { radius : #int32; mutable center : #point } + ``` + + A mutable unboxed field can be updated with `<-`, analogously to a mutable + boxed field. However, updating an unboxed field might take multiple separate + writes to memory. Accordingly, there is the possibility of data races when + doing so: one thread might be writing to a field word-by-word while another + thread is reading it, and the reading thread might witness an inconsistent + state. (For records with existential variables and unboxed sums, this + inconsistent state might lead to a segmentation fault; for other types, the + problem might arise only as an unexpected result.) We call this undesirable + action *tearing*. + + To help the programmer avoid tears, an update of a mutable + unboxed field is sometimes disallowed. We surely want to disallow such an + update if it is not type safe. We also want to disallow such an update if it + could violate abstraction: perhaps some abstract record type internally + maintains an invariant, and a torn record might not support the invariant. + + Some cases are easy: we definitely want to support mutation of `#float`s, and + we definitely want to disallow mutation of wide unboxed variants (where we + might update the tag and the fields in separate writes). What about wide + unboxed records? It depends. Consider the type `t` in this section, and + imagine `type s = { mutable circle : #t }`. Should we allow + `s.circle <- ...`? Our answer, in essence: if a type is concrete, then allow + the update. After all, a concrete type cannot be maintaining any invariants + (or if it is, the programmer is responsible). So, if we have the definition + of `#t`, then `s.circle <- ...` is allowed. However, if we do not have the + definition of `#t`, then `s.circle <- ...` is disallowed. + + Yet this design -- based on the concreteness of a type -- is disappointing, + as it bases the choice of allowing mutable updates on whether or not an + interface exposes a representation. Instead, we would want to be able to + write an abstract type yet which allows a potentially-tearing update. + + We thus introduce a new layout `*w` (the "w" stands for "writable"). (We + will use `*w` in our notation here, but the surface syntax is different and + defined below.) `*w` is like `*` (and `l1 *w l2 <= l1' * l2'` iff `l1 <= + l1'` and `l2 <= l2'`), but a product built with `*w` is allowed in a mutable + update, even if it is potentially non-atomic. The inferred layout of a most + (see below) product types (like `#t`) will be built with `*w`, not `*`. Yet + if a type is exported abstractly from a module, the module signature will + have to specify the layout of that type; most users will write a layout + using `*`, thus protecting values of that type from getting torn. + + If a user wants to write a layout with `*w`, they can do so like this: + + ```ocaml + type t : bits32 * (bits32 * bits32) [@writable] + ``` + + The `[@writable]` attribute transforms all `*`s in a layout to be `*w`s. + + Despite most product types being inferred to have a `*w` layout, product + types with existential variables are inferred to have a `*` layout, as + tearing an existential could be type-unsafe. Here is an example: + + ```ocaml + type ex_int = K : 'a * ('a -> int) -> ex_int + ``` + + The layout of `#ex_int` is `value * value` (the first components is `value` + because there is no annotation on `'a` suggesting otherwise), with the + non-writable `*`. + + Putting this all together, we define the following predicate to test for + writeability: + + ``` + writable : concrete_layout -> bool + writable(l1 * l2) = false + writable(l1 + l2) = false + writable(l1 *w l2) = writable(l1) && writable(l2) + writable(_) = true + ``` + + The update of a mutable field is thus allowed if *either*: + + 1. The layout of the field fits in a 64-bit word, *or* + 2. The layout of the field satisfies the `writable` predicate above. + + The first possibility -- for word-sized layouts -- is because a type whose + values fit in a 64-bit word have no threat of getting torn, and thus + can be updated safely. We choose *64-bit* words (not 32-bit ones) because + 32-bit OCaml does not support multicore, and so no possibility of tearing + exists. + + The restriction described here is called the *tearing restriction*. + + The tearing restriction allows updates of our `mutable center` field, + for two reasons: the type `#point` is concrete and thus has layout + `bits32 *w bits32`, with `*w`, and a `#point` fits in a 64-bit word. + + A user who wishes to update a non-atomic mutable field (with their own plan + for synchronization between threads) may do so with the `[@non_atomic]` + attribute, like so: + + ```ocaml + type point = { x : #int32; y : #int32 } + module Circle : sig + type t : bits32 * (bits32 * bits32) + end = struct + type t = #{ radius : #int32; mutable center : #point } + end + + type t = { mutable circle : #Circle.t; color : Color.t } + let f (t : t) = t.circle <- #{ ... } [@non_atomic] + ``` + + The design in this section upholds the following principle: + + * Even in the presence of data races, a programmer can never observe an + intermediate state of a single assignment (unless they write `[@non_atomic]`). + + **Optional.** It may be convenient for users to get a warning when they + declare a mutable field that falls afoul of the tearing restriction and + thus requires `[@non_atomic]` on *every* update. + +2. **Mutable field in an unboxed record**. Example: + + ```ocaml + fun (pt : #mut_point) -> (pt.x <- pt.x + #1l); pt + ``` + + Mutation within an unboxed record would be unusual, because unboxed values have no + real notion of identity. That is, an update-in-place appears to be + indistinguishable from a functional update (that is, `{pt with x = pt.x + + #1l}`). By its unboxed nature, an unboxed value cannot be aliased, and so the + mutation will not propagate. + + Here is an example: + + ```ocaml + fun (pt : #mut_point) -> + let boxed = box pt in + pt.x <- pt.x + #1l; + (unbox boxed).x = pt.x + ``` + + This function would always return `false`, because the point stored in the + box is a *copy* of `pt`, not a reference to `pt`. In other words, unboxed + values are passed by *value*, not by *reference*. Accordingly, every time an + unboxed value is passed to or returned from a function, it is copied. + + Because of the potential for bugs around mutability within unboxed records, + this proposal disallows it: in an unboxed record, any `mutable` keywords are + ignored. Accordingly, the example above is rejected: `pt.x <- ...` is + disallowed, because `pt` is an unboxed record. + + This section can be summarized easily: + + * Fields of unboxed records are never mutable. + +3. **Fields within a mutable unboxed record within a boxed record.** Example: + if we have `t : t` (where the type `t` is from mutation case 1, above), then + `t.center.x` is a field within a mutable unboxed record within a boxed + record. + + The case of such a field is interesting in that an unboxed record placed + within a boxed record *does* have a notion of identity: it lives at a + particular place in the heap, and the surrounding boxed record might indeed + be aliased. Thus mutation in this nested case does make good sense, and we + surely want to support it. + + We thus introduce a new syntax for updates of individual fields of a + mutable unboxed record: We write e.g. `t.(.center.x) <- ...`. Note the + parentheses: they denote that we're updating the `x` sub-field within the + mutable `center`. The parentheses are required here, as is the leading + `.`. (The leading `.` within the parentheses disambiguates the syntax with + array access.) + + The rule is this: Consider the list of field accesses in a parenthesized + field assignment. The first field must be mutable, and all fields in the + list must be unboxed. + + By the first field being mutable, we know that the record directly before + the open-parenthesis is boxed, because unboxed records do not have mutable + fields. + + This allows `t.(.center.x) <- #3l` even though `x` is not + declared as `mutable`. This does not break abstraction: the type of `center` + must already be concrete in order to write this, and so a user can always + write `t.center <- {t.center with x = #3l}` instead (modulo the + tearing restriction). In effect, `t.(.center.x) <- #3l` can be seen as + syntactic sugar for `t.center <- #{ t.center with x = #3l }`, but allowed + even if `t.center` is too wide for the tearing restriction. + + Beyond just mutable fields in boxed records, this new syntax form extends to + elements of mutable arrays. So if we have a `#pt array` named `arr`, users + could write `arr.(.(3).x) <- #4l` to update one field of an element of the + array. + +### Nesting + +According to the descriptions above, the `#` prefix on types is +*not* recursive. That is, the contents of a `#t` are the same as +the contents of a `t` -- it's just that the `#t` is laid out in +memory differently. Let's explore the consequences of this aspect +of the design by example: + +Consider the definitions (let's assume a 64-bit machine) + +```ocaml +type t0 = { x : int32; y : int16 } +type t1 = { x : #int32; y : #int16 } +type t2 = { t1 : t1; z : #int16 } +type t3 = { t1 : #t1; z : #int16 } +type t4 = #{ t1 : #t1; z : #int16 } +``` + +The points below suggest a calling convention around passing arguments +in registers. These comments are meant to apply both to passing arguments +and returning values, and are up to discussion with back-end folks. + +1. A value of type `t0` will be a pointer to a block of memory with 3 words: a +header, a pointer to a boxed `int32` for `x`, and a word-sized immediate for +`y`. It is passed in one register (the pointer). + +2. A value of type `#t0` will comprise two words of memory: a pointer for `x`, +and an immediate for `y`; it is passed in two registers to a function. + +3. A value of type `t1` will be a pointer to a block of memory with 2 words: +a header and a word containing `x` and `y` packed together (precise layout +within the word to be determined later, and likely machine-dependent). +It is passed in one register (the pointer). + +4. A value of type `#t1` will comprise one word of memory, containing +`x` and `y` packed together. When passed to a function, it will be +passed as *two* registers, one for each component. + +5. A value of type `t2` does not exist: it would be pointer to a *mixed* block, +containing a pointer to a `t1` structure and a non-GC word for `z`. + + **Alternative:** (We do not plan to do this.) A value of type `t2` is a +pointer to a block of memory with three words: a header, a pointer to a `t1` +structure (as described above), and a tagged immediate word, 16 bits of which +store `z`. (In this design, a `z : #int64` would be rejected because there is no +room for the tag.) + +6. A value of type `#t2` comprises two words: one is a pointer to a `t1` +structure, and one contains `z`. (The `z` word does *not* need to be tagged, +as the GC will know not to look at it.) It may *not* be stored in +memory; it is passed in two registers. + +7. A value of type `t3` is a pointer to a block of memory with 2 words: +a header and one packed word containing 48 bits of `t1` and 16 bits of +`z`. It is passed in one register. + +8. A value of type `#t3` comprises one packed word in memory, containing +all of `x`, `y`, and `z`, with no tag bit. It is passed in three registers, +one each for `x`, `y`, and `z`. + +9. The type `t4` is utterly identical to the type `#t3`. + +10. The type `#t4` does not exist; it is an error, as the name `#t4` is +unbound. + +The key takeaway here is that if a programmer wants tight packing, they have to +specify the `#` in the definition of their types. Note that there is no way to +get from the definition of `t0` or `t2` to a tightly packed representation; use +`t1` or `t3` instead! + +A separate takeaway here is that memory layout in our design is *associative*: +the memory layout does not depend on how the type definitions are +structured. This is in contrast to C `struct`s, where each sub-`struct` must be +appropriately aligned and padded. For example, the C translation of this example +is + +```c +struct t3 { + struct t1 { int32 x; int16 y; } t1; + int16 z; +}; +``` + +Yet this would take 2 words in memory, as `t1` would be padded in order to be +32-bit aligned. + +### Further memory-layout concerns + +We wish for unboxed records (and tuples, to which this discussion equally +applies) to be packed as tightly as possible when stored in memory. (This dense +packing does not apply when an unboxed record is stored in local variables, as +it may be more efficient to store components in separate registers.) + +This section of the proposal describes a possible approach to tight memory +packing that supports reordering. This section is more hypothetical than others, +and we type theorists defer to back-end experts if they disagree here. The +section ends with a few user-facing design conclusions. + +The key example is + +```ocaml +type t5 = { t1 : #t1; a : int; z : #int16 } +``` + +Note that now, we have an `a` between the `#t1` and the `#int16`. + +A bad idea would be to tightly pack the `int` against the `#t1`, like this (not +to scale): + + 63 31 0 63 31 0 + xxxxxxxxxyyyyyaaaaaa aaaaaaaaaaaaaaazzzz + +Note that the 64 bits of `a` are spread across *two* words. This would make +operations on `a` very expensive. Just say "no"! + +Instead, we insist that `a` is word-aligned. We might thus imagine + + 63 31 0 63 31 0 63 31 0 + xxxxxxxxxyyyyy000000 aaaaaaaaaaaaaaaaaaa zzzzz0000000000000000 + +where `0` denotes padding. That works, but it's wasteful. Instead, we do + + 63 31 0 63 31 0 + xxxxxxxxxyyyyyzzzzzz aaaaaaaaaaaaaaaaaaa + +Everything is aligned nicely, and there's no waste. The only drawback is that +we have *reordered*. + +In general, we reserve the right to reorder components in an unboxed tuple or +record in order to reduce padding. With demand, we could imagine introducing +a way for programmers to request that we do not reorder. (An unboxed type that +does not participate in reordering would have a different layout from one that +does participate in reordering, meaning essentially a new layout former, such +as `&`, analogous to `*`.) + +However, the reordering is *not* encoded in the layout system. Imagine now + +```ocaml +type t6 = { t1 : #t1; z : #int16; a : int } +``` + +This `t6` is the same as `t5`, but with fields in a different order. We have + +```ocaml +t5 : (bits32 * bits16) * immediate * bits16 +t6 : (bits32 * bits16) * bits16 * immediate +``` + +Accordingly, a type variable that ranges over `t5` would not also be able to +range over `t6`: the two have *different* layouts. We could imagine a very +fancy layout equivalence relation that would detect the reordering here and +say that `t5`'s layout equals `t6`'s; we choose not to do this, for several +reasons: + +* Encoding reordering in the layout system potentially constrains us in the + future, in case we wish to do fewer reorderings. +* This significantly complicates layouts, for little perceived benefit. +* Different architectures may benefit from different reorderings; we do not want + the type system to depend on the architecture. + +Accordingly, the reordering of physical layout is mostly undetectable by users: +they just get a more compact running program. The way to detect the reordering +is by either inspecting memory manually (of course) or by sending unboxed +records through a foreign function. In order to support foreign functions, we +will add an interaction with `ocamlc` that produces a C header that offers +functions that extract and set the various fields of an unboxed tuple. Foreign +code could then use this header to be sure that it interacts with our reordering +algorithm correctly. + +#### Backward compatibility + +Existing OCaml programs may have foreign interfaces that rely on a certain +ordering of record fields. The reordering story need not disrupt this. To wit, +we promise that, as we work out the details of reordering, we *never* reorder +fields in records (or tuples) where all fields have layout `value`. (This +includes *all* types expressible before our change.) + +Going further, and imagining possibly lifting the mixed-block restriction, we +can imagine the following rule: + +* (Hypothetical) In any record or tuple type, its fields are ordered in memory + with all `value`s first, in the order given in the declaration, followed by + non-`value`s, in an implementation-defined order. + +This rule actually contradicts the layout of `t5` above, which would put `a` (a +`value`, because `immediate`s are `value`s) first. However, we make no promises +about the hypothetical rule and include it here just as a possible way forward. + +## Unboxed variants + +Unboxed variants pose a particular challenge, because the boxed +representation is quite different than the unboxed representation. + +Consider + +```ocaml +type t = K1 of int * bool | K2 of #float * #int32 | K3 of #int32 | K4 +``` + +An unboxed representation of this would have to essentially be like a C +union with a tag. This particular `#t` would use registers as follows for +argument passing: + +```ocaml +bits2 (* tag information *) +immediate (* for the int *) +immediate (* for the bool *) +float64 (* for the #float *) +bits32 (* for both #int32s, shared across constructors *) +``` + +No matter which variant is chosen, this register format works. This is important +because the layout of an unboxed type must describe its register +requirements, regardless of what constructor is used. We thus use a new +layout former to describe unboxed variants, `+`. That is, the layout of `t` would +be `(immediate * immediate) + (float64 * bits32) + bits32 + void`. +By specifying only the +layouts of the variants' fields -- not the actual register format itself -- we leave +abstract the actual algorithm for determining the register format. +(See "Aside" below for discussion on this point.) + +One challenge in working with unboxed variants is that it may be hard for the +programmer to predict the size of the unboxed variant. That is, a programmer +might have a variant `v` and then think that `#v` will be more performant than +`v`. However, `#v` might be wider than even the widest single variant of `v` and +thus actually be *less* efficient. (Of course, this is always true: we should +always test before assuming that e.g. unboxing improves performance.) As we +implement, we should keep in mind producing a mechanism where programmers can +discover the memory layout of an unboxed variant, so they can make an informed +decision as to its usage. + +### Boxing + +In contrast to the fixed register format above, a boxed variant needs only +enough memory to store the fields for one particular constructor. That's because +boxed variants get allocated specifically for one constructor -- there is no +requirement that all variants have the same size and layout. + +Despite the challenges here, `box` can still work to convert an unboxed variant +to a boxed one: `box` simply understands the `+` layout form to mean +alternatives, +just as variants have always worked. + +We thus extend the treatment of unboxed records to work analogously +for unboxed variants. That is, we treat a definition of a boxed variant + +```ocaml +type t = K1 of ty1 | K2 of ty2 | ... +``` + +to really mean + +```ocaml +type #t = #( K1 of ty1 | K2 of ty2 | ... ) +type t = #t box +``` + +We then additionally add magic to `box` to make this change transparent to users: + +* A boxed unboxed variant may be constructed and pattern-matched against as +if the box were not there. Following our example, `K1` and `K2` could construct +values of type `t`, and values of type `t` could be matched against constructors +`K1` and `K2`. + +Depending on how an unboxed variant ends up in memory, it has one of three possible +representations: + +* If an unboxed variant is `box`ed, then it has the same representation as + today's variants: a block including a tag and the fields of + the particular constructor (only). + +* If an unboxed variant is part of a boxed product (i.e. record, tuple, or array), + then it takes up exactly as much space as needed to store the tag and the + widest constructor. Here is the example: + + ```ocaml + type t = K1 of int * string * string | K2 of bool * bool | K3 of string + type t2 = { f1 : float; f2 : #t; f3 : string } + ``` + + A value of type `t2` will be a pointer to a block arranged as follows: + + ``` + word 0: header + word 1: f1, a pointer to a boxed float + word 2: tag for f2 + word 3: K1's int (immediate) or K2's bool (immediate) or K3's string (pointer) + word 4: K1's string (pointer) or K2's bool (immediate) + word 5: K1's string (pointer) + word 6: f3, a pointer to a string + ``` + + Note that the `#t` takes up 4 words, regardless of the choice of + constructor. This fixed-width representation is needed in order to give `f3` + a fixed offset from the beginning of the record, which makes accesses of + `f3` more efficient. Imagining a variable-width encoding requires examining + the tag of the variant in order to address later fields; this becomes + untenable if a record has multiple inlined variants. (We can think of `#t` + here more as an inlined variant than an unboxed one.) + +* If an unboxed variant is part of an outer variant, we essentially inline the + inner unboxed variant. The section on "Nesting", below, covers this case. + +### Constructor names and disambiguation + +Just as we have done for record fields, we consider the constructor for the +boxed variant to shadow the constructor for the unboxed one. That is, writing +`K1 blah` will construct a `t`. However, if we already know that the expected +type of `K1 blah` is `#t`, then we use type-directed disambiguation to discover +that the `K1` is meant to refer to the unboxed variant, not the boxed +one. Echoing the design for record fields, we can use a `#` prefix to +disambiguate manually: `#K1 blah` unambiguously creates a `#t`. + +### Nesting + +We can naturally nest unboxed variants, just like we did with unboxed records. +Just as before, the `#` mark is *not* recursive. Also just as before, we can +gain extra efficiency by cleverly packing one variant inside of another. Let's +explore an example: + +```ocaml +type t = K1 of int * bool | K2 of #float * #int32 | K3 of #int32 | K4 + (* as above *) +type t2 = K2_1 of #int32 | K2_2 of #float * #float | K2_3 of #t +``` + +The `t` is unedited from above, but the `t2` definition contains a `#t`. + +We imagine the following register format for `t2`: + +```ocaml +bits3 (* for the combined tag *) +immediate (* for K1's int *) +immediate (* for K1's bool *) +float64 (* for K2's #float and K2_2's first #float *) +float64 (* for K2_2's second #float *) +bits32 (* for K2's, K3's, and K2_1's #int32s #) +``` + +A few observations to note about this format: + +* There is no part of this arrangement that matches the arrangement for `t`; +indeed, it is not necessary to efficiently get from a `t2` to a `t`: when the +user asked to unbox `t` in the definition of `t2`, they gave up on this +possibility. +* If the `#t` component of `K2_3` is, say, passed to another function, its + components will have to be copied into new registers, so that the function can + extract the pieces it needs. +* If the `#t` component of `K2_3` is, say, matched against, the matcher can + simply remember an offset when looking at the tag. In our case, we might + imagine that tags `0` and `1` correspond to `K2_1` and `K2_2`, with tags `2` + through `5` corresponding to the constructors of `t`. Then, a match on `t` + would simply subtract 2 from the tag before comparing against `t`'s + constructors. + +Note that the description above around boxing applies to nested variants, too, +because the layout of `#t2` is + +```ocaml +bits32 + (float64 * float64) + (immediate * immediate) + + (float64 * bits32) + bits32 + void + ``` + +. Each summand is interpreted as a variant by `box`, and thus the in-memory +representation does not need to take any extra space. Spelling this out, a +`#t2 box` would be either be the tagged immediate for `0` (representing +`K2_3 K4`) or a pointer to a block containing a header and the following fields + + tag | fields | represents + =================================================== + 0 | bits32 | K2_1 x + 1 | float64 ; float64 | K2_2 (x, y) + 2 | immediate ; immediate | K2_3 (K1 (x, y)) + 3 | float64 ; bits32 | K2_3 (K2 (x, y)) + 4 | bits32 | K2_3 (K3 x) + +As in the discussions above about low-level details: it's all +subject to change. The key observation here is that unboxing a variant +effectively inlines it within another variant. Put another way, we want the +concrete use of memory and registers for a type to be unaffected by the choice +of how the type is structured (that is, ignore the associativity choices of `+`, +much like we have ignored the associativity choices of `*`). + +### Aside: Why we have a fresh layout for unboxed variants + +The section above describes a fresh layout constructor `+` for unboxed +variants. However, in memory, an unboxed variant is laid out just like +an unboxed tuple, so we could, in theory, use the same layout constructor +for both. Under this idea, the `t` example above would have layout +`immediate * immediate * float * int32 * int2`. + +However, we do not adopt this design for (at least) these reasons: + +1. Encoding the unboxed-variant layout algorithm in the type system +seems fragile in the face of possible future improvements/changes. Maybe +we can come up with a better way of packing certain fields in the future, +and it would thus be a shame if such a change broke layout checking. + +2. An unboxed product type of that kind can reasonably be the +type of a mutable record field. However, the variant case +can't. That's because racing writes of an unboxed variant type +risk memory safety -- the fields from one constructor might +win the race, whilst the tag from the other constructor wins. + +3. The most natural representation for including an +unboxed sum type within a boxed sum type is actually to add +aditional constructors to the boxed sum type -- dual to how +unboxing a product into a product adds additional fields. In +other words, we would store the tag of the nested unboxed sum +type as part of the tag of the enclosing boxed sum type. + +That representation also gives an additional excuse for not +allowing mutable unboxed variant fields. + +Note that the kind of representation described above is still +available using GADTs to make the tag fields explicit. For example: + +```ocaml +type foo = { a : int; + unused1 : unit; + unused2 : unit; } + +type bar = { b : float; + c : string; + unused : unit; } + +type baz = { d : string; + e : string; + f : string; } + +type ('a : value * value * value) tag = + | Foo : #foo tag + | Bar : #bar tag + | Baz : #baz tag + +type t = + T : { g : float; + tag : 'a tag + data : 'a } -> t +``` + +## Polymorphism, abstraction and type inference + +The design notes below here are about the details of making unboxed +types play nicely with the rest of the type system. + +The key problem is that we must now somehow figure out the layouts +of all type variables, including those in type declarations and +in value descriptions. The **Availability** principle suggests that +we should infer layouts as far as is possible. Yet the **Backward +compatibility** principle tells us that we should prefer the +`value` layout over other layouts when there is a choice. + +We thus operate with the following general design: + +* The layout of a *rigid* variable is itself rigid: it must be supplied +at the type variable's binding site. + +* The layout of a *flexible* (unification) variable is itself flexible: +it can be inferred from usage sites. +If we still do not know the layout of a variable when we have finished +processing a compilation unit, we default it to `value`. + +(Why wait for the whole compilation unit? Because we might get critical +information in an mli file, seen only at the very end of an entire file.) + +We retain principal *types*, but we don't have principal *layouts*: +for any expression, there's a best type for any given layout, but +there's no best layout (or at least, the best layout isn't always +compilable). + +Programmers may also specify a layout, using the familiar `:` syntax +for type ascriptions. A layout may be given wherever a type appears. +For example, these declarations are accepted: + +```ocaml +type ('a : value, 'b : immediate) t = Foo of 'a * 'b +val fn : ('a : immediate) . 'a ref -> 'a -> unit +val fn2 : 'a ref -> ('a : immediate) -> unit (* NB: not on declaration of 'a *) +val fn3 : (unit : immediate) -> unit (* NB: layout ascription on type, not variable *) +type ('a : bits32 * bits32) t2 = #{ x : #float; stuff : 'a } +val fn4 : 'a t2 -> 'a t2 (* layout of fn4 is inferred *) + +type t5 : bits32 (* layout ascription on an abstract type *) + +val fn5 : 'a -> 'a (* NB: 'a is defaulted to have layout `value` *) +``` + +It may seem a little weird here to allow abstraction over types of +layout `bits32`. After all, there's only one such type (`#int32`). +However, this ability becomes valuable when combined with abstract +types: two different modules can expose two different abstract types +`M.id`, `N.id`, both representing opaque 32-bit identifiers. By +exposing them as abstract types of layout `#int32`, these modules can +advertise the fact that the types are represented in four bytes +(allowing clients of these modules to store them in a memory-efficient +way), yet still prevent clients from mixing up IDs from different +modules or doing arithmetic on IDs. + +### Multiple aspects of a layout + +A concrete layout actually encodes three different properties of a type: how +the type is represented in memory (how many bits the type needs +and what kind of register should hold the value), whether the GC will +be able to successfully inspect values of the type (is the type `gc_friendly`?), +and whether the GC can ignore values of the type (is the type `gc_ignorable`?). + +Here are the possible representations: + +``` +value_rep +word_rep +void_rep +bits8_rep +bits16_rep +bits32_rep +bits64_rep +float_rep +rep1 * rep2 +rep1 + rep2 +``` + +We now define several meta-functions, defined only over *concrete* layouts: + +``` +rep : concrete_layout -> representation +rep(value) = value_rep +rep(immediate) = value_rep +rep(bits_n) = bits_n_rep +rep(word) = word_rep +rep(float64) = float_rep +rep(lay1 * lay2) = rep(lay1) * rep(lay2) +rep(lay1 + lay2) = rep(lay1) + rep(lay2) + +gc_friendly : concrete_layout -> bool +gc_friendly(value) = true +gc_friendly(immediate) = true +gc_friendly(void) = true +gc_friendly(lay1 * lay2) = gc_friendly(lay1) && gc_friendly(lay2) +gc_friendly(lay1 + lay2) = gc_friendly(lay1) && gc_friendly(lay2) +gc_friendly(_) = false + +gc_ignorable : concrete_layout -> bool +gc_ignorable(value) = false +gc_ignorable(lay1 * lay2) = gc_ignorable(lay1) && gc_ignorable(lay2) +gc_ignorable(lay1 + lay2) = gc_ignorable(lay1) && gc_ignorable(lay2) +gc_ignorable(_) = true +``` + +These meta-functions respect subtyping among concrete layouts, as stated in the +following properties: + +* If `l1 <= l2`, then `rep(l1) = rep(l2)`. +* If `l1 <= l2`, then `gc_friendly(l2) ⇒ gc_friendly(l1)` (where `⇒` denotes + logical implication). +* If `l1 <= l2`, then `gc_ignorable(l2) ⇒ gc_ignorable(l1)`. + +In addition, these definitions allow us to derive the mixed-block restriction. +It would definitely be problematic to have a layout that +is both not gc-friendly and not gc-ignorable: the GC can't ignore it, but also +can't look at it. No primitive layouts have this undesirable property. But mixed +blocks do! And so it had better be the case that we never build them. + +### Unification of two types with different layouts + +Every concrete type has a most accurate (i.e. lowest) layout. I +think it is reasonable in this system to consider this as its +"true" layout. A statement of the form: + +``` +t : l +``` + +Does not say that type `t` has `l` as its true layout, merely +that `l` is an upper bound on the true layout of `t`. In +particular, a unification variable with kind `l`: + +``` +'a : l +``` + +stands for some type that we haven't worked out yet, whose true +layout has `l` as an upper bound. + +That's why, if we have: + +``` +'a : la +'b : lb +'a = 'b +``` + +then, even though the true layout of `'a` must be equal to the +true layout of `'b`, it is not the case that `la` must be equal +to `lb`. What is true is that the true layout of `'a` and `'b` must +be a sublayout of both `la` and `lb`. That means it must be a +sublayout of the greatest lower bound of `la` and `lb` -- +written `la ⊓ lb`. Thus a most general solution for these +constraints is to have: + +``` +'c : la ⊓ lb +``` + +and substitute `'c` for both `'a` and `'b`. + +(Equivalently, we can substitute `'a` for `'b` and replace `'a`s + layout with `lb ⊓ la`, which is how we actually implement + it. But it is simpler to think of the layout of a + variable as immutable, and think of variable-variable + unification as using a third variable.) + +`lb ⊓ la` is commutative and defined in the following cases (omitting symmetric +cases): + * If `lb` is `any`, then `lb ⊓ la = la`. + * If `lb` is `gc_friendly` and `la <= gc_friendly`, then `lb ⊓ la = la`. + * If `lb` is `gc_ignorable` and `la <= gc_ignorable`, then `lb ⊓ la = la`. + * If `lb` is `gc_friendly` and `la` is `gc_ignorable`, then `lb ⊓ la = void`. + * Otherwise, we require `rep(lb) = rep(la)`. Assuming `rep(lb) = rep(la)`: + * If `lb = la`, then `lb ⊓ la = la`. + * If `lb = value`, then `lb ⊓ la = la`. + +(Correctness condition: for all layouts `lb` and `la`, `lb ⊓ la <= lb`, `lb ⊓ la +<= la`, and for all layouts `lc` such that `lc <= lb` and `lc <= la`, `lc <= lb +⊓ la`.) + +(Note that if `immediate <= word`, `gc_friendly ⊓ gc_ignorable` would not +exist and this algorithm would need to change. Furthermore, the definition would +become incomplete, because `value ⊓ word` *would* exist -- it would be +`immediate` -- and yet the definition above would not compute this.) + +Because of the `rep(lb) = rep(la)` restriction, we must track this as a new form +of constraint, arising whenever we must compute `lb ⊓ la`. + +Similarly, if we have: + +``` +'a : la +'a = t +``` + +We can solve that by substituting `t` for `'a`, but this generates +an additional constraint that: + +``` +t : la +``` + +This is equivalent to: + +``` +l <= la +``` + +where `l` is the best (lowest) layout we have available for `t`. + +The key to handling these additional forms of constraints is the +observation that: When creating a unification variable for a type we +are inferring, there is no requirement that we also infer its true +layout. We are already inferring the type, and the type determines +what the true layout is anyway. Instead we only need to provide an +appropriate upper bound on that layout. + +This means that we do not need unification variables that range +over indeterminate layouts. Instead +we can just assign fresh type unification variables the upper bound +`any`: + +``` +'a : any +``` + +Note that, given such an `'a`, we can solve any constraint of the +form + +``` +'a = t +``` + +by substituting `t` for `'a` because + +``` +t : any +``` + +holds for all `t`. + +Similarly, we do not actually need unification variables that +range over concrete layouts +either. Instead we can use unification variables that range over +*representations*. When we know that a type needs to have a concrete layout +it is sufficient to create a type unification variable the +following upper bound: + +``` +'a : upper_bound('r) +``` + +where 'r is a representation variable and `upper_bound(r)` is a function that +maps a representation to the layout that is the upper bound of all the layouts +with that representation: + +``` +upper_bound(value_rep) = value +upper_bound(word_rep) = word +upper_bound(bits_n) = bits_n +upper_bound(float_rep) = float64 +upper_bound(rep1 * rep2) = upper_bound(rep1) * upper_bound(rep2) +upper_bound(rep1 + rep2) = upper_bound(rep1) + upper_bound(rep2) +``` + +(Correctness property: for all concrete layouts `l`, `l <= upper_bound(rep(l))`.) + +This restricts the type to have a concrete layout, whilst still allow +its representation to be discovered by inference. + +With all this in place, we +can now trivially solve all those additional constraints. + +``` +rep(upper_bound(ra)) = rep(upper_bound(rb)) +``` + +is solved by just: + +``` +ra = rb +``` + +and + +``` +l <= upper_bound(r) +``` + +is solved by: + +``` +l ≠ any /\ rep(l) = r +``` + +meaning that the only layout related non-trivial constraints in +the system are equality constraints on representation variables, which can +be handled by ordinary undirected unification. + +### Examples + +When we invent a new unification variable with an unknown layout, +how should we discover its layout? We must be careful, because some +unification variables will be required to have known, concrete layouts; +while others will be allowed to have `any`. Let's look at +some examples: + +```ocaml +let f x = x +let rec g : unit -> 'a -> 'a = fun () -> g () +``` + +In `f`, we guess the type for `x` to be some type `'a`. We also +must create a *representation unification variable* `'r`, where +`upper_bound('r)` +is the layout of `'a`. The codomain of the `upper_bound` meta-function +is only the concrete layouts, so we are guaranteed to infer a concrete +layout. + +In `g`, we must also infer the layout for `'a`. However, here, +because there is no place whether the layout is restricted to be +concrete, we just say `'a : any`. If something were to constrain +the layout, we would end up unifying `'a` with a type with a more +restrictive layout. + +Let's now return to `f` and `g` and see how this all plays out: + +`f`: + +1. We spot that `f` is a function of one variable. We thus say +that `f : 'a -> 'b` for fresh variables `'a` and `'b`. We assign +layout `upper_bound('ra)` to `'a` and `any` to `'b`, +for a fresh unification variable `'ra`. Note that, because +`'b` is the return type, we do not require a concrete layout for it. + +2. We see that the return value is `x`. This forces us to unify +`'b` with `'a`. We now have this unification problem: + + ``` + ('b : any) ~=~ ('a : upper_bound('ra)) + ``` + +As described above, we solve this by inventing +`'c : any ⊓ upper_bound('ra)` and filling both `'b` and +`'a` with `'c`. The layout here quickly simplifies to `upper_bound('ra)`. + +3. At this point there is no more information about `f` we can +use. We see it has type `'a -> 'a`, where `'a : upper_bound('ra)`. +Because `'a` +is not free in the environment (or, equivalently, it has a level +higher than the surrounding context) we can generalize it. But +we still have a problem with `'ra` and `'ga`; our type system does *not* +support layout polymorphism, so we cannot generalize these variables. Instead, +we make them into weakly polymorphic variables. When we finish +processing the top-level definition containing `f` (which might just +be `f` itself), we will default `'ra := word` and `'ga := must_gc` if we have not already +unified them. + +The final type for `f` is thus `('a : value). 'a -> 'a`, just as it +always has been. + +`g`: + +1. We have a type signature for `g`, but it mentions a unification +variable `'a`. We choose `'a : any`. + +2. Processing the body of `g` doesn't yield anything interesting, +and so we assign `g` the type `('a : any). unit -> 'a -> 'a`, +which is strictly more general than the type `g` is assigned today. + +### Layout checking + +Thanks to how types are constructed in OCaml, we need no separate +well-formedness check. Suppose we're given the following type: + +```ocaml +type ('a : immediate) t = ... +``` + +and the user attempts to write `string t`. The syntax `string t` is +converted to a type expression by first constructing a template with a +fresh unification variable `'b t`, and then unifying `'b` with +`string`. Here, since `'b` ranges over layout `immediate`, the +unification with `string` will fail. + +# Extensions and other addenda + +## Alternative idea: `#` as an operator + +One alternative idea we considered was to make `#` a (partial) +operator on the type algebra, computing an unboxed version of +a type, given its declaration. This ran aground fairly quickly, +though. + +One problem is that OCaml doesn't have the ability to extract +a type argument from a composite type (like to get the `int` +from `int list`). So viewing `#` as the inverse of `box` doesn't +quite work. (Richard thought for a little bit that +`type 'a unlist = 'b constraint 'a = 'b list` did this unwrapping, +but it doesn't: it just requires that the argument to `unlist` be +statically a `list` type; this is not powerful enough for our needs, +because it would reject `'a. ... # 'a ...`, which would be part +of this more general operator's purview.) + +Another problem is that `#` would have to have an elaborate type: +it would need to be something like `# : Pi (t : value) -> t layout_of_unboxed_version_of`, +necessitating the definition of `layout_of_unboxed_version_of`. +This is a dependent type. Dependent types are cool and all, but +unboxed types seem like a poor motivation for them. + +## Extension: Restrictions around unexpected copy operations + +Today, OCaml users expect a line like `let x = y` to be quick, copying a word of +data and requiring ~1 instruction. However, if `y` is a wide unboxed record, +this assignment might take many individual copy operations and be unexpectedly +slow. The same problem exists when an unboxed record is passed as a function +argument. Perhaps worse, this kind of copy can happen invisibly, as part of +closure capture. Example: + +```ocaml +type t = { a : int; b : int; c : int; d : int; e : int } + +let f (t : #t) = + let g n = frob (n+1) t in + ... +``` + +Building the closure for `g` requires copying the five fields of `t`. + +We could imagine a restriction in the language stopping such copies from +happening, or requiring extra syntax to signal them. However, we hope that there +will not be too many wide unboxed records, and copying a few fields really isn't +so slow. So, on balance, we do not think this is necessary. + +## Extension: limited layout polymorphism. + +We can imagine an extension where +the layout of `(p, q) r` could in general depend on +the layouts of `p` and `q`. For instance: + +```ocaml +type ('a : value) id = 'a +``` + +Since `'a : value`, we know that `'a id` is always of layout `value`, +but if `'a = int` it may also have the more specific layout +`immediate`. + +For each type constructor `(_, _) r` in the environment, we store +along with it a layout `K` which is an upper bound for the layout of +any instance of `r`. (For `'a id` above, this is `value`). When +checking `(p, q) r : L`, we compare `L` and `K`. If `K` is a subtype +of `L`, then the check passes. However, if the check fails and `r` is +an alias, we must expand `r` and try again. We only report a +unification failure once the check fails on a type which is not an +alias. + +For example, suppose we have the unification variables `'a : value` +and `'b : immediate`, and we're trying to unify `'a id = 'b`. We need +to check `'a id : immediate`, but the environment tells us that the +layout of `_ id` is `value`. So, we expand `'a id` to `'a` and retry +unification, which succeeds after unifying `'a` and `'b` (the +resulting type variable has layout `immediate`). The effect here is +that after unifying `'a id` with `'b`, we leave `'a` as an +uninstantiated unification variable, but its layout is restricted to +`immediate`, to match `'b`. + +## Extension: more layout inference + +The current design describes that we require users to tell us the +layout of rigid variables but that we will infer the layout of +flexible variables. However, it is possible to do better, inferring +the layouts even of rigid variables. For example: + +```ocaml +let f1 : ('a : float). 'a -> 'a = fun x -> x +let f2 : 'a. 'a -> 'a = fun y -> f1 y (* 'a inferred to be a float *) + +module type T = sig + type ('a : float) t1 + type t2 + type t3 = t2 t1 (* this forces t2 : float *) +end +``` + +This can be done simply by using unification variables for the layouts +of even rigid type variables; if we don't learn anything interesting +about the layout within the top-level definition, we default to +`value`, as elsewhere. + +## Extension: more equalities for `box` + +It would be nice to add this rule to the `box` magic: + +* If `ty : value`, then `ty box = ty`. There is no reason to box something +already boxed. Note that this equality depends only on the *layout* of +`ty`, not the `ty` itself. For a given layout of the argument, `box` is +parametric: it treats two types of the same layout identically. + +This is hard, because at the time we see `ty box`, it might be the case +that we haven't yet figured out the layout for `ty`, meaning we'd have +to continue inference, perhaps figure out the layout, and then return +to `ty box` and perhaps simplify it to `ty`. This is possible (GHC +does it), but it would be breaking new ground for OCaml, and so should +be strongly motivated. So we hold off for now, noting that adding this +feature later will be backward compatible. + +## Extension: disabling magic for `box` + +The main proposal describes that a declaration like + +```ocaml +type t = { x : int; y : float } +``` + +would really be shorthand for + +```ocaml +type #t = #{ x : int; y : float } +type t = #t box +``` + +and that the expression e.g. `{ x = 5; y = 3.14 }` is shorthand +for `box #{ x = 5; y = 3.14 }`, along with similar shorthand for +record projection and variants. + +However, we could have a flag, say `-strict-boxes`, that removes +the auto-boxing and auto-unboxing magic. This would make record-construction +syntax like `{ x = 5; y = 3.14 }` an error, requiring a manual call +to `box` to package up an unboxed record. The advantage to this +is that programmers who want to be aware of allocations get more +compiler support. A downside (other than the verbosity) is that `-strict-boxes` +would still not prevent all allocation: any feature that doesn't go +via `box` might still allocate, including closures. + +## Extension: `immediate <= word` + +Semantically, `immediate <= word` makes good sense. To see why, +we start by observing we can collapse the `value_rep` and `word_rep` +representations: after all, both `value`s and `word`s take up the +same space in memory. (Their relationship to the garbage collector +is different, but we track that separately, not in the representation.) +We can +see this in the following table: + +``` + | representation | gc_friendly | gc_ignorable + ----------+----------------+-------------+------------- + immediate | word_rep | true | true + value | word_rep | true | false + word | word_rep | false | true +``` + +Just as `immediate <= value` works, `immediate <= word` works: +`immediate` is really the combination of `value` (with its +gc-friendliness) and `word` (with its gc-ignorableness). + +However, we hold off on adding `immediate <= word` to our type +system for now, because it adds complication without anyone +asking for it. + +Here are some complications introduced: + +* It is very convenient to reduce computing +`la ⊓ lb` to a constraint `rep(la) = rep(lb)`. In order to support `value ⊓ word += immediate`, we would thus drop the `value_rep` representation, setting +`rep(value) = word_rep`. + +* The `upper_bound` function we use in type inference then becomes impossible to +define: neither `value` nor `word` is an upper bound for the other, and so +`upper_bound(word_rep)` has no definition. We thus have to design a different +type inference algorithm than the one described above, likely tracking +representation, gc-friendliness, and gc-ignorability separately. This would not +be all that hard, but it's definitely more complicated than the algorithm +proposed here. + +* We sometimes might infer `immediate` even when no function +nearby has anything of layout `immediate`: + + ```ocaml + val f1 : ('a : value). 'a -> ... + val f2 : ('a : word). 'a -> ... + + let f3 x = (f1 x, f2 x) + ``` + + Here, we need to infer that the `x` in `f3` has layout + `immediate`, even though neither `f1` nor `f2` mentions + `immediate`. This might be unexpected by users. + +## Further discussion: mutability within unboxed records + +This proposal forbids mutability within an unboxed record (unless that +record is itself contained within a boxed record). This decision is made to +reduce the possibility of confusion and bugs, not because mutability within +unboxed records is unsound. Here, we consider some reasons that push against the +decision above and include some examples illustrating the challenges. + +Reasons we might want mutability within unboxed types: + +* The designer of a type may choose to have some fields be mutable while + other fields would not be. For example, our `#mut_point` type allows the + location of the point to change, while the identity of the point must + remain the same. + +* OCaml programmers expect to be able to update their mutable fields, and it + may cause confusion if a programmer cannot, say, update the `x` field of a + `#mut_point`. + +The reason we decided not to allow mutability within unboxed types is because of +examples like this: + +```ocaml +type mut_point = { mutable x : #int32; mutable y : #int32; identity : int } +let () = + let p : #mut_point = #{ x = #4l; y = #5l; identity = 0 } in + let foo () = print_uint32 p.x in + foo (); + p.x <- #10l; + foo () +``` + +Because the closure for `foo` captures that unboxed record `p` (by copying `p`), +this code prints 4 twice, rather confusingly. Yet the current design does not +fully rule such a confusion out, as we see here: + +```ocaml +type point = { x : int; y : int } +type t = { mutable pt : #point } + +let () = + let t : t = { pt = #{ x = 5; y = 6 } } in + let pt = t.pt in + let foo () = print_int pt.x in + foo (); + t.(.pt.x) <- 7; + foo () +``` + +This will print 5 twice, because the unboxed `t.pt` is copied into the local +variable `pt`. Yet the boxed version of this (with a `mutable x` and no hash +marks) prints 5 and then 7. This is a key motivation for putting the parentheses +in the `t.(.pt.x)` syntax: it forces the programmer to think about the fact that +they are mutating `pt.x`, not just `x`. Accordingly, they should expect `pt` not +to be changed (even if `pt` were an alias -- which it's not). Without the +parentheses there, a user might think they're mutating `x` and be surprised. + +Here's yet another example: + +```ocaml +type t2 = { x : #int32; y : #int32 } +type t1 = { mutable t2 : #t2; other : int } + +fun (t1 : t1) -> + let other_t1 : t1 = {t1 with other = 5} in + t1.(.t2.x) <- #10l; + other_t1.t2.x = t1.t2.x +``` + +Assuming `t1.t2.x` doesn't start as `#10l`, this will return `false`. But this +should not come as a surprise: `t1.t2` is copied as part of the functional +record update, and thus mutating `t1.t2` (as part of `t1.(.t2.x) <- #10l`) changes +it. If, on the other hand, we did not have the parentheses, users might expect +`other_t1`'s `x` to be changed when `t1`'s `x` gets changed. + +## Bad idea: width subtyping + +We wondered at one point whether we could support `int8 <= int64` in +the subtyping relation. After all, a function expecting an `int64` +argument can indeed deal with an `int8` one. The problem would be +collections. That is, if we have an `#int8 array` (for a magical +`array` type that can deal with multiple layouts, like `box`), how +would we index into it? Is it a `#int8`-as-64-bits or a plain `#int8` +in there? The way to tell is to have a second argument to `array` +that denotes the layout. Richard thinks this could work out in the +end, but probably would still be a bad idea. + +## Bad idea: splitting `box` + +The mixed-block restriction tells us that there are really two forms of `box`: +one for gc-friendly boxes and one for gc-ignorable boxes. We briefly thought +about having two different primitives `box_gc` and `box_non_gc` (and no `box` +function), where users would choose the one they wanted. The type system would +ensure that the layout of the argument is appropriate for the type of box +created. + +This plan would be good because it would simplify inference: currently, every +use of `box` imposes a requirement that the layout of the argument is *either* +gc-friendly or gc-ignorable. But "or" requirements are annoying: we can't +simplify or usefully propagate the requirement until we know which branch of the +"or" we wish to take. If the user specified which kind of box they want, then +we don't have the "or" any more. + +However, this plan doesn't work very well, because of auto-boxing. That is, if +we have something like + +```ocaml +let x = ... in +let y = ... in +x, y +``` + +what kind of box should we use when constructing the tuple? It depends on the +type of `x` and `y` -- or rather on the layouts of those types. At the point +where our boxing magic inserts the call to `box`, we haven't yet completed type +inference, and so we can't know which kind of `box` to insert. In the end, we've +just re-created the challenge with "or" described above. So there seems to be +little incentive to have the two different boxes explicitly anywhere. From f57a82a99d68c350e11ce7ce33bc3c94e25beeb8 Mon Sep 17 00:00:00 2001 From: Richard Eisenberg Date: Fri, 28 Oct 2022 08:32:18 -0400 Subject: [PATCH 2/2] Fix a few bugs; mention unboxed attribute red herrings --- rfcs/unboxed-types.md | 51 +++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 49 insertions(+), 2 deletions(-) diff --git a/rfcs/unboxed-types.md b/rfcs/unboxed-types.md index 57839af..955f050 100644 --- a/rfcs/unboxed-types.md +++ b/rfcs/unboxed-types.md @@ -1,5 +1,23 @@ # Unboxed types in OCaml +This proposal is discussed at this [pull request](https://github.com/ocaml/RFCs/pull/34). + +OCaml currently has two attributes that this proposal may easily be confused +with: + +* We can mark arguments to external calls as `[@unboxed]`, as [documented in the + manual](https://v2.ocaml.org/manual/intfc.html#s%3AC-cheaper-call). This + proposal essentially takes this idea and expands it to be usable within OCaml + itself, instead of just at the foreign function interface boundary. +* We can mark some types as `[@@unboxed]`, as briefly described in a bullet + point in [this manual + section](https://v2.ocaml.org/manual/attributes.html#ss:builtin-attributes). + This attribute, applicable to a type holding one field of payload (either a + one-field record or a one-constructor variant with one field), means that no + extra box is allocated when building values of that type. It is notionally + separate from this proposal, though there is an interaction: see the section + mentioning `[@@unboxed]` in its title. + ## Motivation Suppose you define this type in OCaml: @@ -175,6 +193,35 @@ namespaces) and are in scope in the initial environment. Though this proposal does not introduce syntax to do so, we can imagine constructs defining new names in the layout namespace that could shadow these existing names. +### Layouts and the `[@@unboxed]` attribute + +Because a layout describes how a type is stored in memory and passed to and from +functions, an `[@@unboxed]` type must have the same layout as its field +type. That is, in both of the following declarations, `outer` is assigned the +same layout as `inner`: + +```ocaml +type outer = { f : inner } [@@unboxed] +type outer = K of inner [@@unboxed] +``` + +Without the `[@@unboxed]` attribute, both `outer`s would be `value`s. It is +possible that recursion prevents us from finding the layout of the right-hand +side: + +```ocaml +type loopy = { f : loopy } [@@unboxed] +``` + +In this case, we default the layout of `loopy` to `value`, although a programmer +can also write a layout annotation to choose a different layout, as follows: + +```ocaml +type immediate_loopy : immediate = { f : immediate_loopy } [@@unboxed] +``` + +Both `loopy` and `immediate_loopy` are uninhabited. + ## Unboxed types and records with unboxed fields All of the currently-existing types in OCaml have layout `value`. Some @@ -562,8 +609,8 @@ an unboxed and boxed version of an abstract type, you can with ```ocaml module M : sig - type unboxed_t - type t = box unboxed_t + type unboxed_t : value * value + type t = unboxed_t box end = struct type t = { x : int32; y : int32 } type unboxed_t = #t