Matt Basta edited this page Feb 17, 2015 · 3 revisions

Objects on the heap all share common characteristics. Each allocated object, regardless of its type, occupies the sum of its members' sizes plus eight bytes of overhead. The first four of those eight bytes store an identifier corresponding to the object's type. The second four bytes store the object's reference count. Both are stored as 32-bit unsigned integers.

An object's shape is determined by its members and their ordering. Each object reorders its members using a stable sort by size, starting with the largest member and ending with the smallest. This is done to minimize space wasted by alignment padding. For example, consider the following object:

  • int: "i1"
  • float: "f1"
  • bool: "b1"
  • int: "i2"

An int requires four bytes, a float requires eight bytes, and a bool requires one byte. Because each member must start at an address that divides evenly by the size of its sub-type, space is wasted between some of the members:

        012345678901234567890123456789
index:  0       8       16  20  24
member: i1      f1      b1  i2  ]

The position of the member's name represents the start of the member in memory. "]" represents the end of the object.

Notice that b1 is padded to four bytes so that i2 can start at an offset that is a multiple of four (20), and i1 is padded to eight bytes instead of four so that f1 can start at a multiple of eight (8).

Sorting the object's members by size descending eliminates this problem:

        01234567890123456789
index:  0       8   12  16
member: f1      i1  i2  b1

The object takes up 17 bytes instead of 24.

In this simple case, the smaller layout does not actually save memory if the allocator's minimum allocation size is 24 bytes or more, but for larger objects the savings can be significant.

Tuples

Care must be taken with tuples to ensure that when members are compacted in this way, subscripts refer to the position of the sub-type within the tuple as it is declared, not as it is stored in memory. This remapping can be done statically at compile time, but it requires stringent testing to ensure that all lookups and declarations of the tuple retrieve the correct values.

Strings and Arrays

Strings and arrays carry eight additional bytes of overhead on top of the normal object overhead. The first four bytes of usable space hold an unsigned integer, the assumed length: the length that the string or array is intended to have. The second four bytes hold another unsigned integer, the actual length: the maximum number of elements that fit in the space that was allocated.

For example, consider the following code:

var x = new array<int>(10);

An int consumes four bytes of memory, so an array of ten ints occupies 4 * 10 + 8 = 48 bytes (not counting the normal object overhead). The first four bytes would hold the unsigned integer 10. The second four bytes might hold the unsigned integer 14.

When the array is allocated, the memory allocator may choose to allocate more memory than is requested. This is normal, and the excess can vary from a few bytes to however many bytes round the request up to the next power of two. The actual length represents the size of the string or array as if it consumed every single byte of the allocated space. Normally, 48 bytes would be required for the above array (40 bytes for the content, 8 bytes for the array overhead). If the allocator provides 64 bytes, subtracting the 8 bytes of overhead yields 56 bytes of usable space. Each item in the array (an int) uses four bytes of memory, so dividing by four yields 14.

This is useful for optimizations, especially around memory usage and modification of arrays. Array size is immutable, but some operations on an array with only a single reference can be optimized to modify the existing array in place and yield identical results.

Object Overhead and Shape

The overhead of the object (two 32-bit unsigned integers storing the object type and reference count) may appear to conflict with the alignment rules above: each number requires only four bytes, not eight, and the pair sits before the start of the object. This is not a problem, however. The largest primitive is a float, clocking in at eight bytes, and the two uints together also occupy eight bytes, so the payload that follows the header remains eight-byte aligned and no space is wasted.

On some compile targets or platforms, differently sized values may be used for the object type and reference count. In cases where this causes misalignment, padding will be added (and space wasted) to ensure that sub-types always line up at the appropriate boundaries.
