Skip to content

Integer Word Conversions Explained

John Reppy edited this page Feb 13, 2025 · 5 revisions

This note describes an approach to representing and optimizing the conversions between integers (and words) of different sizes in Standard ML.

Background

Because different SML implementations can have different sizes of integer and word types, the designers of the SML Basis Library had to design a API for conversions that would work for any SML implementation. The approach is based on the idea that one can convert from IntN.int to IntM.int by first converting from IntN.int to LargeInt.int and then converting from LargeInt.int to IntM.int.[^1] While this approach works for source code, inside a given compiler we want to have direct conversions.

[^1]: For more details about this design, see The History of Standard ML.

Primitive conversion operations

All integer/word conversion operations are expressed using five primitive conversion operators. Algebraic equations over these operators are easy to define and can be used to simplify composition of conversion operations.

The five basic conversion operators are as follows (in all cases, we assume that $m \leq n$, but note that the order of the arguments varies):

  • COPY($m,n$) — copy (i.e., zero-extend) an $m$-bit value to an $n$-bit value. The operation COPY($n,n$) is the identity function on n-bit values.

  • EXTEND($m,n$) — sign extend an $m$-bit value to a $n$-bit value. Note that it may be helpful to also distinguish between extending signed and unsigned values, since the former may be in 2's complement form.

  • TRUNC($n,m$) — truncate an $n$-bit value to an $m$-bit value; i.e., discard the high-order $n-m$ bits.

  • TEST($n,m$) — map an $n$-bit, 2's complement signed value to an $m$-bit, 2's complement signed value; this operation raises Overflow if the input is outside the range $[-2^{m-1},2^{m-1}-1]$.

  • TESTU($n,m$) — map an unsigned $n$-bit value to an $m$-bit 2's complement value; this operation raise Overflow if the value is is outside the range $[0,2^{m-1}-1]$.

COPY and EXTEND are used to go from small values to large, while TRUNC, TEST, and TESTU are are used to go from large values to small. The operators COPY, EXTEND, and TRUNC are "pure," while TEST and TESTU may raise Overflow. We use ∞ for $m$ or $n$ to denote arbitrary precision integers (IntInf.int).

(Note: the SML/NJ implementation has a second set of primops -- TEST_INF, etc. -- for conversions involving IntInf.int)

Most conversions where the sizes are the same can be simplified to the identity:

  COPY(n,n)   => IDENTITY
  TEST(n,n)   => IDENTITY
  EXTEND(n,n) => IDENTITY
  TRUNC(n,n)  => IDENTITY

Note, however, that this property does not hold for the TESTU operator, which raises Overflow if the input is in the range $[2^{n-1},2^{n}-1]$.

Examples

Assuming that LargeInt is arbitrary precision (i.e., IntInf), LargeWord is 32 bits, and the default Int and Word types are 31 bits, then the translation of conversion operations in the Word32 structure is given by:

  toLargeInt    => TESTU(32,∞)
  toLargeIntX   => EXTEND(32,∞)
  fromLargeInt  => TESTU(∞,32)
  toInt         => TESTU(32,31)
  toIntX        => TEST(32,31)
  fromInt       => EXTEND(31,32)
  toLargeWord   => COPY(32,32)   => IDENTITY
  toLargeWordX  => EXTEND(32,32) => IDENTITY
  fromLargeWord => TRUNC(32,32)  => IDENTITY

And if LargeInt is Int32, then the operations in the Word8 structure are

  toLargeInt    => COPY(8,32)
  toLargeIntX   => EXTEND(8,32)
  fromLargeInt  => TRUNC(32,8)
  toInt         => COPY(8,31)
  toIntX        => EXTEND(8,31)
  fromInt       => TRUNC(31,8)
  toLargeWord   => COPY(8,32)
  toLargeWordX  => EXTEND(8,32)
  fromLargeWord => TRUNC(32,8)

Rewrites

These operations allow for simplification via algebraic rewrites.

Each operator composed with itself is itself, but with different parameters. In the following, we assume that $m \leq n \leq p$.

  COPY(n,p) o COPY(m,n)         == COPY(m,p)
  EXTEND(n,p) o EXTEND(m,n)     == EXTEND(m,p)
  TRUNC(n,m) o TRUNC(p,n)       == TRUNC(p,m)
  TEST(n,m) o TEST(p,n)         == TEST(p,m)
  TESTU(n,m) o TESTU(p,n)       == TESTU(p,m)

The composition of different operators is described by the following simple algebra:

  EXTEND(n,p) o COPY(m,n)       == COPY(m,p)   if (n > m)

  TRUNC(n,q) o COPY(m,n)        == COPY(m,q)   if (q >= m)
                                == TRUNC(m,q)  if (q < m)

  TEST(n,q) o COPY(m,n)         == COPY(m,q)   if (q >= m)
                                == TEST(m,q)   if (q < m)

  TESTU(n,q) o COPY(m,n)        == COPY(m,q)   if (q > m)
                                == TESTU(m,q)  if (q <= m)

  TRUNC(n,q) o EXTEND(m,n)      == EXTEND(m,q) if (q > m)
                                == TRUNC(m,q)  if (q <= m)

  TEST(n,q) o EXTEND(m,n)       == EXTEND(m,q) if (q >= m)
                                == TEST(m,q)   if (q < m)

  TESTU(n,q) o EXTEND(m,n)      == TESTU(m,q)  if (q <= m)

For example, consider:

  Word.toInt o Word.fromLargeWord o Word8.toLargeWord

Assuming a 31-bit default word type and 32-bit large words, this code translates to:

  TESTU(31,31) o TRUNC(32,31) o COPY(8,32)

and simplifies to:

  TESTU(31,31) o COPY(8,31)

This expression further simplifies to:

  COPY(8, 31)

Since both 8-bit and 31-bit quantities are tagged the same way, this can be translated to a MOVE. With a smart register allocator that MOVE can be eliminated.

Clone this wiki locally