- Default Values
- Primitive Types
- Constructed Types
- Special Types
- Optional Types
- Text Type
- Numeric Types
- Chrono Types
- Sequence Types
- Tuple Types
- Record Types
- Table Types
- Tensor Types
- Module Types
Every value in Rexl has a type. The type of a value determines the kinds of operations that can be performed
with the value. For example, the multiplication operator * can be applied to numeric values but not to text
values. When writing Rexl expressions it is important to understand the types of the values used in the
expression.
Rexl automatically infers the type of an expression from the functions, operators, and globals referenced by the expression. Generally, the author of an expression does not explicitly indicate the types involved.
Rexl has a rich type system, including primitive types and several kinds of compound or constructed types such as records, tuples, tensors, and sequences.
Rexl uses a structural type system, where compound types are defined purely by their structure and not
assigned a name. In contrast, C# uses a nominal type system, where compound types are given a name and
two types with distinct names can have identical structure. For example, in C# one can define two distinct
classes named Person and Wine, both containing only fields Name of type string and Age of type int. In Rexl,
both would be represented by the same record type, namely that with two fields, Name of type text and Age of
type I4.
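The contrast between nominal and structural typing can be sketched in Python, which is used here only as an illustration language and is not part of Rexl. Python dataclasses compare nominally, like C# classes. The hypothetical `record_type` helper models a Rexl-style record type as nothing more than its unordered set of (field name, type name) pairs.

```python
from dataclasses import dataclass

# Nominal typing (as in C#): two classes with identical fields are still distinct types.
@dataclass
class Person:
    Name: str
    Age: int

@dataclass
class Wine:
    Name: str
    Age: int

# Structural typing (as in Rexl): a record type is identified only by its fields.
# record_type is a hypothetical helper used for illustration only.
def record_type(**fields: str) -> frozenset:
    # Field order is irrelevant, so the fields are stored as an unordered set.
    return frozenset(fields.items())

person_type = record_type(Name="text", Age="I4")
wine_type = record_type(Age="I4", Name="text")

nominal_equal = Person("Merlot", 5) == Wine("Merlot", 5)  # False: different classes
structural_equal = person_type == wine_type               # True: same fields and types
```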
Every Rexl type except vacuous has a default value.
The null value is the default value of any type that includes null. The zero value
is the default value of any numeric type. The default value of the bool type is false.
The default values of other types are documented in their respective sections.
Rexl supports several primitive types. These include:
- The text type, which contains sequences (of any length) of Unicode characters, including the empty text value consisting of zero characters. An example of a text literal value in Rexl is "Hello, World".
- Numeric types, which contain various forms of numeric values. There are several of these, including both integer and floating-point types with various precisions. Examples of numeric literal values in Rexl include 12, 3.5, and 6.02e23.
- The logical type, also known as the bool type, which contains two values, namely false and true. The bool type is also considered a numeric type, since it is usable as a number, with false representing the numeric value 0 and true representing the numeric value 1.
- The date type, which represents both a date in an idealized Gregorian calendar and the time within that day. The resolution of this type is 100 nanoseconds.
- The time type, which represents a time interval consisting of a number of days, hours, minutes, seconds, and fractions of a second. A time value may be negative. The difference of two date values is a time value. Similarly, a time value may be added to a date value to produce a new date value. The resolution of this type is 100 nanoseconds.
- The link types, which represent links to resources, such as documents, images, videos, and audio clips.
Rexl supports a rich set of compound or constructed types. These are types that are constructed from other types. These include:
- A sequence type is defined by an associated item type. It contains sequences of values from the item type. A sequence can be any length, including the empty sequence consisting of zero items. Note that the items in a sequence are ordered and are not necessarily unique. That is, a sequence is not just a set. For example, the Rexl expression [ 3.5, 7, 12.2, 7 ] evaluates to a sequence of numbers.
- A tuple type has an associated arity (or number of slots) as well as slot types. For example, the Rexl expression (3, "Hi") evaluates to a tuple having two slots of types I8 (the numeric type representing signed integers in 8 bytes) and text, respectively. Note that the slots are ordered and need not be of the same type.
- A record type has an associated set of fields, each having a name and type. For example, the Rexl expression { A:3.5, B:true, C:"panda" } evaluates to a record having field names A, B, and C, of types R8 (the numeric type representing floating-point numbers in 8 bytes), bool, and text, respectively. Unlike slots in a tuple, fields in a record are not ordered. Consequently, { C:"panda", A:3.5, B:true } evaluates to the exact same record value.
- A table type is a sequence type whose item type is a record type. That is, a table is a sequence of records (all of the same type). The columns of the table type are the fields of the record type. For example, the Rexl expression [ {A:3, B:"X"}, {A:7, B:"Y"} ] evaluates to a table with two columns named A and B, with types I8 and text.
- A tensor type is an advanced type used in scientific applications. Like sequence types, tensor types have an associated item type. They also have an associated rank, indicating the number of dimensions.
- A module type is an advanced type that represents a collection of named symbols. Documentation for modules will be added soon.
Rexl supports two special (esoteric) types:
- The general type: This is the universal type that contains all possible values produced by Rexl formulas.
- The vacuous type: This is the type that contains no values.
The general type is most commonly produced by combining values of very different types. For example, if B
has bool type, the expression:
If(B, 3, "Hello")
has the general type. Similarly, the expression:
Chain([ 3 ], [ "Hello" ])
has type sequence of general. The general type should be avoided in Rexl formulas. Since the set of
operations and functions that can be applied to a value depends on the value's type, there is little that can be
done with a value of type general. Depending on the Rexl host, many expressions having the general type, such as
3 if B else "Hello", may generate a compilation error.
There is no way in Rexl to materialize a value of the vacuous type. However, the sequence construction
expression [] has type sequence of vacuous. Similarly, the local name x in the expression
ForEach(x:[], x) has vacuous type.
The default value of the general type is null. The vacuous type has no default value since it
contains no values.
Rexl supports the concept of a missing value via the special value null. Users of SQL, other database
languages, or object-oriented languages will be familiar with this concept. Note that the handling of null in
Rexl is often slightly different from SQL.
Some data types inherently include the null value, while others do not. The primitive types that include the
null value are the text type and the link types. Note that the null text value is distinct from the empty text
value consisting of no characters. The only constructed types listed above that contain the null value are the
sequence types. Unlike with text, a null sequence value is indistinguishable from an empty sequence value (of
the same item type).
The null value is not included in the other primitive and constructed types listed above. Any such type is
called a required type. However, every required type (that does not include null) has an associated optional
type that includes all the same values as the required type and also includes the null value. For example, I8 is
the (required) numeric type representing signed integers using 8 bytes. The optional I8 type contains all the
same values as well as the null value.
Note that unnecessary use of optional types incurs additional computational cost, so data sources (such as imported SQL tables and parquet files) should be constructed to use required types when possible.
The default value of all optional types is null.
The text type includes all ordered finite sequences of Unicode characters, including the empty sequence, as
well as the special value null. The null value is distinct from the empty text value.
Note: Even though the text type is defined using the term sequence, it is not a sequence type, since Rexl does not have a type corresponding to Unicode character.
A text literal value is placed in double quotation characters. Text literals can contain a double quotation
character by doubling the double quotation character. Text literals also support escaping (similar to C#) using
the \ character. The following are literal text values:
"Hello, world"
"I wrote \"Hello\" to C:\\folder\\file.txt"
"I wrote ""Hello"" to C:\\folder\\file.txt"
The corresponding text values are:
Hello, world
I wrote "Hello" to C:\folder\file.txt
I wrote "Hello" to C:\folder\file.txt
Rexl also supports a verbatim form of text literal where the \ character is not treated as an escape:
@"I wrote ""Hello"" to C:\folder\file.txt"
The escape character \ is used for many types of escaping, well beyond the two cases of \" and \\ in the
examples above.
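The literal rules above can be sketched as a small Python decoder. This is a partial, hypothetical helper for illustration only: it handles doubled quotes, the two escapes \" and \\, and the verbatim @"..." form, and deliberately rejects every other escape rather than guessing at Rexl's full escape set.

```python
def decode_text_literal(literal: str) -> str:
    # Hypothetical, partial decoder: only "" doubling, the escapes \" and \\,
    # and the verbatim @"..." form are modeled here.
    verbatim = literal.startswith("@")
    if verbatim:
        literal = literal[1:]
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("text literal must be enclosed in double quotes")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if ch == '"':
            # A double quote inside the literal must be doubled.
            if nxt != '"':
                raise ValueError("unescaped double quote")
            out.append('"')
            i += 2
        elif ch == "\\" and not verbatim:
            # Outside verbatim literals, backslash starts an escape sequence.
            if nxt not in ('"', "\\"):
                raise NotImplementedError("escape not modeled in this sketch")
            out.append(nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

All three spellings of the second example decode to the same text value, which is the point of offering both the escaped and the verbatim forms.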
Since the text type includes the null value, its default value is null.
Rexl currently includes twelve distinct numeric types. These vary in whether they are integer versus floating-point (capable of representing fractional values), whether they contain negative values, their size (number of bytes or bits used to represent them), and precision (the number of bits used for their mantissa). These types are summarized in the following table.
The kind column contains float for floating-point types, signed for the signed integer types (that can contain negative values), and unsigned for the unsigned integer types (that cannot contain negative values).
| Name | Kind | Size | Precision | Minimum Value | Maximum Value |
|---|---|---|---|---|---|
| R8 | float | 8 bytes | 53 bits | -1.79769313486232e308 | 1.79769313486232e308 |
| R4 | float | 4 bytes | 24 bits | -3.4028235e38 | 3.4028235e38 |
| IA | signed | variable | variable | no minimum | no maximum |
| I8 | signed | 8 bytes | 64 bits | -9_223_372_036_854_775_808 | 9_223_372_036_854_775_807 |
| I4 | signed | 4 bytes | 32 bits | -2_147_483_648 | 2_147_483_647 |
| I2 | signed | 2 bytes | 16 bits | -32_768 | 32_767 |
| I1 | signed | 1 byte | 8 bits | -128 | 127 |
| U8 | unsigned | 8 bytes | 64 bits | 0 | 18_446_744_073_709_551_615 |
| U4 | unsigned | 4 bytes | 32 bits | 0 | 4_294_967_295 |
| U2 | unsigned | 2 bytes | 16 bits | 0 | 65_535 |
| U1 | unsigned | 1 byte | 8 bits | 0 | 255 |
| bool | unsigned | 1 bit | 1 bit | 0 | 1 |
The IA type is known as the arbitrary precision integer type. It can represent an integer of any size (subject to
available memory). Note that arithmetic with IA is significantly more expensive than arithmetic with the other
numeric types, so it should be used only when needed. The remaining integer types are called fixed-sized
integer types.
The floating-point types, R8 and R4, use a certain number of bits for their mantissa, with the remaining bits
used to encode a base-two exponent and sign. Arithmetic using these types is inherently approximate since
any result needs to be rounded to the closest representable value. Note that the encoded exponent is a
base-two exponent, so these types can exactly represent only values that can be written as a fraction whose
denominator is a power of two (that is, a dyadic rational). Fractions like 1/3 and 1/10 are not exactly
representable, but 1/2 and 3/8 are. The floating-point types contain three non-finite values known as positive
infinity ∞, negative infinity -∞, and NaN. The latter stands for not a number. These values may be generated
from the Rexl formulas 1/0, -1/0, and 0/0, respectively. The indicated minimum and maximum (finite) values
for these types are approximate, with the number following e indicating a base 10 exponent. These are the
smallest and largest finite values that the types can represent.
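The rounding behavior described above can be observed with a short Python sketch. It assumes, consistent with the table (53- and 24-bit precision, the listed maximum values), that R8 behaves like an IEEE 754 binary64 value (Python's `float`) and R4 like an IEEE 754 binary32 value; the helpers `to_r4` and `is_dyadic` are hypothetical illustrations, not Rexl functions.

```python
import math
import struct
from fractions import Fraction

# Assumption: R8 ~ IEEE 754 binary64 (Python float), R4 ~ IEEE 754 binary32.

def to_r4(x: float) -> float:
    # Round-trip through a 4-byte float to round x to the nearest binary32 value.
    return struct.unpack("<f", struct.pack("<f", x))[0]

def is_dyadic(q: Fraction) -> bool:
    # A fraction in lowest terms is dyadic when its denominator is a power of two.
    d = q.denominator
    return d & (d - 1) == 0

exact_three_eighths = Fraction(0.375) == Fraction(3, 8)  # True: 3/8 is dyadic
exact_one_tenth = Fraction(0.1) == Fraction(1, 10)       # False: 0.1 is rounded

# Python raises ZeroDivisionError for 1/0, so the non-finite values are built directly.
pos_inf, neg_inf, nan = math.inf, -math.inf, math.nan
```

Note that `Fraction(0.1)` exposes the exact dyadic value actually stored for the literal 0.1, which is close to, but not equal to, 1/10.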
For the unsigned integer types, the minimum is zero and the maximum is $2^{n} - 1$, where $n$ is the precision in bits.
For the signed integer types with finite precision, the minimum is $-2^{n-1}$ and the maximum is $2^{n-1} - 1$.
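These formulas reproduce the Minimum Value and Maximum Value columns of the table. A minimal Python sketch (the helper names are hypothetical) that computes them from the precision in bits:

```python
def unsigned_range(bits: int) -> tuple[int, int]:
    # Unsigned integer types: 0 .. 2**n - 1
    return 0, 2**bits - 1

def signed_range(bits: int) -> tuple[int, int]:
    # Signed fixed-size integer types: -2**(n-1) .. 2**(n-1) - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

PRECISION_BITS = {
    "I8": 64, "I4": 32, "I2": 16, "I1": 8,
    "U8": 64, "U4": 32, "U2": 16, "U1": 8, "bool": 1,
}

for name, bits in PRECISION_BITS.items():
    lo, hi = signed_range(bits) if name.startswith("I") else unsigned_range(bits)
    print(f"{name:>4}: {lo:_} .. {hi:_}")  # underscore separators, as in Rexl literals
```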
The default value of all numeric types is the value 0 in that type. For the bool type, the default value
is the false value, which is the 0 value of that type.
In Rexl, when writing a numeric literal with no decimal point or exponent, the result will be either of type I8,
if the value fits within the range of I8, or of type IA otherwise. To indicate that the value is of any other
numeric type (except bool), append the name of the numeric type as a suffix. For example, 100 is of type I8,
but 100I2 is of type I2. Note that the type suffix can be either uppercase or lowercase as in 100i2.
The bool type does not support a type suffix. The values of bool type must be written false (for zero) and true
(for one).
When writing a numeric literal with either a decimal point or exponent (or both), the result will be of type R8.
To specify that the value should be of type R4, append a type suffix, as in 1.5r4.
Numeric literals may use the underscore character as a digit separator, as in 1_234_567. Integer values may be
written in hexadecimal (base sixteen) using the 0x prefix or in binary (base two) using the 0b prefix. For
example, the integer 100 can be written as 0x64 or 0b0110_0100.
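The literal typing rules above can be summarized in a small Python sketch. The `literal_type` helper is hypothetical and covers decimal literals only; it does not validate nonsensical combinations (such as a decimal point with an integer suffix) and does not decide the type of hexadecimal or binary literals, whose values are shown separately using Python's own parser.

```python
I8_MAX = 2**63 - 1
TYPE_SUFFIXES = {"r8", "r4", "ia", "i8", "i4", "i2", "i1", "u8", "u4", "u2", "u1"}

def literal_type(literal: str) -> str:
    # Hypothetical sketch for decimal numeric literals only.
    s = literal.lower().replace("_", "")  # suffixes are case-insensitive; _ is a separator
    if s[-2:] in TYPE_SUFFIXES:
        return s[-2:].upper()             # explicit suffix, e.g. 100i2 -> I2
    if "." in s or "e" in s:
        return "R8"                       # decimal point or exponent -> R8
    return "I8" if int(s) <= I8_MAX else "IA"  # plain integer: I8 if it fits, else IA

# Hexadecimal and binary spellings of the integer 100.
hundred_hex = int("0x64", 16)
hundred_bin = int("0b0110_0100".replace("_", ""), 2)
```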
There are various standard numeric conversions that allow using a value of one numeric type, the source type, where a different numeric type, the destination type, is needed. The Rexl compilation process automatically promotes (or converts) the value from the source type to the destination type.
The standard numeric conversions consist of:
- From any numeric type to the same type (the identity conversion).
- To R8 from any numeric type. This conversion can lose information when the source type is IA, I8, or U8, since those types all have larger precision (64 bits or more) than R8 does (53 bits).
- To R4 from any numeric type other than R8. This conversion can lose information when the source type is IA, I8, U8, I4, or U4, since those types all have larger precision (32 bits or more) than R4 does (24 bits).
- To IA, the arbitrary precision integer type, from any integer type. These conversions do not lose information.
- To I8 from any integer type other than IA. Note that this includes conversion from U8 to I8. With this conversion (U8 to I8) there is the possibility of large positive values being reinterpreted as negative, so Rexl issues a warning when this conversion is used. For any source other than U8, these conversions do not lose information.
- To a fixed-sized signed integer type from a fixed-sized integer type (signed or unsigned) with smaller size. These conversions do not lose information.
- To a fixed-sized unsigned integer type from a fixed-sized unsigned integer type with smaller size. These conversions do not lose information.
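The list of standard numeric conversions can be encoded as a small decision procedure. The following Python sketch is a hypothetical restatement of the rules above, not Rexl's implementation; it treats bool as a 1-bit unsigned integer type, as the numeric type table does.

```python
KIND = {
    "R8": "float", "R4": "float",
    "IA": "signed", "I8": "signed", "I4": "signed", "I2": "signed", "I1": "signed",
    "U8": "unsigned", "U4": "unsigned", "U2": "unsigned", "U1": "unsigned", "bool": "unsigned",
}
# Precision in bits; IA is unbounded.
PRECISION = {
    "R8": 53, "R4": 24, "IA": float("inf"),
    "I8": 64, "I4": 32, "I2": 16, "I1": 8,
    "U8": 64, "U4": 32, "U2": 16, "U1": 8, "bool": 1,
}

def has_conversion(src: str, dst: str) -> bool:
    """True when the standard numeric conversions allow src -> dst."""
    if src == dst:
        return True                                # identity
    if dst == "R8":
        return True                                # any numeric type -> R8
    if dst == "R4":
        return src != "R8"                         # any type except R8 -> R4
    if KIND[src] == "float":
        return False                               # no float -> integer conversion listed
    if dst == "IA":
        return True                                # any integer type -> IA
    if src == "IA":
        return False                               # IA -> fixed-size is not standard
    if dst == "I8":
        return True                                # includes U8 -> I8 (with a warning)
    if KIND[dst] == "signed":
        return PRECISION[src] < PRECISION[dst]     # smaller signed or unsigned source
    return KIND[src] == "unsigned" and PRECISION[src] < PRECISION[dst]

def may_lose_information(src: str, dst: str) -> bool:
    """True for the standard conversions that can lose information."""
    if KIND[dst] == "float":
        return PRECISION[src] > PRECISION[dst]     # e.g. I8 -> R8, I4 -> R4
    return src == "U8" and dst == "I8"             # large U8 values become negative
```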
The major numeric types are R8, IA, I8, and U8.
Generally, the numeric arithmetic operators (addition,
subtraction, multiplication, division, modulus, negation, exponentiation) always use one of these types.
These operators select one of the major numeric types, convert both operands to that type and perform the
operation within that type. The type that is selected depends on the operator and possibly on the types of
the operands. For example, floating-point division / always
uses R8. Exponentiation selects one of the fixed-sized major numeric
types (not IA). The integer division and modulus operators
select one of the integer major numeric types (not R8). In all cases, the selected type must have
standard numeric conversions from the operand types.
Generally, when multiple supported types have such conversions and are allowed by the operator, the selected type is determined by:
- If either operand type is floating-point, the selected type is R8.
- Otherwise, if either operand type is IA, the selected type is IA.
- Otherwise, if either operand type is U8 and the other is also an unsigned integer type, the selected type is U8.
- Otherwise, the selected type is I8.
For example:
- Adding or subtracting a U2 value and a U8 value with the + or - operator selects the U8 type.
- Adding or subtracting a U2 value and a U4 value with the + or - operator selects the I8 type. Note that the selected type is signed.
- Adding or subtracting a U2 value and an I1 value with the + or - operator selects the I8 type.
- Adding or subtracting a U2 value and an R4 value with the + or - operator selects the R8 type.
- Adding or subtracting a U2 value and an IA value with the + or - operator selects the IA type.
- Computing the integer quotient or modulus of a U8 value and an IA value using the div or mod operator selects the IA type.
- Dividing a U8 value by an IA value using the / operator selects the R8 type.
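The selection rules and examples above can be checked with a hypothetical Python sketch. It models only +, -, /, div, and mod; exponentiation (which excludes IA) is not modeled, and the case of div or mod with a floating-point operand, which the text does not spell out, is simply rejected.

```python
FLOATING = {"R8", "R4"}
UNSIGNED = {"U8", "U4", "U2", "U1", "bool"}

def select_type(op: str, a: str, b: str) -> str:
    # Hypothetical sketch of the general selection rules for binary operators.
    if op == "/":
        return "R8"                      # floating-point division always uses R8
    if a in FLOATING or b in FLOATING:
        if op in ("div", "mod"):
            raise NotImplementedError("div/mod with a float operand is not modeled")
        return "R8"
    if "IA" in (a, b):
        return "IA"
    if "U8" in (a, b) and a in UNSIGNED and b in UNSIGNED:
        return "U8"
    return "I8"
```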
When the selected type is R8 and the mathematical result requires more than 53 bits of precision, the result is
rounded to the closest value representable by R8.
When the selected type is I8 or U8 and the mathematical result is outside the range of that type, the result is
reduced modulo $2^{64}$ into the range of that type. For example, the product 1_000_000_000_000 * 1_000_000_000_000
overflows I8 to produce 2_003_764_205_206_896_640, which is $10^{24}$ reduced modulo $2^{64}$.
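Because Python integers never overflow, the wrap-around can be reproduced explicitly. A minimal sketch, assuming the reduction modulo $2^{64}$ described above, with hypothetical helper names:

```python
MOD = 2**64

def wrap_u8(n: int) -> int:
    # Reduce into the U8 range [0, 2**64).
    return n % MOD

def wrap_i8(n: int) -> int:
    # Reduce modulo 2**64, then map into the I8 range [-2**63, 2**63).
    n %= MOD
    return n - MOD if n >= 2**63 else n

product = wrap_i8(1_000_000_000_000 * 1_000_000_000_000)
print(f"{product:_}")  # 2_003_764_205_206_896_640
```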
Rexl supports a date type and a time type, known collectively as chrono types.
The resolution of the chrono types is 100 nanoseconds, or 0.1 microseconds, or 0.0000001 seconds. This unit of time is called a tick. There are 10 million ticks per second.
A date value represents a day in an idealized Gregorian calendar as well as a time value within that day. In our idealized Gregorian calendar:
- The minimum (earliest) date value is the beginning of year one.
- A leap year is one that is divisible by 4 but not divisible by 100, unless it is also divisible by 400.
- Each leap year has 366 days while each non-leap year has 365 days.
- Each day has 24 hours.
- Each hour has 60 minutes.
- Each minute has 60 seconds.
- Each second has 10_000_000 ticks.
- The maximum (latest) possible date value is the final representable instant in year 9999, that is, one tick (0.0000001 second) before the beginning of year 10_000.
Note that this system is idealized rather than historical, and it does not account for leap seconds or other adjustments typically made in a solar-based time system.
The minimum date value (the beginning of year 1) is also the default value for the date type.
The total tick count of a date value is the number of ticks between the minimum date value (the beginning of
year 1) and the date value. The minimum date value has total tick count 0 and the maximum date value has total
tick count 3_155_378_975_999_999_999.
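The total tick count can be computed with a short Python sketch. It assumes that Rexl's idealized calendar coincides with the proleptic Gregorian calendar of Python's `datetime.date`, which also starts at year 1 and uses the same leap-year rule; `total_ticks` is a hypothetical helper.

```python
from datetime import date

TICKS_PER_SECOND = 10_000_000
TICKS_PER_DAY = 24 * 60 * 60 * TICKS_PER_SECOND  # 864_000_000_000

def is_leap_year(year: int) -> bool:
    # Divisible by 4, except century years that are not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def total_ticks(d: date, ticks_in_day: int = 0) -> int:
    # date.toordinal() is 1 for January 1 of year 1, so subtract 1 to start at zero.
    if not 0 <= ticks_in_day < TICKS_PER_DAY:
        raise ValueError("time of day must be within a single day")
    return (d.toordinal() - 1) * TICKS_PER_DAY + ticks_in_day

# The final tick of December 31, 9999 is the maximum date value.
MAX_TOTAL_TICKS = total_ticks(date(9999, 12, 31), TICKS_PER_DAY - 1)
```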
The time of day portion of a date value consists of a number of ticks, at least zero and less than
864_000_000_000, which is the number of ticks in 24 hours. The date type does not include any indication
of time zone. When needed, a time-zone offset should be tracked separately as a time value.
A time value represents a time interval consisting of a number of days, hours, minutes, seconds,
milliseconds (1_000 per second) and ticks (10_000 per millisecond). Time values can be positive, zero,
or negative. A time value corresponds to a number of ticks (positive, zero, or negative). This number of
ticks, called the total tick count of the time value, can be any number that fits in the
I8 integer type. That is, the smallest time value has total tick count equal to the
smallest value of the I8 type and the largest time value has total tick count equal to the largest value
of the I8 type.
A time value is often rendered as [s]D.H:M:S.F where [s] is an optional - or + sign, D is a
number of (24 hour) days, H is a number of hours, M is a number of minutes, S is a number of seconds,
and F is the fractional part. Note that . is used to separate the number of days from the time components
and also to separate the fractional part of a second from the whole seconds. A : is used to separate the hours
from minutes and minutes from seconds. With this notation, the largest time value is 10675199.02:48:05.4775807
and the smallest time value is -10675199.02:48:05.4775808. The default time value is zero, rendered 0.0:0:0.0.
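The [s]D.H:M:S.F rendering can be sketched in Python by repeatedly splitting a total tick count into larger units. The `render_time` helper is hypothetical, and its zero padding (two digits for hours, minutes, and seconds, seven for the fraction) is an assumption chosen to match the extreme values quoted above; a host may render values, including the default zero value, differently.

```python
TICKS_PER_SECOND = 10_000_000

def render_time(total_ticks: int) -> str:
    # Render a total tick count as [s]D.HH:MM:SS.FFFFFFF (padding is an assumption).
    sign = "-" if total_ticks < 0 else ""
    seconds, fraction = divmod(abs(total_ticks), TICKS_PER_SECOND)
    minutes, s = divmod(seconds, 60)
    hours, m = divmod(minutes, 60)
    days, h = divmod(hours, 24)
    return f"{sign}{days}.{h:02}:{m:02}:{s:02}.{fraction:07}"

I8_MIN, I8_MAX = -(2**63), 2**63 - 1
print(render_time(I8_MAX))  # largest time value
print(render_time(I8_MIN))  # smallest time value
```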
Some of the arithmetic operators apply to chrono values. For example, two date values can be subtracted to get a time value. Similarly, a date value and time value can be added to get a new date value.
Date and time values can be constructed using the Date and
Time functions, respectively.
A chrono value can be converted to a text value using the ToText function.
As explained in Constructed Types, a sequence type is defined by its associated item type. Consequently, sequences are homogeneous, meaning that all items in a sequence are of the same type.
There are many ways to generate a sequence. One way is as a constructed sequence, by comma-separated expressions between square brackets. For example,
[ "Sally", "Bob", "Ahmad" ]
creates a sequence of text with three items.
Recall that the items of a sequence need to be of the same type. When the specified values are of different types, a common supertype is used as the item type and each item is converted to that type. For example, the items in
[ true, 3, 7.5 ]
are of three distinct types, namely bool, I8 and R8. Each of these can be converted to R8, so R8 is used as the
item type of the sequence and the values true and 3 are converted to that type. Consequently, the result is
equivalent to
[ 1.0, 3.0, 7.5 ]
There are also many functions and operators that produce sequences. For example, the expressions
Range(5)
Range(1, 8, 2)
Repeat("Happy", 3)
[ 3, 5, 17 ] ++ Range(5)
produce sequences equivalent to
[ 0, 1, 2, 3, 4 ]
[ 1, 3, 5, 7 ]
[ "Happy", "Happy", "Happy" ]
[ 3, 5, 17, 0, 1, 2, 3, 4 ]
Since sequence types are optional (contain null), their default value is null.
As explained in constructed types, a tuple type has an arity, which is the number of slots in the tuple, together with a type for each slot. The arity may be any non-negative integer, including zero. To construct a tuple, place the comma-separated tuple slot values between parentheses.
The arity-zero tuple is written (). It contains no information, so it is not very useful.
An arity-one tuple contains the same information as its single slot value, so it is also not commonly used. To
write an arity-one tuple, the value must be followed by a trailing comma, as in (3,). The expression (3) is not
a tuple, but just a numeric value.
For higher arity tuples, the values may be followed by a trailing comma. For example, the expressions
(3, true, "hi")
(3, true, "hi",)
produce the same arity-three tuple value.
The default value of a tuple type is the tuple whose slot values are the default values of the
corresponding slot types. For example, a tuple type of arity 3 with slot types I8, text, and bool
has default value (0, null, false).
As explained in constructed types, a record type has an associated set of fields with each field having a name and type. To construct a record value, enclose field specifications between curly braces as in
{ First: "Sally", Last: "Ng", Age: 27, FullTime: true }
The order of the field specifications has no effect on the resulting value or type. Moreover, the order that fields are displayed by a host application is determined entirely by the host. A host may use the field order written in a record construction to determine a desired display order.
A field specification may consist of a simple name (identifier) followed by a colon : followed by the field value. An alternate form is the field value followed by as and then the simple name.
There are two additional forms of field specification, where the name is implicit. One is where the field specification consists just of a simple name. In this case, the simple name is used as both the name and value for the field. The other form is where the field specification consists just of a dotted-expr as described in the dot operator section. In this case, the name for the field is the final simple name (identifier) of the dotted-expr and the entire dotted-expr is evaluated as the value of the field. For example
ForEach(Item:MyTable, { Item:Item, Name:Name, Age:Item.Age, Addr:Item.HomeAddr })
may be shortened to
ForEach(Item:MyTable, { Item, Name, Item.Age, Addr:Item.HomeAddr })
where the field names Item, Name, and Age are implicit. The name Addr must be specified explicitly
since it differs from HomeAddr.
This could also be written using the as form for the final field specification,
ForEach(Item:MyTable, { Item, Name, Item.Age, Item.HomeAddr as Addr })
The default value of a record type is the record whose field values are the default values of the corresponding field types.
As explained in constructed types, a table type is just a sequence type whose item type is a record type. The record type is said to be the row type of the table type and the fields of the record type are said to be the columns of the table type.
Since a table is a sequence of records, one can be constructed in Rexl as just that, a sequence of records. For example,
[{Name:"Sally", Age:27}, {Name:"Bob", Age:24}, {Name:"Ahmad", Age:32}]
produces a table with two columns, Name of type text and Age of type I8 (the default signed integer type), and
three rows, one for each record.
When one of the records does not contain all the fields of the others, each missing field is added with a null value. For
example, in this expression
[{Name:"Sally", Age:27}, {Name:"Bob"}, {Name:"Ahmad", Age:32} ]
no age is specified for Bob. In the resulting table, his record has null for Age and the type of the Age column is
optional I8 rather than required I8.
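The way missing fields become nulls can be sketched in Python, with `None` standing in for Rexl's null. The `records_to_table` helper is hypothetical: it only tracks which columns become optional because some record lacked them, and it does not model Rexl's type inference for the column values.

```python
def records_to_table(records: list[dict]) -> tuple[list[dict], dict[str, bool]]:
    # Collect every field name in first-seen order (display order is up to the host).
    columns: list[str] = []
    for rec in records:
        for name in rec:
            if name not in columns:
                columns.append(name)
    # Fill missing fields with None, standing in for Rexl's null.
    rows = [{name: rec.get(name) for name in columns} for rec in records]
    # A column becomes optional when at least one record lacked that field.
    optional = {name: any(name not in rec for rec in records) for name in columns}
    return rows, optional

rows, optional = records_to_table([
    {"Name": "Sally", "Age": 27},
    {"Name": "Bob"},
    {"Name": "Ahmad", "Age": 32},
])
```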
As explained in constructed types, a tensor type is defined by its associated item type
and rank. Consequently, tensors are homogeneous, meaning that all items in a tensor are of the same type.
The rank is the number of dimensions of the tensor. The dimension values of a tensor define the shape
of the tensor. The shape is typically represented as a tuple with the number of slots matching the rank
of the tensor and each slot of type I8.
Rank-one tensors are called vectors and rank-two tensors are called matrices. For example, a point in
three-dimensional Euclidean space can be represented as a rank-one tensor with item type R8 and shape (3,).
Note that a point in five-dimensional Euclidean space would also be represented as a rank-one tensor with item type
R8, so these values would be part of the same tensor type. However, the shape of the latter would be (5,).
An RGB image can be represented as a rank-three tensor with item type U1 (byte). The shape of such a tensor
value would be (H, W, C), where H is the height of the image, W is the width of the image, and C is the
number of color channels, typically 3 or 4, one for each of the component colors, red, green, and blue, and
possibly one for the transparency (alpha channel).
There are several ways to construct tensor values. For example,
Tensor.From([1, -1, 2, 0, 3, -2], 2, 3)
constructs a rank-two tensor of shape (2, 3). In mathematics, this would also be called a 2 x 3 matrix.
If the name x references this tensor value, then the individual items (also called elements or cells)
of the tensor can be accessed using the tensor indexing operator. Specifically,
the following indexing expressions result in the values indicated in the corresponding comments:
x[0, 0] // 1
x[0, 1] // -1
x[0, 2] // 2
x[1, 0] // 0
x[1, 1] // 3
x[1, 2] // -2
For a rank-two tensor like this, we'll often display values in a two dimensional layout such as
1 -1 2
0 3 -2
As explained in Extending to Tensor and Arithmetic Operators, many operators extend to tensors. In particular,
x + 5
results in another tensor with shape (2, 3) with values
6 4 7
5 8 3
Similarly, if x and y are two tensors with the same shape, x + y produces the item-wise
(or cell-wise) sum of those tensors. Similarly, x * y produces the item-wise product
(also known as the Hadamard product).
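The indexing and item-wise operators above can be mimicked with a tiny Python class. This hypothetical `Matrix` sketch assumes that Tensor.From fills items in row-major order, which is consistent with the indexing results listed above (x[0, 1] is the second item and x[1, 0] is the fourth); it is an illustration, not Rexl's tensor implementation.

```python
from typing import Callable

class Matrix:
    """Hypothetical rank-two tensor sketch with items stored in row-major order."""

    def __init__(self, items: list, rows: int, cols: int):
        if len(items) != rows * cols:
            raise ValueError("item count must match the shape")
        self.items, self.shape = list(items), (rows, cols)

    def __getitem__(self, index: tuple[int, int]):
        i, j = index
        return self.items[i * self.shape[1] + j]

    def _zip_with(self, other, f: Callable) -> "Matrix":
        if isinstance(other, Matrix):
            if other.shape != self.shape:
                raise ValueError("shapes must match")
            values = [f(a, b) for a, b in zip(self.items, other.items)]
        else:
            values = [f(a, other) for a in self.items]  # scalar operand
        return Matrix(values, *self.shape)

    def __add__(self, other) -> "Matrix":
        return self._zip_with(other, lambda a, b: a + b)

    def __mul__(self, other) -> "Matrix":
        return self._zip_with(other, lambda a, b: a * b)  # item-wise (Hadamard) product

    def rows(self) -> list[list]:
        r, c = self.shape
        return [self.items[i * c:(i + 1) * c] for i in range(r)]

x = Matrix([1, -1, 2, 0, 3, -2], 2, 3)  # analogous to Tensor.From([...], 2, 3)
```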
The default value of a tensor type is the tensor of the correct rank whose dimensions are all zero.
REVIEW: Need to document module types.