-
Notifications
You must be signed in to change notification settings - Fork 0
Bootstrapping Executables
This page documents the underlying steps that are used to build an executable
program from SML sources. This method is used both to bootstrap the initial
interactive system as part of building the system from scratch, as well as
in the ml-build shell script (in the bin directory) that is used to create executable programs from
source code.
The cmb-make script (in the system directory) and the ml-build script
(part of the standard SML/NJ installation) compile SML code in "batch"
mode (as opposed the compiling the code in the REPL). The result of batch
compilation is a collection of
binfiles
and a boot-list file that specifies the order in which to link the binfiles.
Boot-list files are line-oriented text files. The first line of the file has the format
%num-specs max_line_len
which specifies the number of specifications and the maximum length of any line in the file (not counting the terminating newline). There are two forms of specifications: the line that specifies the runtime-system Persistent ID (RTPID) and the binfile specifications. The RTPID is specified as a line of the form
#pid
where pid is a 32-digit (16-byte) hexadecimal number.[1] The other lines in the boot-list file specify binfiles to be linked in their link order. These lines have the form
filename [
@offset [:desc ] ]
where filename is the name of the binfile, offset is an optional offset into the binfile (for stable libraries), and desc is the optional name of the object being loaded. The offset defaults to 0 and the desc defaults to filename. The desc field is used for messages, but does not affect the loading process. It typically includes both the filename and the offset, as well as the name of the source file being loaded.
The runtime system has a "boot loader" mode that will bootstrap an executable
from a boot-file list and a collection of pre-compiled binfiles.
The boot loader is invoked by specifying @SMLboot=bootlist as a command-line
option to the sml command.
The process of bootstrapping is an iteration over the specifications in the boot-list file.
For each entry, the loader reads the specified binfile, resolves its imports as
explained below, executes the code, and then binds the output PID to the result.
In the special case of the RTPID specification, the loader binds the RTPID
to the Assembly structure that implements the linkage to the runtime system.
As discussed elsewhere,
a binfile includes the PIDS of its imports (or dependencies) and the (optional)
PID of its output. Executing the code of the binfile corresponds to evaluating the
top-level declarations in the corresponding SML code. The result of this execution
is a heap object representing the resulting module.[2]
The bootstrap loader maintains a table that maps the output PIDS to these heap
objects; the RTPID is mapped to the representation of the Assembly structure.
Recall that a binfile specification has a filename and optional offset. The bootstrap loader processes each binfile specification as follows:
-
The specified binfile is opened for reading as a binary file.
-
If the binfile is an archive, the loader seeks to the file offset given in the specification.
-
The binfile header is read from the file; this information is used to determine the file offsets of the parts of the file that are read by the loader.
-
Construct the import record from imports (see below for details). The size of this record is specified by the binfile header, with an additional slot added to the end for the literals.
-
The code in the binfile is loaded. The entrypoint of the code is a function that is applied to the import record.
-
The output PID for the binfile is bound to the result of executing the code. In the case that the binfile does not have an output PID, the result of executing the code is ignored.
The binfile header specifies the number of imports. The imports are specified as a sequence of pairs that are a serialization of the following SML types:
datatype tree = NODE of (int * tree) list
type import = pid * treeNote that the number of pairs can be less than the number of imports, since
each tree can specify multiple imports. We construct the import record by
iterating over the import pairs. The following SML pseudocode describes
the process, where lookup pid maps pid to the heap object that it is
bound to, addImport(n, obj) adds obj to the import record at position n,
and select(obj, i) selects the i`th component of the record `obj.
fun buildImportRecord (nImports, pidTrees) = let
fun doPair (n, (pid, tree)) = let
val obj = lookup pid
fun doTree (n, NODE[], obj) = (addImport(n, obj); n+1)
| doTree (n, NODE kids, obj) =
List.foldl
(fn ((i, tr), n) => doTree(n, tr, select(obj, i)))
n
kids
in
doTree (n, tree, obj)
end
in
List.foldl doPair 0 pidTrees
endThe current handling of binfiles that are part of an archive is inefficient, since the archive file is opened and closed for each individual binfile. On the other hand, one has to be careful to limit the number of open files. Since we have a complete list of the files that we need to read from (i.e., the boot-list file), we can implement a cache of open files using the optimal page replacement algorithm.