This repository contains a complete Tiger compiler implemented in Standard ML/New Jersey, following the structure and techniques described in Andrew Appel's textbook, "Modern Compiler Implementation in ML".
The compiler translates programs written in the Tiger language, a simple imperative language with nested functions and heap-allocated records (described in Appendix A of the textbook), into MIPS assembly code that can be executed using the QtSpim simulator.
The compiler follows a standard multi-pass pipeline architecture, as outlined in Appel's textbook:
- Lexing (
lexer): Breaks the source text into tokens (Chapter 2). Implemented using ML-Lex. - Parsing (
parser): Analyzes the token stream to build an Abstract Syntax Tree (AST) representing the program's structure (Chapter 3). Implemented using ML-Yacc. The AST definition (absyn.sml) corresponds to Chapter 4. - Semantic Analysis (
semant): Performs type-checking and translates the AST into a high-level Intermediate Representation (IR). Constructs environments (symbol tables) to manage scopes (Chapter 5). Also performs Escape Analysis (escape.sml) to identify variables that must be stored in the frame (related to Chapter 6). - Translation (
translate): Converts the checked AST into a lower-level tree-based Intermediate Representation (TreeIR), handling function calls, variable access (including static links for nested functions), and control flow structures (Chapter 7). This involves managing Activation Records (frame.sml) based on the target architecture (MIPS) (Chapter 6). Collects procedure bodies and string literals intoFrame.fragfragments. - Instruction Selection (
codegen): Maps the canonicalTreeIR to abstract assembly instructions (AssemIR) using a maximal munch algorithm tailored for the MIPS architecture (Chapter 9). - Control Flow Analysis (
flowgraph): Builds a control flow graph from the abstract assembly instructions (Chapter 10). - Liveness Analysis (
liveness): Determines the live ranges of temporary variables using iterative dataflow analysis on the control flow graph (Chapter 10). Builds an interference graph. - Register Allocation (
regalloc,color): Assigns machine registers (or stack slots for spilled variables) to temporaries using a graph-coloring algorithm with support for coalescing and iterative spilling/rewriting (Chapter 11). - Code Emission (
layout): Generates the final MIPS assembly code, including procedure prologues/epilogues and formatting instructions (Chapter 12). Appends runtime support (StdLibrary).
This compiler implements several key techniques discussed in "Modern Compiler Implementation in ML":
- Abstract Syntax Tree (AST): Provides a clean separation between parsing and semantic analysis.
- Intermediate Representations: Uses both a high-level
TranslateIR (with Ex/Nx/Cx forms for handling expressions yielding values, no values, or conditional jumps) and a low-levelTreeIR before generating abstract assembly (Assem). - Nested Functions: Correctly handles lexical scoping and access to non-local variables via static links and appropriately structured activation records (frames).
- Escape Analysis: Determines which variables "escape" their defining function (e.g., accessed by a nested function) and must be allocated on the stack frame rather than in registers.
- Instruction Selection: Employs the "maximal munch" tree pattern matching algorithm for generating MIPS instructions.
- Register Allocation: Implements a sophisticated graph-coloring allocator with support for coalescing (to eliminate redundant MOVE instructions) and spilling (handling cases where temporaries exceed available registers). The allocator iteratively rewrites the instruction stream and rebuilds the interference graph if spilling occurs.
- MIPS Target: Generates runnable MIPS assembly code (
.sfiles). - Runtime Support: Includes a minimal runtime library (
StdLibrary/) for basic operations like printing, memory allocation (required for Tiger records/arrays), and interaction with the QtSpim simulator.
The project uses the SML/NJ Compilation Manager (CM).
CM.make "sources.cm";
InsnSelection.Main.compile "tests/test1.tig"; (* produces tests/test1.tig.s *)