This repository is a three-stage compiler project for a Python-like language. Each milestone adds a new layer on top of the same lexer/parser foundation:
milestone1builds the syntax tree and renders it as a graph.milestone2adds semantic analysis, symbol tables, and three-address code.milestone3adds backend code generation and emits x86 assembly.
flowchart LR
A[Source program] --> B[Flex lexer]
B --> C[Bison parser]
C --> D1[Milestone 1: AST]
C --> D2[Milestone 2: symbol table + 3AC]
C --> D3[Milestone 3: symbol table + 3AC + x86]
D1 --> E1[AST.dot -> PDF]
D2 --> E2[Global.csv + 3AC.txt]
D3 --> E3[Global.csv + .3ac + .s]
E3 --> F[gcc links executable]
The important thing to understand is that the lexer and parser do most of the work. The parser actions decide what each milestone produces:
- In milestone 1, every grammar reduction creates AST nodes and parent-child edges.
- In milestone 2, grammar actions populate symbol tables, check types, and build 3AC strings.
- In milestone 3, the same semantic flow is extended with stack allocation, label-based control flow, function calls, array access, and x86 emission.
The lexer recognizes Python-style tokens, including:
- keywords such as
def,class,if,else,while,for,return,print,range - literals such as integers, floats, strings,
True,False, andNone - operators, delimiters, and compound assignment forms
- line structure tokens such as
NEWLINE,INDENT, andDEDENT
Indentation is not treated as plain whitespace. The lexer keeps an indentation stack and converts leading spaces into INDENT / DEDENT tokens. It also suppresses newlines inside bracketed expressions so multiline calls, lists, and grouped expressions do not prematurely end a statement.
All three milestones share the same parser shape, but each attaches different actions to the grammar.
- Milestone 1 uses parser actions only for AST construction.
- Milestone 2 attaches semantic checks and 3AC generation.
- Milestone 3 adds backend lowering on top of the same grammar.
The grammar covers:
- modules and statement lists
- function definitions with typed arguments and optional return types
- class definitions, methods, and
self - assignments, augmented assignments, and annotated declarations
if/elif/elsewhileandfor ... in range(...)- expressions, comparisons, boolean operators, arithmetic, power, lists, indexing, calls, and
print
Milestone 2 and milestone 3 both create a global symbol table at startup and insert __name__ into it. As the parser reduces declarations, it records:
- variable names and their types
- function signatures and return types
- class entries
- class attributes and methods
- stack offsets for later code generation
Type checks happen during parsing, not as a separate pass. That means the compiler can reject bad programs early, for example:
- returning the wrong type from a function
- assigning incompatible types
- using
range()with non-integer bounds - indexing non-lists
- calling functions with the wrong number or type of arguments
- using
selfoutside of a class method
Milestone 2 emits 3AC by concatenating strings during reductions. Control flow is lowered with labels and jumps, and expressions produce temporaries. The output is written as a line-numbered text form so the generated code is easy to inspect.
Milestone 3 keeps that 3AC flow and adds x86 generation in parallel. The parser actions now build:
- temporaries for computed values
- stack offsets for locals and arguments
- labels for branching and loops
- helper sequences for arithmetic, comparisons, calls, array access, and object construction
Milestone 3 writes a .3ac file and a final x86 assembly file. The assembly includes a data section for string literals and a text section for generated functions and top-level code. The provided shell wrapper then compiles that assembly with gcc and runs the resulting binary.
Milestone 1 is the front-end and AST milestone.
What it does:
- lexes the source file
- parses the program
- builds an AST graph in
AST.dot - renders the AST as a PDF with Graphviz
Useful files:
How to run it:
./milestone1/run.sh example.py AST.pdfThe script builds the parser, runs it, writes AST.dot, and converts the dot file into the PDF you requested.
Milestone 2 adds semantic analysis and 3AC generation.
What it does:
- creates and populates symbol tables
- checks types while parsing
- emits
Global.csvfor the global symbol table - emits
3AC.txtfor the generated three-address code
Useful files:
How to run it:
cd milestone2/src
make
./parser_exec < ../../example.pyThe executable reads from standard input, so the simplest workflow is shell redirection or piping.
Milestone 3 is the full compiler backend.
What it does:
- keeps the same parser and semantic checks
- emits
Global.csv - emits a
.3acfile next to the input source - emits x86 assembly to the output path you pass on the command line
- uses
gccto link the final executable in the provided wrapper script
Useful files:
milestone3/src/lexer.lmilestone3/src/parser.ymilestone3/src/helper.hppmilestone3/src/Makefilemilestone3/src/run.sh
How to run it:
./milestone3/src/run.sh example.pyOr run the compiler manually:
cd milestone3/src
make
./milestone3 --input=example.py --output=example.s
gcc example.s -o example.o -no-pie
./example.oIf you are on macOS and want a one-step Linux test path for milestone 3, use the Docker wrapper from the repo root:
bash scripts/test_milestone3_docker.sh milestone3/tests/test1.pyTo run the whole milestone 3 test set:
bash scripts/test_milestone3_docker.sh --allThis builds a Linux amd64 image, compiles milestone 3 inside the container, emits <input>.3ac and <input>.s, links a Linux executable as <input>.linux.out, and runs it in the same container.
README.md: this walkthroughmilestone1/: AST-only frontendmilestone2/: symbol tables and 3ACmilestone3/: full compiler backendmilestone1/tests/: milestone 1 test inputsmilestone3/tests/: milestone 3 test inputsmilestone1/doc/andmilestone2/doc/andmilestone3/doc/: milestone PDFs and writeups
- Milestone 1:
AST.dotand the rendered AST PDF - Milestone 2:
Global.csvand3AC.txt - Milestone 3:
Global.csv,<input>.3ac, and the generated assembly file
You need the usual compiler-toolchain pieces available locally:
flexbisong++dotfrom Graphvizgccfor the milestone 3 linking step
- The lexer and parser are the core of the project, so most behavior is encoded in grammar actions rather than separate passes.
- The project grows incrementally across the milestones, so the same language features are often introduced in a simpler form first and then expanded later.
- The repository already contains generated files and sample outputs under the milestone folders, which can be useful when checking behavior or comparing output shapes.