Skip to content

devansh-jain-2002/Compiler-Project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CS335 Project 29

This repository is a three-stage compiler project for a Python-like language. Each milestone adds a new layer on top of the same lexer/parser foundation:

  • milestone1 builds the syntax tree and renders it as a graph.
  • milestone2 adds semantic analysis, symbol tables, and three-address code.
  • milestone3 adds backend code generation and emits x86 assembly.

Repository Flow

flowchart LR
    A[Source program] --> B[Flex lexer]
    B --> C[Bison parser]
    C --> D1[Milestone 1: AST]
    C --> D2[Milestone 2: symbol table + 3AC]
    C --> D3[Milestone 3: symbol table + 3AC + x86]
    D1 --> E1[AST.dot -> PDF]
    D2 --> E2[Global.csv + 3AC.txt]
    D3 --> E3[Global.csv + .3ac + .s]
    E3 --> F[gcc links executable]
Loading

The important thing to understand is that the lexer and parser do most of the work. The parser actions decide what each milestone produces:

  • In milestone 1, every grammar reduction creates AST nodes and parent-child edges.
  • In milestone 2, grammar actions populate symbol tables, check types, and build 3AC strings.
  • In milestone 3, the same semantic flow is extended with stack allocation, label-based control flow, function calls, array access, and x86 emission.

Flow In Detail

1. Lexing and indentation handling

The lexer recognizes Python-style tokens, including:

  • keywords such as def, class, if, else, while, for, return, print, range
  • literals such as integers, floats, strings, True, False, and None
  • operators, delimiters, and compound assignment forms
  • line structure tokens such as NEWLINE, INDENT, and DEDENT

Indentation is not treated as plain whitespace. The lexer keeps an indentation stack and converts leading spaces into INDENT / DEDENT tokens. It also suppresses newlines inside bracketed expressions so multiline calls, lists, and grouped expressions do not prematurely end a statement.

2. Parsing and syntax structure

All three milestones share the same parser shape, but each attaches different actions to the grammar.

  • Milestone 1 uses parser actions only for AST construction.
  • Milestone 2 attaches semantic checks and 3AC generation.
  • Milestone 3 adds backend lowering on top of the same grammar.

The grammar covers:

  • modules and statement lists
  • function definitions with typed arguments and optional return types
  • class definitions, methods, and self
  • assignments, augmented assignments, and annotated declarations
  • if / elif / else
  • while and for ... in range(...)
  • expressions, comparisons, boolean operators, arithmetic, power, lists, indexing, calls, and print

3. Semantic analysis and symbol tables

Milestone 2 and milestone 3 both create a global symbol table at startup and insert __name__ into it. As the parser reduces declarations, it records:

  • variable names and their types
  • function signatures and return types
  • class entries
  • class attributes and methods
  • stack offsets for later code generation

Type checks happen during parsing, not as a separate pass. That means the compiler can reject bad programs early, for example:

  • returning the wrong type from a function
  • assigning incompatible types
  • using range() with non-integer bounds
  • indexing non-lists
  • calling functions with the wrong number or type of arguments
  • using self outside of a class method

4. Intermediate code generation

Milestone 2 emits 3AC by concatenating strings during reductions. Control flow is lowered with labels and jumps, and expressions produce temporaries. The output is written as a line-numbered text form so the generated code is easy to inspect.

Milestone 3 keeps that 3AC flow and adds x86 generation in parallel. The parser actions now build:

  • temporaries for computed values
  • stack offsets for locals and arguments
  • labels for branching and loops
  • helper sequences for arithmetic, comparisons, calls, array access, and object construction

5. Final backend flow

Milestone 3 writes a .3ac file and a final x86 assembly file. The assembly includes a data section for string literals and a text section for generated functions and top-level code. The provided shell wrapper then compiles that assembly with gcc and runs the resulting binary.

Milestone Overview

milestone1

Milestone 1 is the front-end and AST milestone.

What it does:

  • lexes the source file
  • parses the program
  • builds an AST graph in AST.dot
  • renders the AST as a PDF with Graphviz

Useful files:

How to run it:

./milestone1/run.sh example.py AST.pdf

The script builds the parser, runs it, writes AST.dot, and converts the dot file into the PDF you requested.

milestone2

Milestone 2 adds semantic analysis and 3AC generation.

What it does:

  • creates and populates symbol tables
  • checks types while parsing
  • emits Global.csv for the global symbol table
  • emits 3AC.txt for the generated three-address code

Useful files:

How to run it:

cd milestone2/src
make
./parser_exec < ../../example.py

The executable reads from standard input, so the simplest workflow is shell redirection or piping.

milestone3

Milestone 3 is the full compiler backend.

What it does:

  • keeps the same parser and semantic checks
  • emits Global.csv
  • emits a .3ac file next to the input source
  • emits x86 assembly to the output path you pass on the command line
  • uses gcc to link the final executable in the provided wrapper script

Useful files:

How to run it:

./milestone3/src/run.sh example.py

Or run the compiler manually:

cd milestone3/src
make
./milestone3 --input=example.py --output=example.s
gcc example.s -o example.o -no-pie
./example.o

Testing milestone 3 from macOS with Docker

If you are on macOS and want a one-step Linux test path for milestone 3, use the Docker wrapper from the repo root:

bash scripts/test_milestone3_docker.sh milestone3/tests/test1.py

To run the whole milestone 3 test set:

bash scripts/test_milestone3_docker.sh --all

This builds a Linux amd64 image, compiles milestone 3 inside the container, emits <input>.3ac and <input>.s, links a Linux executable as <input>.linux.out, and runs it in the same container.

Project Layout

Generated Artifacts

  • Milestone 1: AST.dot and the rendered AST PDF
  • Milestone 2: Global.csv and 3AC.txt
  • Milestone 3: Global.csv, <input>.3ac, and the generated assembly file

Dependencies

You need the usual compiler-toolchain pieces available locally:

  • flex
  • bison
  • g++
  • dot from Graphviz
  • gcc for the milestone 3 linking step

Notes

  • The lexer and parser are the core of the project, so most behavior is encoded in grammar actions rather than separate passes.
  • The project grows incrementally across the milestones, so the same language features are often introduced in a simpler form first and then expanded later.
  • The repository already contains generated files and sample outputs under the milestone folders, which can be useful when checking behavior or comparing output shapes.

About

CS335 Python to x86_64 Compiler

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors