From c8a4ed385f5ebdb89dfc08d755719c5acf64931f Mon Sep 17 00:00:00 2001
From: "korum-app[bot]" <264771079+korum-app[bot]@users.noreply.github.com>
Date: Sun, 8 Mar 2026 19:31:31 +0000
Subject: [PATCH] Korum: Create a detailed architecture diagram of the code gen

---
 ARCHITECTURE.md | 75 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 ARCHITECTURE.md

diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
new file mode 100644
index 0000000..948f442
--- /dev/null
+++ b/ARCHITECTURE.md
@@ -0,0 +1,75 @@
+# Tiny Compiler Architecture - Code Generation
+
+This document describes the architecture of the code generation pipeline in the Tiny compiler.
+
+## Overview
+
+The Tiny compiler follows a traditional multi-stage pipeline to transform source code into an executable binary. The code generation phase specifically handles the transformation from Intermediate Representation (IR) to ARM64 assembly and finally to a machine-code binary.
+
+## Code Generation Pipeline
+
+```mermaid
+graph TD
+    subgraph Frontend
+        Source[Tiny Source Code] --> Reader[FileReader]
+        Reader --> Tok[Tokenizer]
+        Tok --> Par[Parser]
+    end
+
+    subgraph "Middle-end (IR & Optimization)"
+        Par --> IRB[IrBuilder]
+        IRB --> IR[Intermediate Representation]
+        IR --> Alloc[Allocator]
+        Alloc --> RegAlloc[Register Allocation]
+    end
+
+    subgraph "Backend (Code Generation)"
+        IR --> GenCtx[GenContext]
+        RegAlloc --> GenCtx
+
+        subgraph "GenContext Internal Process"
+            GenCtx --> StackMap[Build Stack Map]
+            StackMap --> PhiProc[Process Phis]
+            PhiProc --> InstrGen[Insert Instructions]
+            InstrGen --> AsmMerge[Merge with Platform Base ASM]
+        end
+
+        AsmMerge --> ASM[ARM64 Assembly]
+    end
+
+    subgraph "Binary Compilation"
+        ASM --> Assemble[Assembler - as]
+        Assemble --> Obj[Object File]
+        Obj --> Link[Linker - ld/gcc]
+        Link --> Bin[Executable Binary]
+    end
+
+    %% Data Structures
+    ProgramContext[(ProgramContext)] -.-> IR
+    AllocationGroup[(AllocationGroup)] -.-> RegAlloc
+```
+
+## Key Components
+
+### 1. GenContext (`tiny/src/codegen/arm64/gen.rs`)
+The `GenContext` is the central component of the code generator. It maintains the state necessary to translate IR blocks into ARM64 instructions.
+
+- **Stack Management**: `build_stack_map` determines the stack layout for values that were spilled during register allocation.
+- **Phi Resolution**: `process_phis` handles the transition of values between basic blocks by identifying "join-want-lists" for Phi nodes.
+- **Instruction Generation**: `insert_instrs` recursively traverses the IR's basic blocks and emits corresponding ARM64 assembly.
+- **Platform Support**: Supports both `Apple` and `Linux` ARM64 targets by using different base assembly templates (`apple_base.s` and `linux_base.s`).
+
+### 2. Register Allocation (`tiny/src/register_allocation/`)
+Before code generation, the `Allocator` assigns virtual IR values to physical registers (X0-X30) or spills them to the stack. The resulting `AllocationGroup` is used by `GenContext` to emit the correct register operands.
+
+### 3. Binary Compilation (`tiny/src/codegen/bin_compile.rs`)
+Once the assembly string is generated, this module handles the final steps:
+- **Assemble**: Invokes the system assembler (`as`) to create object files.
+- **Link**: Invokes the system linker (`ld` on macOS, `gcc` on Linux) to produce the final executable, handling entry point (`-e bb1`) and system library linking.
+
+## Data Flow
+
+1. **IR Input**: The `ProgramContext` contains a collection of functions and their basic blocks.
+2. **Register Mapping**: For each instruction, `GenContext` looks up the assigned physical register from the `AllocationGroup`.
+3. **Spill Handling**: If a value is marked as spilled, `GenContext` generates `STR` (store) and `LDR` (load) instructions to move data between registers and the stack.
+4. **Control Flow**: IR branch and fall-through relationships are converted into assembly labels and branch instructions (`B`, `B.EQ`, etc.).
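
The register-mapping and spill-handling steps in the Data Flow list can be pictured with a minimal Rust sketch. It is not taken from `gen.rs`: `Location`, `load_operand`, `emit_add`, the choice of x9/x10 as scratch registers, and the 8-byte slot size are all assumptions made for illustration, and the real `GenContext` and `AllocationGroup` may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the allocator's decision about one IR value.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Location {
    /// Value lives in physical register Xn.
    Reg(u8),
    /// Value was spilled; the index is its slot in the stack map.
    Spill(u32),
}

// Assumption: x9 and x10 are reserved as scratch registers for spilled operands.
const SCRATCH_A: u8 = 9;
const SCRATCH_B: u8 = 10;
// Assumption: every spill slot is 8 bytes wide and addressed relative to sp.
const SLOT_SIZE: u32 = 8;

/// Returns the register that holds the operand, emitting an LDR first
/// when the operand lives on the stack.
fn load_operand(loc: Location, scratch: u8, out: &mut Vec<String>) -> u8 {
    match loc {
        Location::Reg(r) => r,
        Location::Spill(slot) => {
            out.push(format!("ldr x{}, [sp, #{}]", scratch, slot * SLOT_SIZE));
            scratch
        }
    }
}

/// Emits assembly for `dst = lhs + rhs`, where the arguments are IR value ids
/// and `alloc` maps each id to its assigned location.
/// Panics if an id is missing from `alloc`.
fn emit_add(dst: u32, lhs: u32, rhs: u32, alloc: &HashMap<u32, Location>) -> Vec<String> {
    let mut out = Vec::new();
    let l = load_operand(alloc[&lhs], SCRATCH_A, &mut out);
    let r = load_operand(alloc[&rhs], SCRATCH_B, &mut out);
    match alloc[&dst] {
        // Destination has a register: compute straight into it.
        Location::Reg(d) => out.push(format!("add x{}, x{}, x{}", d, l, r)),
        // Destination is spilled: compute into scratch, then STR to its slot.
        Location::Spill(slot) => {
            out.push(format!("add x{}, x{}, x{}", SCRATCH_A, l, r));
            out.push(format!("str x{}, [sp, #{}]", SCRATCH_A, slot * SLOT_SIZE));
        }
    }
    out
}
```

Under these assumptions, an add whose three values all have registers becomes a single `add`, while a spilled operand costs one extra `ldr` and a spilled result costs one extra `str`. Reserving fixed scratch registers keeps the sketch simple; an allocator that guarantees a free register at each spill point would not need them.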