drcc

The name drcc means "doruche's C compiler". The compiler implements a small subset of C language. All the compiler, ranging from lexing and parsing in frontend, IR generation and optimization in middle-end, to RISC-V assembly emission in backend, is written in Rust from scratch, without any dependency on other libraries.

注意

出于一些原因这个项目近乎没办法继续开发了。一个主要问题是缺乏足够全面或者有效的测例，导致我很多时候几乎只能凭感觉编写代码，然后通过自制的一点相当贫瘠的测例来进行验证。总之目前这个编译器或许潜伏着不少神秘的bug也未可知。敬请各位多加小心啦~~~

Features

Tiny subset of C language
- A miniature of type system: only int and long types
- File-scope functions and variables: extern and static keywords are supported, allowing for modular programming and calling library functions
Middle-end optimizations:
- Constant folding
- Dead-code elimination
- Copy propagation
- Dead-store elimination
Backend code generation:
- Emits RV64IM assembly code
Clear Error Reporting:
- Errors are split into 3 categories: Lexical, Syntactic, and Semantic errors.
- Errors will be reported in bunches, not one at a time.
- Accurate location information, including line number and column number. e.g.
```
Parsing error: Ln 3, Col 4:11 Expected ';' after 'continue' statement.
```

Compiler Structure

drcc takes a quite clear structure, which can be loosely divided into following stages:

Lexing (lex): The input C source code is tokenized into a stream of tokens.
Parsing (ast): The token stream is parsed into an abstract syntax tree (AST). After parsing, we got an AstTopLevel structure, which contains all the top-level declarations and definitions in the source code.
Semantic Analysis (sem): Name resolution, label resolution, and type checking/annotating are done here. Any semantic errors will be reported as well. After this stage, we got an HirTopLevel structure, which restructures the AST into a high-level intermediate representation (HIR), containing all semantic information we need later, and stripping away all unnecessary details (e.g. In-block function declarations).
TAC Generation (tac): HIR, a tree-style IR, is translated into a classical three-address code (TAC) representation, i.e. TacTopLevel, which is considered as drcc's MIR (mid-level IR). From here on, we do not consider errors anymore, and the compiler is expected to be correct.
Some machine-independent optimizations can be applied to TAC code as well (tac/opt). These are all intra-procedural optimizations, which do not cross function boundaries.
LIR Generation (lir): We transform the TAC into a more machine-oriented low-level IR (LIR), i.e. LirTopLevel. This is the last IR before we emit the final assembly code. The LIR representations are quite closer to real RISC-V assembly code. For instance, the generalized instructions are transformed into concrete RISC-V instructions, like TacBinary -> {Add, Addw, Sub, ...}.
Basically, the LIR stage is composed of 4 parts:
- Incomplete LIR generation (lir/parse.rs)
- Register allocation (lir/regalloc) : We take a traditional graph coloring approach to allocate registers.
- Spilling (lir/spill.rs) : The unallocated virtual registers are spilled to stack slots here.
- Instruction Canonicalization (lir/canonic) : During previous stages, we didn't consider the restrictions of real RISC-V instructions, which is for the purity of each layers' logic. So we need to transform those invalid instrucions to valid ones here. e.g.
```
add x1, x2, 42 -> addi  x1, x2, 42

mv  -8(s0), 42 -> li    t5, 42      
                  sw    t5, -8(s0)
// we use t5/t6 as scratch registers for temporary values
```
The canoniclization are split into immediate-related and memory-replated fixings.
Assembly Emission (asm): LirTopLevel is translated into AsmTopLevel at this final stage, where all intermediate instructions (e.g. IntermediateInsn::Prologue) are transformed into a structured RISC-V assembly instructions. To emit these instructions is quite easy - just print them out.
Btw, here is an ideal place for peephole optimizations, which are not implemented yet.

Examples

See testprogs folder for some example C programs.

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
cc-cli		cc-cli
cc		cc
testprogs		testprogs
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

drcc

Features

Compiler Structure

Examples

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

drcc

Features

Compiler Structure

Examples

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages