Rekt
====
Rekt is an experimental compiled language and compiler written in C++.
The project currently has:
- a lexer
- a recursive-descent parser
- a separate semantic analysis stage
- a small AST optimizer
- an x86-64 text assembly backend
- a textual LLVM IR backend
The frontend accepts a language that is intentionally larger than the subset
the backends currently lower. This README is grammar-first: it documents the
syntax the parser accepts, then calls out which parts are currently compiled in
practice.
Status
------
Today the compiler can reliably handle a practical scalar-and-runtime subset:
- functions, globals, locals, returns
- `if`, `for`, `while`, `switch`
- `break`, `continue`
- scalar arithmetic and comparisons
- direct function calls
- strings as pointer-sized runtime values
- runtime-backed aggregates:
  - `[]` (list)
  - `@[]` (tuple)
  - `$[]` (set)
  - `{}` (dict)
- aggregate construction, indexing, mutation, and selected iteration forms
The parser still accepts older or more ambitious syntax that is not fully
lowered yet. In particular, some historical iterator and switch forms are
accepted by the frontend but remain outside the compiled subset.
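
To make the subset concrete, here is a small function that stays inside it (an illustrative sketch; the names are hypothetical):

```rek
int sum([] xs) {
    int total = 0;
    for {int x | xs} {
        total = total + x;
    }
    return total;
}
```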
Build
-----
Build the compiler from [`src/Makefile`](src/Makefile):
```bash
cd src
make all
```
This produces `src/rekt`.
Test
----
Run the full parser, semantic, optimizer, x86, and LLVM suites:
```bash
cd test
bash run_tests.sh
```
There are also narrower harnesses:
```bash
bash run_semantic_tests.sh
bash run_codegen_tests.sh
bash run_llvm_tests.sh
```
CLI
---
The compiler currently recognizes these flags:
- `-E` preprocess only
- `-S` emit x86-64 assembly
- `-L` emit LLVM IR
- `-c` object mode
- `-o <path>` choose output path
- `-l` listing mode
Examples:
```bash
./src/rekt test/factorial.rek
./src/rekt test/prototype_forward.rek test/extern_decl.rek
./src/rekt -S -o /tmp/out.s test/factorial.rek
./src/rekt -L -o /tmp/out.ll test/factorial.rek
```
The CLI also accepts multiple `.rek` source files and compiles them as one
translation unit in argument order.
It also supports recursive source-level imports:
```rek
import "lib/math.rek";
import "lib/math.rek" { add, sub };
```
Imported files are resolved relative to the importing file, loaded once, and
flattened into the current translation unit.
Imported files only expose top-level functions and global declarations that are
explicitly marked `export`. Files named directly on the CLI remain mutually
visible. Selective imports only expose the named exported symbols from the
imported file.
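
For example, a library file (the file and function names here are hypothetical) might export only part of its surface:

```rek
export int add(int a, int b) { return a + b; }

int helper(int x) { return x + 1; }
```

A consumer can then pull in just the exported name with `import "mathlib.rek" { add };`; `helper` stays invisible because it is not marked `export`.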
BNF
---
The grammar below is intentionally close to the current recursive-descent
parser in [`src/Parser.cpp`](src/Parser.cpp), not an aspirational future
spec.
```bnf
<program> ::= { <stmt> }

<stmt> ::=
      <func>
    | <func-proto>
    | <extern-func>
    | <import-stmt>
    | <export-stmt>
    | <if-stmt>
    | <for-stmt>
    | <while-stmt>
    | <switch-stmt>
    | <decl> ";"
    | <decl> "=" <exp> ";"
    | <iden-or-type> "=" <exp> ";"
    | "return" <exp> ";"
    | "break" ";"
    | "continue" ";"
    | <exp> ";"

<func>        ::= <type> <identifier> <params> <block>
<func-proto>  ::= <type> <identifier> <params> ";"
<extern-func> ::= "extern" <type> <identifier> <params> ";"
<import-stmt> ::= "import" <string-literal> [ "{" <identifier> { "," <identifier> } "}" ] ";"
<export-stmt> ::= "export" ( <func> | <func-proto> | <extern-func> )

<params> ::= "(" [ <decl> { "," <decl> } ] ")"
<decl>   ::= <type> <identifier>

<type> ::=
      <builtin-type>
    | "(" ")"
    | "[" "]"
    | "@" "[" "]"
    | "$" "[" "]"
    | "{" "}"

<builtin-type> ::= "int" | "bool" | "float" | "char" | "str"

<block> ::= "{" { <stmt> } "}"

<if-stmt>   ::= "if" "(" <exp> ")" ( <block> | <stmt> ) [ <else-stmt> ]
<else-stmt> ::= "else" ( <if-stmt> | <block> | <stmt> )

<for-stmt> ::= "for" <iterator> <block>

<iterator> ::=
      "(" <decl-or-iden> "|" <iterator-tail> ")"
    | "{" <decl-or-iden> "|" <iterator-tail> "}"

<decl-or-iden> ::= <decl> | <identifier>

<iterator-tail> ::=
      <exp> { "," <exp> }
    | <iterable>

<iterable> ::= <identifier>

<while-stmt> ::= "while" "(" <exp> ")" <block>

<switch-stmt> ::= "switch" "(" ( <exp> | <decl> ) ")" "{"
                      <case-clause> { <case-clause> }
                  "}"

<case-clause> ::= <case-head> ":" { <stmt> }
<case-head>   ::= <case-item> { "," <case-item> }
<case-item>   ::= <exp> | "all" | "none"

<dict-entry> ::= <exp> ":" <exp>

<exp> ::=
      "(" <exp> ")"
    | "[" [ <exp> { "," <exp> } ] "]"
    | "@" "[" [ <exp> { "," <exp> } ] "]"
    | "$" "[" [ <exp> { "," <exp> } ] "]"
    | "{" [ <dict-entry> { "," <dict-entry> } ] "}"
    | "{" <brace-head> "|" <exp> { "," <exp> } "}"
    | <type> "(" <exp> ")"
    | <call>
    | <identifier>
    | <int-literal>
    | <float-literal>
    | <string-literal>
    | <unary-op> <exp>

<brace-head> ::=
      <call>
    | <identifier>
    | <type>
    | <int-literal>
    | <float-literal>
    | <string-literal>

<call> ::=
      <identifier> "(" [ <exp> { "," <exp> } ] ")"
    | <identifier> <exp>

<postfix-exp> ::=
    <primary-exp>
    { "." <exp> | "[" <exp> "]" | <binop> <exp> | "=" <exp> }
```
Notes on the grammar:
- Operator precedence is not encoded directly in the grammar. The parser first
accepts recursive binary structure, then repairs precedence in
`fixUpOpPrec(...)`.
- Bare single-argument calls like `print x` are intentionally accepted.
- `string-literal` covers both double-quoted strings and single-quoted
character-like literals in the current lexer.
- Brace expressions support two distinct forms:
  - dict literals: `{ key: value, ... }`
  - iterator/comprehension-style forms: `{ head | expr, ... }`
- A few nonterminals (`<binop>`, `<unary-op>`, `<primary-exp>`, and
  `<iden-or-type>`) are referenced but left informal above; they correspond to
  token classes handled directly by the parser.
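
To make the two brace forms concrete, the following hypothetical snippets parse as a dict literal and an iterator form respectively:

```rek
{} scores = { "low": 1, "high": 2 };

for {char c | "abc"} {
    print(c);
}
```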
Compiled Subset
---------------
The parser accepts more than the code generators currently lower. The current
compiled subset is:
- scalar types lowered as `i64`-like values:
- `int`
- `bool`
- `char`
- `str` as pointer-sized handle
- runtime-backed aggregate handles lowered as `i64`:
  - `[]` (list)
  - `@[]` (tuple)
  - `$[]` (set)
  - `{}` (dict)
- direct calls only
- source-level function prototypes and external declarations
- source-level imports with duplicate-load protection
- imported files expose only `export`-marked top-level functions and globals
- imports may optionally select a subset of exported names
- control flow:
- `if`
- `for {char x|str_expr} { ... }`
- `for {int x|list_expr} { ... }`
- `while`
- scalar `switch` with literal case labels and `none`
- `break`
- `continue`
- aggregate operations:
- list / tuple / set / dict literals
- indexing
- list and dict indexed mutation
- selected runtime helpers such as length calls
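
A sketch of these aggregate operations built from the documented runtime builtins (variable names are illustrative, and `dict_get` is assumed here to yield a scalar-compatible value):

```rek
[] xs = [1, 2, 3];
xs[0] = 10;
int n = list_len(xs);

{} d = dict_new();
d = dict_put(d, "answer", 42);
int v = dict_get(d, "answer");
```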
Stage-1 Self-Hosting Roadmap
----------------------------
The shortest practical path to stage-1 self-hosting in this repo is:
1. Finish the declaration model.
Prototypes, externs, and imports already exist; the remaining work is cleaner
multi-file and module-like source organization.
2. Keep the runtime boundary explicit.
Prefer source-level declarations such as `extern` over backend-only ABI
assumptions.
3. Improve compiler-friendly runtime data.
Strengthen strings, globals/constants, and container-heavy helper code.
4. Add a minimal standard library in Rekt.
Start with strings, dynamic arrays, diagnostics, and text building.
5. Add host/runtime I/O needed by compiler code.
File reads, file writes, and basic build/runtime glue.
6. Port small support utilities first.
String builders, token formatting, and helper modules.
7. Port the lexer.
8. Port semantic helpers and compiler support code.
9. Port the parser.
10. Rebuild the compiler with itself and compare stages.
Current checkpoint:
- runtime-backed control flow and aggregates are compiled
- source-level `extern` declarations are supported
- source-level function prototypes are supported
- source-level imports are supported
- source-level export visibility is supported for imported functions and globals
- selective imported symbol visibility is supported
- the next planned milestone is namespaces or named-module access
Runtime Model
-------------
The project currently treats many runtime values as boxed or handle-like `i64`
values in codegen:
- integers and booleans are plain scalar values
- strings are pointer-sized runtime values
- aggregates are opaque handles
- aggregate elements are still effectively passed through the same scalar path
That is a deliberate simplification while the frontend and control-flow support
are being built out.
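
At the source level, the handle model means aggregate values cross function boundaries the same way scalars do; a hedged sketch:

```rek
[] push_twice([] xs, int v) {
    xs = list_push(xs, v);
    xs = list_push(xs, v);
    return xs;
}
```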
Builtins
--------
Semantic analysis currently registers a small builtin surface, including:
- `print(char) -> char`
- `strlen(str) -> int`
- `streq(str, str) -> bool`
- `read_file(str) -> str`
- `write_file(str, str) -> int`
- `argc() -> int`
- `argv(int) -> str`
- `str_concat(str, str) -> str`
- `str_from_char(char) -> str`
- `str_sub(str, int, int) -> str`
- `str_find(str, str) -> int`
- `str_from_int(int) -> str`
- `str_buf_new() -> str`
- `str_buf_push(str, str) -> str`
- `str_buf_push_char(str, char) -> str`
- `str_buf_take(str) -> str`
- `is_digit(char) -> bool`
- `is_alpha(char) -> bool`
- `is_space(char) -> bool`
- `list_new() -> []`
- `list_push([], value) -> []`
- `list_get([], int) -> value`
- `list_set([], int, value) -> []`
- `list_len([]) -> int`
- `tuple_len(@[]) -> int`
- `set_len($[]) -> int`
- `dict_new() -> {}`
- `dict_put({}, value, value) -> {}`
- `dict_get({}, value) -> value`
- `dict_len({}) -> int`
The backends also synthesize external runtime references for aggregate
construction, indexing, mutation, and iteration helpers when those features are
used.
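
The string-builder builtins compose in the usual accumulate-then-take pattern; a sketch based only on the signatures listed above:

```rek
str greet(str name) {
    str b = str_buf_new();
    b = str_buf_push(b, "hello, ");
    b = str_buf_push(b, name);
    return str_buf_take(b);
}
```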
Design Notes
------------
- The frontend is intentionally ahead of the backends.
- LLVM IR is the practical bootstrap path.
- The x86 backend is still useful as a direct, readable lowering target and as
a debugging aid.
- Current documentation should be read as “what the parser accepts” plus
“what the semantic/codegen pipeline currently implements,” not as a frozen
long-term language spec.