Rekt
====
Rekt is an experimental compiled language and compiler written in C++.
The project currently has:
- a lexer
- a recursive-descent parser
- a separate semantic analysis stage
- a small AST optimizer
- an x86-64 text assembly backend
- a textual LLVM IR backend
The frontend accepts a language that is intentionally larger than the subset
the backends currently lower. This README is grammar-first: it documents the
syntax the parser accepts, then calls out which parts are currently compiled in
practice.
Status
------
Today the compiler can reliably handle a practical scalar-and-runtime subset:
- functions, globals, locals, returns
- `if`, `for`, `while`, `switch`
- `break`, `continue`
- scalar arithmetic and comparisons
- direct function calls
- strings as pointer-sized runtime values
- runtime-backed aggregates:
  - `[]` (list)
  - `@[]` (tuple)
  - `$[]` (set)
  - `{}` (dict)
- aggregate construction, indexing, mutation, and selected iteration forms
The parser still accepts older or more ambitious syntax that is not fully
lowered yet. In particular, some historical iterator and switch forms are
accepted by the frontend but remain outside the compiled subset.
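
To make the subset concrete, here is a small function that stays inside it (an illustrative sketch; the names are hypothetical):

```rek
int sum([] xs) {
    int total = 0;
    for {int x | xs} {
        total = total + x;
    }
    return total;
}
```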
Build
-----
Build the compiler from [`src/Makefile`](src/Makefile):
```bash
cd src
make all
```
This produces `src/rekt`.
Test
----
Run the full parser, semantic, optimizer, x86, and LLVM suites:
```bash
cd test
bash run_tests.sh
```
There are also narrower harnesses:
```bash
bash run_semantic_tests.sh
bash run_codegen_tests.sh
bash run_llvm_tests.sh
```
CLI
---
The compiler currently recognizes these flags:
- `-E` preprocess only
- `-S` emit x86-64 assembly
- `-L` emit LLVM IR
- `-c` object mode
- `-o <path>` choose output path
- `-l` listing mode
Examples:
```bash
./src/rekt test/factorial.rek
./src/rekt test/prototype_forward.rek test/extern_decl.rek
./src/rekt -S -o /tmp/out.s test/factorial.rek
./src/rekt -L -o /tmp/out.ll test/factorial.rek
```
The CLI also accepts multiple `.rek` source files and compiles them as one
translation unit in argument order.
It also supports recursive source-level imports:
```rek
import "lib/math.rek";
import "lib/math.rek" { add, sub };
```
Imported files are resolved relative to the importing file, loaded once, and
flattened into the current translation unit.
Imported files only expose top-level functions and global declarations that are
explicitly marked `export`. Files named directly on the CLI remain mutually
visible. Selective imports only expose the named exported symbols from the
imported file.
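
For example, a library file (the file and function names here are hypothetical) might export only part of its surface:

```rek
export int add(int a, int b) { return a + b; }

int helper(int x) { return x + 1; }
```

A consumer can then pull in just the exported name with `import "mathlib.rek" { add };`; `helper` stays invisible because it is not marked `export`.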
BNF
---
The grammar below is intentionally close to the current recursive-descent
parser in [`src/Parser.cpp`](src/Parser.cpp), not an aspirational future
spec.
```bnf
<program> ::= { <stmt> }

<stmt> ::=
      <func>
    | <func-proto>
    | <extern-func>
    | <import-stmt>
    | <export-stmt>
    | <if-stmt>
    | <for-stmt>
    | <while-stmt>
    | <switch-stmt>
    | <decl> ";"
    | <decl> "=" <exp> ";"
    | <iden-or-type> "=" <exp> ";"
    | "return" <exp> ";"
    | "break" ";"
    | "continue" ";"
    | <exp> ";"

<func>        ::= <type> <identifier> <params> <block>
<func-proto>  ::= <type> <identifier> <params> ";"
<extern-func> ::= "extern" <type> <identifier> <params> ";"
<import-stmt> ::= "import" <string-literal> [ "{" <identifier> { "," <identifier> } "}" ] ";"
<export-stmt> ::= "export" ( <func> | <func-proto> | <extern-func> )

<params> ::= "(" [ <decl> { "," <decl> } ] ")"
<decl>   ::= <type> <identifier>

<type> ::=
      <builtin-type>
    | "(" ")"
    | "[" "]"
    | "@" "[" "]"
    | "$" "[" "]"
    | "{" "}"

<builtin-type> ::= "int" | "bool" | "float" | "char" | "str"

<block> ::= "{" { <stmt> } "}"

<if-stmt>   ::= "if" "(" <exp> ")" ( <block> | <stmt> ) [ <else-stmt> ]
<else-stmt> ::= "else" ( <if-stmt> | <block> | <stmt> )

<for-stmt> ::= "for" <iterator> <block>

<iterator> ::=
      "(" <decl-or-iden> "|" <iterator-tail> ")"
    | "{" <decl-or-iden> "|" <iterator-tail> "}"

<decl-or-iden> ::= <decl> | <identifier>

<iterator-tail> ::=
      <exp> { "," <exp> }
    | <iterable>

<iterable> ::= <identifier>

<while-stmt> ::= "while" "(" <exp> ")" <block>

<switch-stmt> ::= "switch" "(" ( <exp> | <decl> ) ")" "{"
                      <case-clause> { <case-clause> }
                  "}"

<case-clause> ::= <case-head> ":" { <stmt> }
<case-head>   ::= <case-item> { "," <case-item> }
<case-item>   ::= <exp> | "all" | "none"

<dict-entry> ::= <exp> ":" <exp>

<exp> ::=
      "(" <exp> ")"
    | "[" [ <exp> { "," <exp> } ] "]"
    | "@" "[" [ <exp> { "," <exp> } ] "]"
    | "$" "[" [ <exp> { "," <exp> } ] "]"
    | "{" [ <dict-entry> { "," <dict-entry> } ] "}"
    | "{" <brace-head> "|" <exp> { "," <exp> } "}"
    | <type> "(" <exp> ")"
    | <call>
    | <identifier>
    | <int-literal>
    | <float-literal>
    | <string-literal>
    | <unary-op> <exp>

<brace-head> ::=
      <call>
    | <identifier>
    | <type>
    | <int-literal>
    | <float-literal>
    | <string-literal>

<call> ::=
      <identifier> "(" [ <exp> { "," <exp> } ] ")"
    | <identifier> <exp>

<postfix-exp> ::=
    <primary-exp>
    { "." <exp> | "[" <exp> "]" | <binop> <exp> | "=" <exp> }
```
Notes on the grammar:
- Operator precedence is not encoded directly in the grammar. The parser first
accepts recursive binary structure, then repairs precedence in
`fixUpOpPrec(...)`.
- Bare single-argument calls like `print x` are intentionally accepted.
- `string-literal` covers both double-quoted strings and single-quoted
character-like literals in the current lexer.
- Brace expressions support two distinct forms:
  - dict literals: `{ key: value, ... }`
  - iterator/comprehension-style forms: `{ head | expr, ... }`
- A few nonterminals (`<binop>`, `<unary-op>`, `<primary-exp>`, and
  `<iden-or-type>`) are referenced but left informal above; they correspond to
  token classes handled directly by the parser.
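
To make the two brace forms concrete, the following hypothetical snippets parse as a dict literal and an iterator form respectively:

```rek
{} scores = { "low": 1, "high": 2 };

for {char c | "abc"} {
    print(c);
}
```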
Compiled Subset
---------------
The parser accepts more than the code generators currently lower. The current
compiled subset is:
- scalar types lowered as `i64`-like values:
- `int`
- `bool`
- `char`
- `str` as pointer-sized handle
- runtime-backed aggregate handles lowered as `i64`:
  - `[]` (list)
  - `@[]` (tuple)
  - `$[]` (set)
  - `{}` (dict)
- direct calls only
- source-level function prototypes and external declarations
- source-level imports with duplicate-load protection
- imported files expose only `export`-marked top-level functions and globals
- imports may optionally select a subset of exported names
- control flow:
- `if`
- `for {char x|str_expr} { ... }`
- `for {int x|list_expr} { ... }`
- `while`
- scalar `switch` with literal case labels and `none`
- `break`
- `continue`
- aggregate operations:
- list / tuple / set / dict literals
- indexing
- list and dict indexed mutation
- selected runtime helpers such as length calls
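
A sketch of these aggregate operations built from the documented runtime builtins (variable names are illustrative, and `dict_get` is assumed here to yield a scalar-compatible value):

```rek
[] xs = [1, 2, 3];
xs[0] = 10;
int n = list_len(xs);

{} d = dict_new();
d = dict_put(d, "answer", 42);
int v = dict_get(d, "answer");
```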
Stage-1 Self-Hosting Roadmap
----------------------------
The shortest practical path to stage-1 self-hosting in this repo is:
1. Finish the declaration model.
Prototypes, externs, and imports already exist; the remaining work is cleaner
multi-file and module-like source organization.
2. Keep the runtime boundary explicit.
Prefer source-level declarations such as `extern` over backend-only ABI
assumptions.
3. Improve compiler-friendly runtime data.
Strengthen strings, globals/constants, and container-heavy helper code.
4. Add a minimal standard library in Rekt.
Start with strings, dynamic arrays, diagnostics, and text building.
5. Add host/runtime I/O needed by compiler code.
File reads, file writes, and basic build/runtime glue.
6. Port small support utilities first.
String builders, token formatting, and helper modules.
7. Port the lexer.
8. Port semantic helpers and compiler support code.
9. Port the parser.
10. Rebuild the compiler with itself and compare stages.
Current checkpoint:
- runtime-backed control flow and aggregates are compiled
- source-level `extern` declarations are supported
- source-level function prototypes are supported
- source-level imports are supported
- source-level export visibility is supported for imported functions and globals
- selective imported symbol visibility is supported
- the next planned milestone is namespaces or named-module access
Runtime Model
-------------
The project currently treats many runtime values as boxed or handle-like `i64`
values in codegen:
- integers and booleans are plain scalar values
- strings are pointer-sized runtime values
- aggregates are opaque handles
- aggregate elements are still effectively passed through the same scalar path
That is a deliberate simplification while the frontend and control-flow support
are being built out.
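
At the source level, the handle model means aggregate values cross function boundaries the same way scalars do; a hedged sketch:

```rek
[] push_twice([] xs, int v) {
    xs = list_push(xs, v);
    xs = list_push(xs, v);
    return xs;
}
```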
Builtins
--------
Semantic analysis currently registers a small builtin surface, including:
- `print(char) -> char`
- `strlen(str) -> int`
- `streq(str, str) -> bool`
- `read_file(str) -> str`
- `write_file(str, str) -> int`
- `argc() -> int`
- `argv(int) -> str`
- `str_concat(str, str) -> str`
- `str_from_char(char) -> str`
- `str_sub(str, int, int) -> str`
- `str_find(str, str) -> int`
- `str_from_int(int) -> str`
- `str_buf_new() -> str`
- `str_buf_push(str, str) -> str`
- `str_buf_push_char(str, char) -> str`
- `str_buf_take(str) -> str`
- `is_digit(char) -> bool`
- `is_alpha(char) -> bool`
- `is_space(char) -> bool`
- `list_new() -> []`
- `list_push([], value) -> []`
- `list_get([], int) -> value`
- `list_set([], int, value) -> []`
- `list_len([]) -> int`
- `tuple_len(@[]) -> int`
- `set_len($[]) -> int`
- `dict_new() -> {}`
- `dict_put({}, value, value) -> {}`
- `dict_get({}, value) -> value`
- `dict_len({}) -> int`
The backends also synthesize external runtime references for aggregate
construction, indexing, mutation, and iteration helpers when those features are
used.
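
The string-builder builtins compose in the usual accumulate-then-take pattern; a sketch based only on the signatures listed above:

```rek
str greet(str name) {
    str b = str_buf_new();
    b = str_buf_push(b, "hello, ");
    b = str_buf_push(b, name);
    return str_buf_take(b);
}
```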
Design Notes
------------
- The frontend is intentionally ahead of the backends.
- LLVM IR is the practical bootstrap path.
- The x86 backend is still useful as a direct, readable lowering target and as
a debugging aid.
- Current documentation should be read as “what the parser accepts” plus
“what the semantic/codegen pipeline currently implements,” not as a frozen
long-term language spec.