Scheme → Rust AOT compiler built on a 1KB finite algebra. All type dispatch is a single table lookup — no tag-bit branches, no polymorphic inline caches — just a 32×32 Cayley table with 14 Lean-proved theorems. Compiles R7RS Scheme to standalone native binaries with competitive performance.
Named after Wispy the guinea pig.
Takes Scheme source and produces a standalone .rs file with inlined Cayley table, rib heap, and closure-converted code. No runtime dependencies — the output is a single Rust file you compile with rustc.
cargo run -- --compile bench/nqueens.scm > nqueens.rs
rustc -O -o nqueens nqueens.rs
./nqueens # 92
# With semi-space Cheney GC (bounded memory):
cargo run -- --gc cheney --compile bench/nqueens.scm > nqueens.rsType dispatch in the compiled output is branchless: every dispatch decision is a single index into the Cayley table. The algebra replaces the tag-bit branch chains found in traditional Scheme implementations.
| Repo | What | Install |
|---|---|---|
| wispy-table | 1KB Cayley table + Lean proofs + Z3 search | cargo add wispy-table |
| wispy-vm | Stak VM fork · REPL · self-hosted tools (specializer, Futamura tower, reflective tower, metacircular evaluator) | cargo install --path wispy |
| wispy-compile (this repo) | Scheme → Rust AOT compiler | cargo install --path . |
The self-hosted tool chain lives in wispy-vm under examples/wispy/:
| File | Role |
|---|---|
ir-lib.scm |
Shared 7-node algebraic IR (synced to examples/ir-lib.scm here) |
specialize.scm |
Partial evaluator for Cayley table IR — constant folding, dead branch elimination, beta reduction |
transpile.scm |
IR → Rust code generator (synced to examples/transpile.scm here) |
pe.scm |
General online partial evaluator for Scheme |
futamura.scm |
Futamura projections P1 and P2 on the algebra, with three-path verification |
futamura-real.scm |
P1 on a direct-style Scheme evaluator |
futamura-cps.scm |
P2 on the CPS metacircular evaluator |
reflective-tower.scm |
Reflective meta-interpreter tower |
metacircular.scm |
Metacircular Scheme interpreter |
examples/ir-lib.scm and examples/transpile.scm in this repo are synced copies of the wispy-vm originals, kept here so rsc.scm and the transpiler back-end can be developed and tested independently. The full Futamura pipeline (specialize.scm → transpile.scm → Rust) runs in wispy-vm. When rsc.scm matures to replace compile.rs, the self-hosted compiler will migrate there too, consolidating the complete self-hosting story in one place.
R4RS core: define, lambda (including rest args), if, cond, case, let/let*/letrec, named let, begin, and/or, quote, quasiquote, set!, do, define-syntax/syntax-rules (with ellipsis and dotted tail patterns).
R7RS extensions: case-lambda, define-record-type, values/call-with-values, guard/raise/error/with-exception-handler, define-library/import, call-with-current-continuation (escape-only).
Builtins: 70+ inlined operations — arithmetic, comparisons, list ops (map, for-each, apply, append, reverse), strings, characters, vectors, equal?, type predicates, I/O (display, write, read-char, read-line, ports/files), and the three algebra primitives (dot, tau, type-valid?).
First-class builtins: Operators like +, cons, car can be passed as values (e.g., (map + '(1 2) '(3 4))).
Vectors: Flat contiguous layout — header rib + N element ribs. vector-ref and vector-set! are O(1) direct indexing. Cheney GC bulk-copies vectors as a single unit.
Closures: Full closure conversion with lambda lifting. Self-tail-call → loop optimization.
The point of these benchmarks is not "we beat Chez" — Chez Scheme has 35 years of engineering. The point is that a table-driven algebra produces competitive native code without the usual type-dispatch machinery. No polymorphic inline caches, no type-feedback JIT, no tag-bit branch chains — just a 1KB lookup table.
| Benchmark | Wispy (no GC) | Wispy (Cheney) | Chez | Winner |
|---|---|---|---|---|
| fib | 1.96s | 1.97s | 3.28s | Wispy 1.7x |
| tak | 0.85s | 0.87s | 1.38s | Wispy 1.6x |
| sum | 0.29s | 0.31s | 2.36s | Wispy 8.1x |
| ack | 5.19s | 5.24s | 2.24s | Chez 2.3x |
| deriv | 2.01s | — | 0.91s | Chez 2.2x |
| diviter | 3.06s | 3.89s | 1.26s | Chez 2.4x |
| divrec | 6.12s | 7.69s | 1.40s | Chez 4.4x |
| nqueens | 7.63s | 11.64s | 3.79s | Chez 2.0x |
| destruc | 2.33s | 3.66s | 1.27s | Chez 1.8x |
| triangl | 1.17s | 3.10s | 1.85s | Wispy 1.6x |
| takl | 3.79s | 3.84s | 3.39s | Chez 1.1x |
| primes | 1.89s | 2.45s | 0.65s | Chez 2.9x |
Benchmarks from r7rs-benchmarks with standard parameters. All 12 benchmarks pass in no-GC mode; 11/12 pass in Cheney mode (deriv has a pre-existing GC bug). Wispy wins 4/12 (fixnum-heavy), Chez wins 8/12 (allocation/list-heavy).
Where Wispy is faster (fib, tak, sum, triangl): fixnum-heavy recursion where the Cayley table's branchless dispatch and fixnum propagation pay off — no type checks on the hot path, just raw i64 arithmetic and table lookups.
Where Chez is faster (deriv, nqueens, primes, etc.): allocation-heavy workloads where Chez's cross-function inlining and mature GC dominate. Wispy has expression-level type inference but not yet cross-function inlining — planned for the self-hosted compiler.
Cheney GC mode uses liveness-based root elision: functions whose bodies (transitively) never allocate emit zero GC overhead. On fib/tak/sum/ack/takl, Cheney matches no-GC exactly.
The compiler supports two heap modes selected via --gc:
cargo run -- --compile file.scm # grow-only heap (no GC, default)
cargo run -- --gc cheney --compile file.scm # semi-space Cheney copying GCNo GC (--gc none): grow-only Vec<Rib> heap. Zero overhead, zero pauses, but memory is never reclaimed. Best for short-running programs and benchmarks where allocation pressure is low.
Cheney GC (--gc cheney): semi-space copying collector with 512K rib capacity per space. Uses a shadow stack for precise root tracking — all live Val variables (function parameters, let bindings, loop variables) are stored in a GC-visible Vec<Val> and read via gc_load/gc_store. The collector:
- Copies live objects breadth-first (Cheney scan pointer)
- Uses forwarding pointers (tag 255) for shared structure
- Protects
alloc_ribarguments across GC via shadow stack push/pop - Cleans up per-function via RAII
GcGuard(Rust Drop trait) - Truncates shadow stack at loop tops to prevent unbounded growth
src/
├── lib.rs crate root (re-exports table, val, heap, symbol, reader, macros, compile)
├── bin/wispy.rs compiler CLI (--gc none|cheney flag)
├── table.rs re-export of wispy-table (32×32 Cayley table)
├── val.rs Val = tagged pointer (fixnum | rib index)
├── heap.rs rib heap: uniform (car, cdr, tag) for all types
├── symbol.rs symbol interning
├── reader.rs S-expression parser
├── macros.rs syntax-rules: pattern matching, ellipsis, dotted tails, template instantiation
└── compile.rs Scheme → Rust compiler (~5900 lines, includes both runtimes)
The 32×32 table (1KB) is a finite algebra shared across all wispy repos via wispy-table. 14 Lean-proved theorems (zero sorry):
- Absorbers: top (nil) and bot (#f) are left absorbers
- Retraction pair: Q and E are mutual inverses on the core (quote/eval)
- Classifier: tau partitions the core into two boolean classes
- Branch/Composition: rho dispatches on tau; cdr = rho . cons
- Y fixed point: Y(rho) = rho(Y(rho)), non-absorber
- Extensionality: all 32 rows are distinct
- Type dispatch: CAR x T_PAIR → valid, CAR x T_STR → error
12 core elements, 8 R4RS type tags, 3 special values, 5 R7RS type tags (record, values, error, bytevector, promise), 4 unused slots. The core is axiomatically equivalent to Kamea's Psi-16.
The table, value representation, reader, heap, and symbol interning all compile without std or alloc. The compiler (compile.rs) requires std.
cargo check --no-default-features --libcompile.rs (direct-style):
- Mutual tail recursion — only self-tail-calls are optimized to loops. Mutually recursive functions in tail position use regular calls and can overflow the stack.
- call/cc — escape-only (non-reentrant). Full continuations would require a CPS transform.
rsc.scm (CPS) resolves both limitations: all calls go through a trampoline (constant stack depth), and continuations are first-class closures.
examples/rsc.scm is a self-hosted Scheme→Rust compiler written in Scheme (~2200 lines). It uses a CPS + closure conversion pipeline based on Feeley's 90-minute Scheme-to-C compiler: all function calls become tail calls through a trampoline, closures are lambda-lifted with cons-list environments, and every continuation's free variables are the precise live set.
The compiler reaches a fixed point at generation 2: the self-compiled binary produces identical output when compiling itself again. Bootstraps through wispy-vm.
GC: rsc.scm emits a semi-space Cheney GC driven by a GC_BUDGET countdown — no shadow stack needed. Between trampoline steps only f + args are live; CPS makes the root set explicit. A generated gc_roots_visit visits all global vars and interned datums. A GC_DEPTH guard prevents GC from firing inside scheme_map/scheme_for_each where outer stack frames hold stale pointers.
Known-call optimization: When a tail call targets a compile-time-known global function, rsc.scm emits __lambda_N(Val::NIL, args...) directly, eliminating one alloc_rib (closure rib), one Action construction, and one dispatch_cps call per site. This gives 3.4× speedup on tak and 11% on fib.
Fixnum/bool propagation: Expression-level type inference within each function. Arithmetic let-bindings declare as i64 Rust locals; boolean let-bindings as bool; if-test conditions bypass bool_to_val/is_true. Self-tail-call loop parameters used only in arithmetic context are unboxed to i64 at loop entry and reassigned without Val wrapping. Measured: ~5% on fib, ~11% on diviter.
# Compile a program (wispy-vm as host)
echo '(define (fib n) (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2))))) (display (fib 30)) (newline)' \
| wispy examples/rsc.scm > fib.rs && rustc -O -o fib fib.rs && ./fib # 832040
# Self-compilation: rsc.scm compiles itself → gen1
wispy examples/rsc.scm < examples/rsc.scm > /tmp/rsc.rs
rustc -O -o /tmp/rsc /tmp/rsc.rs
# gen1 compiles rsc.scm → gen2 (fixed point: gen2 == gen3)
/tmp/rsc < examples/rsc.scm > /tmp/rsc2.rs
# diff /tmp/rsc.rs /tmp/rsc2.rs shows only gen1≠gen2 (host expansion diffs)
# but gen2 compiling rsc.scm produces identical output to gen2 itselfWhat it compiles: define (top-level and internal), lambda, if, cond, let/let*/letrec, named let, begin, and/or, set!, quote, quasiquote, closures, higher-order functions (map, for-each, apply), first-class primitives, strings, symbols, characters, equal?, read/eof-object?, display/newline/write-char, error.
Pipeline: S-expression → AST (tagged lists) → CPS conversion → lambda lifting / closure conversion → Rust emission (trampoline + dispatch_cps + __lambda_N functions).
Passes 9 of 12 r7rs-benchmarks (fib, sum, ack, tak, takl, nqueens, diviter, deriv, primes). Remaining work: contification (Kennedy's optimization — converts mutual tail recursion to direct jumps, eliminates make_cont per call), cross-function fixnum propagation, call/cc, multiple return values, macros, and wiring the specializer output into the P3 transpiler.
Two companion files support Futamura-projection-based specialization over the Cayley table algebra. They operate on a different level than rsc.scm: where rsc.scm compiles general Scheme, these tools work exclusively with algebraic expressions — programs that compute by composing Cayley table lookups.
Defines a 7-node tagged-pair representation for algebraic expressions:
| Node | Encoding | Meaning |
|---|---|---|
Atom(n) |
(0 . n) |
constant algebra element (0–31) |
Var(x) |
(1 . x) |
variable |
Dot(a, b) |
(2 . (a . b)) |
Cayley table lookup a·b |
If(t, c, a) |
(3 . ...) |
conditional on t != BOT |
Let(x, v, b) |
(4 . ...) |
local binding |
Lam(x, b) |
(5 . ...) |
abstraction |
App(f, a) |
(6 . ...) |
application |
Provides constructors (mk-atom, mk-dot, …), predicates, accessors, capture-avoiding substitution (subst-expr), and a pretty-printer (ir-display). Load this first — it is a dependency of transpile.scm and the specializer.
Walks an IR expression tree and emits valid Rust source targeting the 32×32 Cayley table. This is the back-end for Futamura projections P2 and P3: the specializer reduces a program to a residual IR tree (no Lam or App nodes), and transpile.scm renders it as flat u8 arithmetic over CAYLEY[][].
Emission rules:
Atom(Q)→ named constant (Q,CAR, …) orn_u8Dot(a, b)→CAYLEY[a as usize][b as usize]If(t, c, a)→if t != BOT { c } else { a }Let(x, v, b)→{ let v_x: u8 = v; b }
transpile-main emits a complete standalone Rust program: element constants, the full 32×32 Cayley table literal, and a fn main() that evaluates the residual expression.
# Load both files and run the built-in tests
cargo run -- -e '(load "examples/ir-lib.scm") (load "examples/transpile.scm")'
# Or via wispy-vm
wispy -e '(load "examples/ir-lib.scm") (load "examples/transpile.scm")'rsc.scm |
transpile.scm |
|
|---|---|---|
| Input | Scheme source (s-expressions) | Algebraic IR (from ir-lib.scm) |
| Domain | General Scheme | Cayley table expressions only |
| Pipeline | CPS + closure conversion + trampoline | Direct tree walk |
| Output | Rust with a Scheme runtime | Rust doing pure u8 table lookups |
| Role | The compiler proper | Back-end for the specializer's residual output |
Lam and App nodes are expected to be fully eliminated by the specializer before reaching transpile.scm. Bare lambdas in the input emit 0_u8 as a placeholder.
- Futamura P3.
specialize.scm(wispy-vm) already produces Lam/App-free residual IR from a known program;transpile.scmalready renders that IR as Rust. The remaining step is wiring the two: runtranspile-mainon the output ofspecialize, yielding a residual Rust program with zero interpreter overhead. - Type inference / specialization. Propagating type information through the call graph would eliminate runtime type checks on provably-fixnum paths.
- Cross-function inlining. Inlining small closures at call sites would eliminate the
dispatch_cpsindirect dispatch. - Full continuations. The CPS architecture already represents continuations as closures —
call/ccjust captures the current one. - Bare-metal RISC-V.
--target no-stdwith fixed-size heap arrays, no alloc crate, UART output.
WispyScheme descends from the Kamea project's algebraic framework. The independence theorems (93 Lean theorems, zero sorry) are in finite-magma-independence. The VM is a fork of Stak, itself derived from Ribbit, with the Cayley table integrated into the VM.
MIT