perf: speed up RTS testing on macOS — split `#rts` build/check + parallelise test variants by ggreif · Pull Request #6029 · caffeinelabs/motoko

ggreif · 2026-04-18T08:34:57Z

Summary

Two complementary attacks on RTS test wall-clock time: parallelisation makes the test suite finish faster when it does run, and the new build/check split lets `nix build .#rts` skip running it altogether on the Mac (where there is no cache hit).

Build-only `.#rts` and check-running `.#rts-checked` (latest)

`nix/rts.nix` is factored into two derivations sharing all build attrs:

- `rts-build`: `doCheck = false`, produces the cross-compiled wasm artifacts only.
- `rts-checked`: `doCheck = true`, additionally runs `make -j8 test`.

Wired into `flake.nix` so that:

- `.#rts-checked` is always the test-running variant.
- `.#rts` aliases `rts-checked` on Linux (Hydra-cached and fast, so no behaviour change) and `rts-build` on darwin (skipping the slow path).
- The `nightly-macos-test` workflow gains an explicit `rts-checked` job that builds `.#rts-checked` directly, so the darwin-side cargo test suite still runs daily on master.
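The routing above amounts to a small lookup table. As an illustration only, here is a Rust model of it; the `RtsDerivation` type, `flake_output`, and the rule that any system string ending in `-darwin` counts as darwin are assumptions for this sketch, not the actual `flake.nix` code.

```rust
/// Hypothetical model of the two derivations described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RtsDerivation {
    name: &'static str,
    do_check: bool,
}

const RTS_BUILD: RtsDerivation = RtsDerivation { name: "rts-build", do_check: false };
const RTS_CHECKED: RtsDerivation = RtsDerivation { name: "rts-checked", do_check: true };

/// Resolve a flake attribute (without the `.#` prefix) for a Nix system string
/// such as "x86_64-linux" or "aarch64-darwin".
/// Assumption: darwin is detected purely by the "-darwin" suffix.
fn flake_output(attr: &str, system: &str) -> Option<RtsDerivation> {
    let is_darwin = system.ends_with("-darwin");
    match attr {
        // Always the test-running variant, on every platform.
        "rts-checked" => Some(RTS_CHECKED),
        // `.#rts` skips the slow host-side cargo test on darwin only.
        "rts" if is_darwin => Some(RTS_BUILD),
        "rts" => Some(RTS_CHECKED),
        _ => None,
    }
}
```

Keeping `.#rts-checked` the same on every platform gives the nightly darwin job a stable attribute to build, while `.#rts` stays cheap where no cache hit is available.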

Why: on `macos-latest` (aarch64-darwin), the host-side `cargo test` is a from-source build with no Hydra cache hit (the combination is not in cachix), adding 10+ minutes to every artifact build. On Linux it is a fast cache hit. The split is platform-asymmetric on purpose: PR-blocking on Linux, scheduled-only on darwin.

Net effect: `nix build .#rts` terminates fast on the Mac everybody develops on; PR CI coverage on Linux is unchanged; darwin RTS test coverage moves from "every PR (slow)" to "nightly".

Phase A: Variant-level parallelism

- A separate `CARGO_TARGET_DIR` per variant (`target-<name>`) avoids cargo lock contention.
- A `define`/`eval` Makefile template generates the build and per-module run targets.
- `make -j8 test` runs in the nix `checkPhase`.
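A minimal Rust sketch of why per-variant target directories allow parallel builds: each variant gets its own `CARGO_TARGET_DIR`, so concurrent cargo processes do not wait on each other's build-directory lock. The variant names `ni`, `inc` and `64` are taken from the `target-ni`, `target-inc`, `target-64` directories mentioned in the PR's commits; the bare `cargo build` invocation (without the real wasm target and features) and the helper names are assumptions. In the PR itself this logic lives in the Makefile template, not in Rust.

```rust
use std::io;
use std::process::Command;

/// Variant names, as used in the per-variant target directories.
const VARIANTS: [&str; 3] = ["ni", "inc", "64"];

/// `target-<name>`: one cargo target directory per variant.
fn target_dir(variant: &str) -> String {
    format!("target-{}", variant)
}

/// Build (but do not run) a cargo invocation isolated to its own target dir.
/// Hypothetical: the real recipe also passes the wasm target and features.
fn cargo_build_cmd(variant: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build").env("CARGO_TARGET_DIR", target_dir(variant));
    cmd
}

/// Spawn all variants at once, then wait for each; true if all succeeded.
/// Because the target dirs differ, no process blocks on another's lock.
fn build_all_in_parallel() -> io::Result<bool> {
    let children = VARIANTS
        .iter()
        .map(|v| cargo_build_cmd(v).spawn())
        .collect::<io::Result<Vec<_>>>()?;
    let mut all_ok = true;
    for mut child in children {
        all_ok &= child.wait()?.success();
    }
    Ok(all_ok)
}
```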

Phase B: Per-module parallelism via `wasmtime --invoke`

- Each test module gets a `#[no_mangle] pub extern "C" fn test_<mod>()` entry point.
- The Makefile runs `wasmtime --invoke test_<mod>` per module; this works on `wasm64-unknown-unknown` without WASI.
- 3 variants × 24 modules = 72 parallel targets.
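The shape of a per-module entry point, sketched in Rust. `wasmtime --invoke` looks up an exported function by name, which is why the symbol must not be mangled. The module `gc_predefined` here is a stand-in with a dummy body and a call counter so the example runs on the host; the real module contents and the full list of 24 module names are not reproduced, and `invoke_targets` is a hypothetical helper. `#[unsafe(no_mangle)]` is the spelling newer toolchains accept in every edition (and require in edition 2024); the PR description shows the older `#[no_mangle]`.

```rust
use std::sync::atomic::AtomicUsize;

/// Counts how often the stand-in module test ran (demonstration only).
static GC_PREDEFINED_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for one RTS test module.
mod gc_predefined {
    use super::GC_PREDEFINED_RUNS;
    use std::sync::atomic::Ordering;

    pub fn test() {
        // The real module would build hand-crafted heaps and run the GC here.
        GC_PREDEFINED_RUNS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Exported, unmangled entry point, callable as `wasmtime --invoke test_gc_predefined`.
#[unsafe(no_mangle)]
pub extern "C" fn test_gc_predefined() {
    gc_predefined::test();
}

/// Enumerate (variant, entry point) pairs: one make target per pair.
fn invoke_targets(variants: &[&str], modules: &[&str]) -> Vec<(String, String)> {
    let mut targets = Vec::new();
    for v in variants {
        for m in modules {
            targets.push((v.to_string(), format!("test_{}", m)));
        }
    }
    targets
}
```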

Phase C: GC seed chunking

- Split the 100 GC random seeds into 10 chunks of 10 (`test_gc_chunk_0..9`).
- Separate `gc_predefined` (hand-crafted heaps) from `gc_components` (incremental/compacting internals).
- Split persistence into `persistence_small` (up to 10k objects) and `persistence_20k` (20k objects).
- Heavy tests are ordered first in `TEST_MODULES` so that `make -j` starts the long poles early.
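A small Rust sketch of the chunking arithmetic. It assumes the 100 random seeds are the consecutive integers `0..100` and that chunk `k` takes seeds `10k..10k+10`; the actual seed values and the helper name `seed_range` are illustrative assumptions.

```rust
use std::ops::Range;

const TOTAL_SEEDS: u32 = 100;
const CHUNKS: u32 = 10;
const SEEDS_PER_CHUNK: u32 = TOTAL_SEEDS / CHUNKS;

/// Seeds handled by `test_gc_chunk_<chunk>` under the assumption above.
fn seed_range(chunk: u32) -> Range<u32> {
    assert!(chunk < CHUNKS, "chunk index out of range");
    let start = chunk * SEEDS_PER_CHUNK;
    start..start + SEEDS_PER_CHUNK
}
```

Each chunk is its own make target, so the 100 seeds no longer form a single serial block; together the chunks still cover every seed exactly once.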

Dynamic test seeds

- The small stabilization tests use a seed derived from `git rev-parse HEAD` at build time.
- Each commit therefore exercises different random heap configurations.
- The seed falls back to the fixed value 4711 when not in a git repo (nix sandbox).
- The heavy `persistence_20k` test uses the fixed seed 4711 for predictable CI runtime.
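One way to derive such a seed, sketched in Rust. The PR states only that the seed comes from `git rev-parse HEAD` with a fallback to 4711; reading the first 16 hex digits of the hash as a `u64`, and the function names below, are assumptions. In a build script the value would typically be handed to the tests at compile time (for example via Cargo's `cargo:rustc-env` directive); that wiring is not shown.

```rust
use std::process::Command;

/// Fixed seed used when no git revision is available (e.g. in the nix sandbox).
const FALLBACK_SEED: u64 = 4711;

/// Ask git for the current commit; `None` if git is missing or this is not a repo.
fn current_git_rev() -> Option<String> {
    let out = Command::new("git").arg("rev-parse").arg("HEAD").output().ok()?;
    if !out.status.success() {
        return None;
    }
    String::from_utf8(out.stdout).ok()
}

/// Turn a commit hash into a seed: first 16 hex digits as a u64, else the fallback.
fn seed_from_git_rev(rev: Option<&str>) -> u64 {
    rev.map(str::trim)
        .and_then(|r| r.get(..16))
        .and_then(|prefix| u64::from_str_radix(prefix, 16).ok())
        .unwrap_or(FALLBACK_SEED)
}
```

For example, `seed_from_git_rev(current_git_rev().as_deref())` yields 4711 inside the sandbox and a commit-specific value in a checkout.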

Bug fix: heap size scaling

- `heap_size_for_gc` for the incremental GC ignored `total_heap_size_bytes`, always returning `3 * PARTITION_SIZE` (192 MB).
- For seeds generating large object graphs, the dynamic heap exceeded this fixed size.
- Fix: `max(3 * PARTITION_SIZE, 2 * total_heap_size_bytes)`.
- Discovered via seed 20_000, which generates a dense 20k-object graph.
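The fix reduces to a one-line change, shown here as a simplified, self-contained Rust function. The real `heap_size_for_gc` has more context (GC kind and other parameters) that is omitted, and `PARTITION_SIZE = 64 MiB` is inferred from the "192 MB = 3 × PARTITION_SIZE" figure above rather than taken from the source.

```rust
const MIB: usize = 1024 * 1024;
/// Assumed partition size of the incremental GC (192 MB / 3).
const PARTITION_SIZE: usize = 64 * MIB;

/// Before: always 3 partitions, regardless of how big the test heap is.
fn heap_size_for_gc_old(_total_heap_size_bytes: usize) -> usize {
    3 * PARTITION_SIZE
}

/// After: at least 3 partitions, but grow to twice the actual heap content.
fn heap_size_for_gc(total_heap_size_bytes: usize) -> usize {
    std::cmp::max(3 * PARTITION_SIZE, 2 * total_heap_size_bytes)
}
```

Any dynamic heap larger than 1.5 × `PARTITION_SIZE` now gets a proportionally larger GC heap instead of overrunning the fixed size, which is what surfaced as `slice_index_fail` in `create_dynamic_heap`.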

Other improvements

- A `test -f` guard before `wasmtime` fails fast if cargo didn't produce the binary.
- `WASMTIME_BACKTRACE_DETAILS=1` gives better crash diagnostics.
- Removed unnecessary `unsafe` blocks in test entry points.
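The PR implements the guard as `test -f` in the Makefile (the commit rationale being that wasmtime may return 0 on a missing file). The same check, rendered in Rust purely as an illustration: refuse to build a `wasmtime --invoke` command when the wasm file is missing, and set `WASMTIME_BACKTRACE_DETAILS=1` on it. The function name `wasmtime_invoke_cmd` is hypothetical, and the command is only constructed, never run.

```rust
use std::path::Path;
use std::process::Command;

/// Build `wasmtime --invoke test_<module> <wasm>` with detailed backtraces,
/// failing fast if the wasm file was never produced.
fn wasmtime_invoke_cmd(wasm: &Path, module: &str) -> Result<Command, String> {
    // Equivalent of the Makefile's `test -f` guard (regular file must exist).
    if !wasm.is_file() {
        return Err(format!("missing test binary: {}", wasm.display()));
    }
    let mut cmd = Command::new("wasmtime");
    cmd.arg("--invoke")
        .arg(format!("test_{}", module))
        .arg(wasm)
        .env("WASMTIME_BACKTRACE_DETAILS", "1");
    Ok(cmd)
}
```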

Observed speedup (parallelisation alone): from 2+ hours sequential to ~90 minutes parallel on macOS, limited by `persistence_20k` (Amdahl's law). The `.#rts` build-skip on top of that cuts the Mac path further whenever tests aren't needed (artifacts, dev shells, anything not gating on RTS unit tests).
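Why a single long test bounds wall-clock, and why heavy-first ordering helps, can be checked with a toy greedy list scheduler: each task, in order, goes to whichever of the `-j` slots frees up first. This is a simplification of how `make -j` dispatches ready targets, and the durations below are made-up units, not measurements.

```rust
/// Greedy list scheduling: each task, in the given order, goes to the
/// earliest-free slot. Returns the makespan (time when the last task ends).
fn makespan(durations: &[u64], jobs: usize) -> u64 {
    assert!(jobs > 0, "need at least one job slot");
    let mut free_at = vec![0u64; jobs];
    for &d in durations {
        // Index of the slot that becomes free first (first one on ties).
        let slot = free_at
            .iter()
            .enumerate()
            .min_by_key(|&(_, &t)| t)
            .map(|(i, _)| i)
            .expect("jobs > 0");
        free_at[slot] += d;
    }
    free_at.into_iter().max().unwrap_or(0)
}

/// No schedule can beat the longest single task or the average load per slot.
fn makespan_lower_bound(durations: &[u64], jobs: usize) -> u64 {
    let longest = durations.iter().copied().max().unwrap_or(0);
    let total: u64 = durations.iter().sum();
    let jobs = jobs as u64;
    longest.max((total + jobs - 1) / jobs) // ceil(total / jobs)
}

/// Heaviest-first ordering (longest processing time first).
fn heaviest_first(durations: &[u64]) -> Vec<u64> {
    let mut sorted = durations.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    sorted
}
```

With tasks `[1, 1, 1, 1, 10]` on 2 slots, the given order finishes at 12, while heaviest-first finishes at 10, which equals the lower bound: once the longest task starts at time 0, it alone determines wall-clock, which is the situation `persistence_20k` creates.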

Test plan

- CI green
- `nix build .#rts` on darwin skips cargo test (timing check)
- `nix build .#rts-checked` on darwin runs and passes cargo test
- `nix build .#rts` on Linux still runs cargo test (regression guard)
- The next `nightly-macos-test` run shows the new `rts-checked` job (confirmed: it does)

🤖 Generated with Claude Code

Use CARGO_TARGET_DIR per test variant (target-ni, target-inc, target-64) to avoid cargo lock contention, enabling `make -j3 test` to build and run all three RTS test variants in parallel.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Factor out the repeated test build/run pattern into a reusable test_variant macro. The cargo target dir is derived from the make target name (target-<name>).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-18T09:49:56Z

Comparing from 6a81fc5 to d8c9c87:
The produced WebAssembly code seems to be completely unchanged.
In terms of gas, no changes are observed in 5 tests.
In terms of size, no changes are observed in 5 tests.

- make -j3 → make -j: the number of test variants is the natural limit
- test -f on the wasm binary before wasmtime: fail fast if cargo didn't produce the binary (wasmtime may return 0 on missing file)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add RTS_TEST_FILTER via wasmtime --invoke for per-module entry points
- Split 100 GC random seeds into 10 chunks of 10 (test_gc_chunk_0..9)
- Separate gc_predefined (hand-crafted heaps + components) from random seeds
- 3 variants × 21 modules = 63 parallel wasmtime targets
- Trace markers (>>> <<<) for build diagnostics
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Separate gc_predefined (3 hand-crafted heaps) from gc_components (incremental/compacting/generational internal tests)
- Split persistence into persistence_small (up to 10k objects) and persistence_20k (the heavy 20k serialization test)
- Order TEST_MODULES heaviest-first so make -j starts long poles early
- Make incremental GC sub-modules public for per-component entry points
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Seed 20_000 caused a slice_index_fail in heap construction. Use the same seed as the other stabilization tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Unlimited -j with 72 parallel wasmtime targets can exhaust memory on CI runners. Cap at 8 concurrent processes as a safe default.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The CI failure was likely OOM from unbounded parallelism, not the seed. With -j8 cap, seed 20_000 should work. Remove >>> <<< debug traces.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

heap_size_for_gc ignored total_heap_size_bytes, always returning 3*PARTITION_SIZE (192 MB). For seeds that generate large object graphs (e.g. seed 20_000 with 20k objects), the dynamic heap exceeds this fixed size, causing slice_index_fail in create_dynamic_heap. Fix: use max(3*PARTITION_SIZE, 2*total_heap_size_bytes) so the heap grows to fit the actual content.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Vary the RNG seed across commits so different random heaps are tested over time. Seed is derived from git rev-parse HEAD at build time, with fallback to "4711" when not in a git repo (nix sandbox). Also enable WASMTIME_BACKTRACE_DETAILS=1 for better crash diagnostics.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gc::*, stable_option::test(), and stabilization sub-tests are safe functions; no unsafe block needed. Also run cargo fmt.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… only
The 20k test is too expensive to risk worst-case seeds. Keep it deterministic with a known-good seed. Small tests vary per commit to explore different heap shapes over time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

alexandru-uta · 2026-04-20T07:02:22Z

What are the added benefits? Is it really faster, how much?

ggreif · 2026-04-20T07:51:57Z

> What are the added benefits? Is it really faster, how much?

It is still dominated by the 20000-tree, but at least this one now runs in parallel with the others. I was fed up with the slowness on the Mac, so this might help. But I haven't done A/B testing yet.

The other thing is that this introduces different random seeds for the 10000-tree tests. The fixed seed is kept for the big one, for fewer surprises in run time.

ggreif · 2026-04-20T08:12:39Z

Keeping this as draft, as I am brainstorming how the bottleneck can be improved.

EDIT: The bottleneck is resolved by disabling the checkPhase on macOS and shifting it to a nightly CI task. A proper night-shift, so to say :-)

…test)

The host-side `cargo test` suite for the RTS dominates wall-clock on macos-latest CI: ~10+ min from-source build with no Hydra cache hit (aarch64-darwin combo not in cachix). On Linux it's a fast Hydra cache hit. So the cost is platform-asymmetric.

Solution: factor `nix/rts.nix` into two derivations sharing all build attrs:
- `rts-build`: `doCheck = false`, produces the cross-compiled wasm artifacts only.
- `rts-checked`: `doCheck = true`, additionally runs `make -j8 test`.

Wire into `flake.nix` such that:
- `.#rts-checked` is always the test-running variant.
- `.#rts` aliases `rts-checked` on Linux (keeps PR CI coverage as is: Hydra-cached, fast) and `rts-build` on darwin (skips the slow path).

The `nightly-macos-test` workflow gains an explicit `rts-checked` job that builds `.#rts-checked` directly, so the darwin-side cargo test suite still runs daily on master, just not on every PR.

Net effect:
- Mac PR/artifacts CI: skips 10+ min of cargo test, fast cache hits.
- Linux PR CI: unchanged.
- Darwin RTS test coverage: moves from "every PR (slow)" to "nightly".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ggreif requested a review from a team as a code owner April 18, 2026 08:34

ggreif self-assigned this Apr 18, 2026

ggreif added the testing Related to test suite label Apr 18, 2026

ggreif force-pushed the gabor/rts-parallel-tests branch from 6479860 to 4509ad2 April 18, 2026 09:04

refactor: use define/eval template for RTS test variants

61dee13


ggreif force-pushed the gabor/rts-parallel-tests branch from 4509ad2 to 61dee13 April 18, 2026 09:15

ggreif changed the title from "perf: parallelise RTS test variants" to "chore: parallelise RTS test variants" Apr 18, 2026

ggreif mentioned this pull request Apr 18, 2026

perf: RTS test64 GC tests allocate 192MB per seed (should be much smaller) #6030

Closed

ggreif and others added 10 commits April 18, 2026 12:41

fix: use known-good seed 4711 for persistence_20k

2a0e9b9


fix: cap test parallelism at -j8 to avoid OOM on CI

007971a


chore: revert seed to 20_000, remove trace markers

3d8bfe3


chore: remove unnecessary unsafe blocks, cargo fmt

73c9bb1


ggreif force-pushed the gabor/rts-parallel-tests branch from 63e23f9 to a6f03f7 April 18, 2026 16:20

ggreif marked this pull request as draft April 20, 2026 08:10

ggreif and others added 2 commits April 26, 2026 14:31

Merge branch 'master' into gabor/rts-parallel-tests

c498cdf

ggreif force-pushed the gabor/rts-parallel-tests branch from 4c85ca3 to 41f380f April 26, 2026 12:39

ggreif changed the title from "chore: parallelise RTS test variants" to "perf: speed up RTS testing on macOS — split #rts build/check + parallelise test variants" Apr 26, 2026

ggreif marked this pull request as ready for review April 26, 2026 12:51

ggreif added 2 commits April 27, 2026 11:58

Merge branch 'master' into gabor/rts-parallel-tests

d0dc98f

Merge branch 'master' into gabor/rts-parallel-tests

d8c9c87

ggreif requested a review from alexandru-uta April 27, 2026 14:27

ggreif wants to merge 16 commits into master from gabor/rts-parallel-tests

2 participants
