perf: speed up RTS testing on macOS — split #rts build/check + parallelise test variants#6029
Open
Use CARGO_TARGET_DIR per test variant (target-ni, target-inc, target-64) to avoid cargo lock contention, enabling `make -j3 test` to build and run all three RTS test variants in parallel. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Factor out the repeated test build/run pattern into a reusable test_variant macro. The cargo target dir is derived from the make target name (target-<name>). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
- make -j3 → make -j: the number of test variants is the natural limit
- test -f on the wasm binary before wasmtime: fail fast if cargo didn't produce the binary (wasmtime may return 0 on missing file)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add RTS_TEST_FILTER via wasmtime --invoke for per-module entry points
- Split 100 GC random seeds into 10 chunks of 10 (test_gc_chunk_0..9)
- Separate gc_predefined (hand-crafted heaps + components) from random seeds
- 3 variants × 21 modules = 63 parallel wasmtime targets
- Trace markers (>>> <<<) for build diagnostics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
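The seed chunking described in this commit can be sketched roughly as follows. This is a minimal illustration, not the RTS test code: only the entry-point name `test_gc_chunk_0` comes from the commit message, while `chunk_seeds`, `run_gc_chunk`, `run_gc_test_with_seed`, and the assumption that the 100 seeds are numbered 0..100 are hypothetical.

```rust
use std::ops::Range;

// 100 random seeds split into 10 chunks of 10, as in the commit message.
// Assumption: seeds are simply numbered 0..100.
const TOTAL_SEEDS: u64 = 100;
const CHUNKS: u64 = 10;
const SEEDS_PER_CHUNK: u64 = TOTAL_SEEDS / CHUNKS;

/// Seeds covered by one chunk, e.g. chunk 3 covers seeds 30..40.
pub fn chunk_seeds(chunk: u64) -> Range<u64> {
    assert!(chunk < CHUNKS, "chunk index out of range");
    let start = chunk * SEEDS_PER_CHUNK;
    start..start + SEEDS_PER_CHUNK
}

/// Hypothetical stand-in for the real randomized GC test on one seed.
fn run_gc_test_with_seed(_seed: u64) {}

/// Runs every seed of one chunk and returns how many seeds were executed.
pub fn run_gc_chunk(chunk: u64) -> usize {
    let mut executed = 0;
    for seed in chunk_seeds(chunk) {
        run_gc_test_with_seed(seed);
        executed += 1;
    }
    executed
}

/// Per-chunk entry point. In the RTS test crate this would additionally
/// carry `#[no_mangle]` so that `wasmtime --invoke test_gc_chunk_0` can
/// find it by name; the attribute is left out here to keep the sketch
/// independent of the Rust edition.
pub extern "C" fn test_gc_chunk_0() {
    run_gc_chunk(0);
}
```

Because each chunk is an independent entry point, `make` can schedule the ten chunks as separate wasmtime processes instead of one long sequential run.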
- Separate gc_predefined (3 hand-crafted heaps) from gc_components (incremental/compacting/generational internal tests)
- Split persistence into persistence_small (up to 10k objects) and persistence_20k (the heavy 20k serialization test)
- Order TEST_MODULES heaviest-first so make -j starts long poles early
- Make incremental GC sub-modules public for per-component entry points

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Seed 20_000 caused a slice_index_fail in heap construction. Use the same seed as the other stabilization tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unlimited -j with 72 parallel wasmtime targets can exhaust memory on CI runners. Cap at 8 concurrent processes as a safe default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CI failure was likely OOM from unbounded parallelism, not the seed. With -j8 cap, seed 20_000 should work. Remove >>> <<< debug traces. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
heap_size_for_gc ignored total_heap_size_bytes, always returning 3*PARTITION_SIZE (192 MB). For seeds that generate large object graphs (e.g. seed 20_000 with 20k objects), the dynamic heap exceeds this fixed size, causing slice_index_fail in create_dynamic_heap. Fix: use max(3*PARTITION_SIZE, 2*total_heap_size_bytes) so the heap grows to fit the actual content. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
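A minimal sketch of the sizing change described in this commit. The 64 MiB partition size is an assumption derived from the stated "3*PARTITION_SIZE (192 MB)", and `heap_size_before_fix` is a hypothetical name used only for comparison; the real function lives in the RTS test harness and may differ in signature.

```rust
// Assumption: 64 MiB partitions, so three partitions give the 192 MB
// mentioned in the commit message.
const PARTITION_SIZE: u64 = 64 * 1024 * 1024;

/// Old behaviour: the argument is ignored and the size is always fixed.
pub fn heap_size_before_fix(_total_heap_size_bytes: u64) -> u64 {
    3 * PARTITION_SIZE
}

/// New behaviour: never smaller than three partitions, but grows to
/// twice the content size for large object graphs.
pub fn heap_size_for_gc(total_heap_size_bytes: u64) -> u64 {
    (3 * PARTITION_SIZE).max(2 * total_heap_size_bytes)
}
```

For small heaps both versions agree; they only diverge once the content exceeds 96 MiB, which is where the fixed size previously caused the out-of-bounds slice.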
Vary the RNG seed across commits so different random heaps are tested over time. Seed is derived from git rev-parse HEAD at build time, with fallback to "4711" when not in a git repo (nix sandbox). Also enable WASMTIME_BACKTRACE_DETAILS=1 for better crash diagnostics. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
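One way to turn a commit hash into a seed with the stated fallback is sketched below. How the PR actually passes the hash into the tests is not shown in this conversation, so `seed_from_commit` and the choice of the first 8 hex digits are assumptions; only the fallback value `4711` comes from the commit message.

```rust
// Fallback seed used when no commit hash is available (e.g. nix sandbox).
const FALLBACK_SEED: u64 = 4711;

/// Derives a seed from the output of `git rev-parse HEAD`, if present.
/// Assumption: the first 8 hex digits of the hash are used as the seed.
pub fn seed_from_commit(commit: Option<&str>) -> u64 {
    commit
        .map(str::trim)
        // `get` returns None for hashes shorter than 8 bytes.
        .and_then(|hash| hash.get(..8))
        .and_then(|prefix| u64::from_str_radix(prefix, 16).ok())
        .unwrap_or(FALLBACK_SEED)
}
```

Any missing, short, or non-hex input falls back to the fixed seed, so a sandboxed build stays deterministic.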
gc::*, stable_option::test(), and stabilization sub-tests are safe functions — no unsafe block needed. Also run cargo fmt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… only The 20k test is too expensive to risk worst-case seeds. Keep it deterministic with a known-good seed. Small tests vary per commit to explore different heap shapes over time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
What are the added benefits? Is it really faster, and by how much?
Contributor
Author
It is still dominated by the 20000-tree, but at least that one now runs in parallel with the others. I was fed up with the slowness on the Mac, so this might help, but I haven't done A/B testing yet. The other change is that this introduces a different random seed per 10000-tree. The fixed seed is kept for the big one, for fewer surprises in run time.
Contributor
Author
Keeping this as draft, as I am brainstorming how the bottleneck can be improved. EDIT: The bottleneck is resolved by disabling the |
…test)

The host-side `cargo test` suite for the RTS dominates wall-clock on macos-latest CI: ~10+ min from-source build with no Hydra cache hit (aarch64-darwin combo not in cachix). On Linux it's a fast Hydra cache hit. So the cost is platform-asymmetric.

Solution: factor `nix/rts.nix` into two derivations sharing all build attrs:

- `rts-build`: `doCheck = false`, produces the cross-compiled wasm artifacts only.
- `rts-checked`: `doCheck = true`, additionally runs `make -j8 test`.

Wire into `flake.nix` such that:

- `.#rts-checked` is always the test-running variant.
- `.#rts` aliases `rts-checked` on Linux (keeps PR CI coverage as is: Hydra-cached, fast) and `rts-build` on darwin (skips the slow path).

The `nightly-macos-test` workflow gains an explicit `rts-checked` job that builds `.#rts-checked` directly, so the darwin-side cargo test suite still runs daily on master, just not on every PR.

Net effect:

- Mac PR/artifacts CI: skips 10+ min of cargo test, fast cache hits.
- Linux PR CI: unchanged.
- Darwin RTS test coverage: moves from "every PR (slow)" to "nightly".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Two complementary attacks on RTS test wall-clock: parallelisation makes the test suite finish faster when it does run, and the new build/check split lets `nix build .#rts` skip running it altogether on Mac (where it has no cache hit).

Build-only `.#rts` and check-running `.#rts-checked` (latest)

`nix/rts.nix` factored into two derivations sharing all build attrs:

- `rts-build`: `doCheck = false`, produces the cross-compiled wasm artifacts only.
- `rts-checked`: `doCheck = true`, additionally runs `make -j8 test`.

Wired into `flake.nix` so:

- `.#rts-checked` is always the test-running variant.
- `.#rts` aliases `rts-checked` on Linux (Hydra-cached, fast, no behaviour change) and `rts-build` on darwin (skips the slow path).

`nightly-macos-test` gains an explicit `rts-checked` job that builds `.#rts-checked` directly, so the darwin-side `cargo test` suite still runs daily on master.

Why: on `macos-latest` (aarch64-darwin) the host-side `cargo test` is a from-source build with no Hydra cache hit (combo not in cachix), adding ~10+ min to every artifact build. On Linux it's a fast cache hit. The split is platform-asymmetric on purpose: PR-blocking on Linux, scheduled-only on darwin.

Net effect: `nix build .#rts` terminates fast on the Mac everybody develops on; PR CI coverage on Linux is unchanged; darwin RTS test coverage moves from "every PR (slow)" to "nightly".

Phase A: Variant-level parallelism

- `CARGO_TARGET_DIR` per variant (`target-<name>`) to avoid cargo lock contention.
- `define`/`eval` Makefile template generates build + per-module run targets.
- `make -j8 test` in nix `checkPhase`.

Phase B: Per-module parallelism via `wasmtime --invoke`

- Each test module gets a `#[no_mangle] pub extern "C" fn test_<mod>()` entry point.
- `wasmtime --invoke test_<mod>` per module; works on wasm64-unknown-unknown without WASI.

Phase C: GC seed chunking

- Split 100 GC random seeds into 10 chunks of 10 (`test_gc_chunk_0..9`).
- Separate `gc_predefined` (hand-crafted heaps) from `gc_components` (incremental/compacting internals).
- Split `persistence` into `persistence_small` (up to 10k objects) and `persistence_20k` (20k objects).
- Order `TEST_MODULES` heaviest-first so `make -j` starts long poles early.

Dynamic test seeds

- Seed derived from `git rev-parse HEAD` at build time.
- Falls back to `4711` when not in a git repo (nix sandbox).
- The `persistence_20k` test uses fixed seed `4711` for predictable CI runtime.

Bug fix: heap size scaling

- `heap_size_for_gc` for incremental GC ignored `total_heap_size_bytes`, always returning `3 * PARTITION_SIZE` (192 MB).
- Fix: use `max(3 * PARTITION_SIZE, 2 * total_heap_size_bytes)`.
- Exposed by seed `20_000`, which generates a dense 20k-object graph.

Other improvements

- `test -f` guard before `wasmtime` to fail fast if cargo didn't produce the binary.
- `WASMTIME_BACKTRACE_DETAILS=1` for better crash diagnostics.
- Removed unneeded `unsafe` blocks in test entry points.

Observed speedup (parallelisation alone): from 2+ hours sequential to ~90 minutes parallel on macOS, limited by `persistence_20k` (Amdahl's law). The `.#rts` build-skip on top of that cuts the Mac path further whenever tests aren't needed (artifacts, dev shells, anything not gating on RTS unit tests).

Test plan

- `nix build .#rts` on darwin skips `cargo test` (timing check)
- `nix build .#rts-checked` on darwin runs and passes `cargo test`
- `nix build .#rts` on Linux still runs `cargo test` (regression guard)
- `nightly-macos-test` run shows the new `rts-checked` job (it does)

🤖 Generated with Claude Code