perf: avoid call_indirect for capturing closures with statically-known fi#5964
Draft
ggreif wants to merge 13 commits intogabor/forwardfrom
Draft
perf: avoid call_indirect for capturing closures with statically-known fi#5964ggreif wants to merge 13 commits intogabor/forwardfrom
call_indirect for capturing closures with statically-known fi#5964ggreif wants to merge 13 commits intogabor/forwardfrom
Conversation
call_indirect for capturing closures with statically-known fi
ggreif
commented
Mar 31, 2026
Comment on lines
+11605
to
+11606
| (match fi_opt with | ||
| | Some fi -> |
Contributor
Author
There was a problem hiding this comment.
Why not
Suggested change
| (match fi_opt with | |
| | Some fi -> | |
| (match fun_sr with | |
| | SR.StaticClosure fi -> |
ggreif
commented
Mar 31, 2026
| (fun _ae -> fun_code ^^ G.i (LocalSet (nr local_i))), | ||
| unmodified ) | ||
| | _ -> | ||
| let (pre_ae1, alloc_code, pre_code, sr, fill_code) = compile_unboxed_pat env pre_ae how (Source.({it = VarP v; at = e.at; note = typ})) in |
Contributor
Author
There was a problem hiding this comment.
Suggested change
| let (pre_ae1, alloc_code, pre_code, sr, fill_code) = compile_unboxed_pat env pre_ae how (Source.({it = VarP v; at = e.at; note = typ})) in | |
| let (pre_ae1, alloc_code, pre_code, sr, fill_code) = compile_unboxed_pat env pre_ae how Source.{e with it = VarP v; note = typ} in |
ggreif
commented
Mar 31, 2026
| let fun_sr, fun_code = compile_exp env pre_ae e in | ||
| (match fun_sr with | ||
| | SR.StaticClosure fi -> | ||
| let (pre_ae1, local_i) = VarEnv.add_direct_local env pre_ae v (SR.StaticClosure fi) typ in |
Contributor
Author
There was a problem hiding this comment.
Suggested change
| let (pre_ae1, local_i) = VarEnv.add_direct_local env pre_ae v (SR.StaticClosure fi) typ in | |
| let pre_ae1, local_i = VarEnv.add_direct_local env pre_ae v fun_sr typ in |
ggreif
commented
Mar 31, 2026
| (fun _ae -> fun_code ^^ G.i (LocalSet (nr local_i))), | ||
| unmodified ) | ||
| | _ -> | ||
| let (pre_ae1, alloc_code, pre_code, sr, fill_code) = compile_unboxed_pat env pre_ae how pat in |
Contributor
Author
There was a problem hiding this comment.
Suggested change
| let (pre_ae1, alloc_code, pre_code, sr, fill_code) = compile_unboxed_pat env pre_ae how pat in | |
| let pre_ae1, alloc_code, pre_code, sr, fill_code = compile_unboxed_pat env pre_ae how pat in |
Contributor
6d59b28 to
f6f4f1e
Compare
3 tasks
…cally-known fi Introduces SR.StaticClosure (fi : int32) to compile_enhanced.ml. FuncDec.closure now returns SR.StaticClosure fi instead of SR.Vanilla, threading the statically-known function index to the call site. A special case in compile_dec eagerly compiles LetD (VarP v, FuncE …) non-const expressions with pre_ae, extracts fi from SR.StaticClosure fi, and binds v as VarEnv.Local (SR.StaticClosure fi, local_i). This is safe for non-recursive LetD since the FuncE does not reference v and all its captures are already in pre_ae. In CallPrim's _, Type.Local branch, fi_opt = Some fi causes a direct G.i (Call (nr fi)) instead of loading funptr_field and calling call_indirect. The closure is still allocated and passed as $clos (captures are needed). StackRep.adjust (SR.StaticClosure _) SR.Vanilla is a no-op (same i64 on stack). SR.StaticClosure is wired into to_block_type, to_var_type, to_string, drop, join, and adjust. test/run/fmodf-forward.mo: $bar.1 now emits call $quux.1 instead of call_indirect; CHECK updated accordingly. Classical backend (compile_classical.ml) not yet ported. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the three subcases of the missed optimisation, the SR.StaticClosure approach (Option B), and current status. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port the capturing-closure direct-call optimization (subcase 2) from compile_enhanced.ml to compile_classical.ml: - Add SR.StaticClosure of int32 variant (closure ptr on stack, fi known) - Wire into to_var_type/to_block_type (I32Type), to_string, drop, join, adjust - FuncDec.closure returns SR.StaticClosure fi instead of SR.Vanilla - compile_dec special-cases non-const LetD(VarP v, FuncE) to eagerly compile the FuncE and bind v as Local(SR.StaticClosure fi, …) - CallPrim's _, Type.Local branch matches fun_sr directly for SR.StaticClosure fi and emits Call fi + FakeMultiVal.load instead of call_indirect Also refactors compile_enhanced.ml to match fun_sr directly (dropping the intermediate fi_opt) as suggested in PR review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Split the `_, Type.Local` arm into `SR.StaticClosure fi, Type.Local` (direct call, fi bound from pattern) and `_, Type.Local` (call_indirect fallback), eliminating the inner `match fun_sr` in both backends. Also bind the LetD pattern as `pat` in compile_dec's FuncE special case to pass it directly to compile_unboxed_pat. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Per review: use `let a, b = ...` instead of `let (a, b) = ...`, and pass fun_sr to add_direct_local instead of re-wrapping as SR.StaticClosure fi. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In mutually-recursive local function groups AllocHow may promote a variable to StoreHeap (MutBox) during its fixed-point iteration so that forward references get a stable heap location. The SR.StaticClosure fast path in compile_dec bypasses compile_unboxed_pat, so it must not fire when AllocHow has decided the variable needs heap indirection. Add an AllocHow.M.find_opt guard (Some (AllocHow.LocalImmut _)) to the special case in both compile_classical.ml and compile_enhanced.ml. Without this the quicksort test aborts at runtime because the captured closure ends up being an unset local (0) instead of a valid MutBox pointer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sistency Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verifies that anonymous capturing lambdas also get direct calls through both closure levels ($foo_lam → $@anon-func-N → $@anon-func-M directly). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n details Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
5d3cb5f to
5fac707
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
SR.StaticClosure of int32in bothcompile_classical.mlandcompile_enhanced.mlto carry the statically-known Wasm function index fromFuncDec.closureto theCallPrimcall sitecompile_deceagerly compilesLetD (VarP v, FuncE …)non-const expressions to extractfi, bindingvasVarEnv.Local (SR.StaticClosure fi, …)so the call site sees the correct SRCallPrimmatchesSR.StaticClosure fi, Type.Localbefore the generic_, Type.Localarm, emittingCall fidirectly instead of loadingfunptr_fieldand doingcall_indirect— the closure is still allocated and passed as $clostest/run/fmodf-forward.mo: $bar.1 now emitscall $quux.1instead ofcall_indirectAllocHow.(LocalImmut _)(skips variables needing heap indirection for forward references) andVarEnv.all_in_scope(skips mutually-recursive groups where captures aren't yet in scope)Resolves the optimization opportunity sketched in #5961.
What is NOT done
Test plan
test/run/fmodf-forward.mopasses all phases:[tc] [run] [comp] [valid] [FileCheck] [wasm-run]tests.mo-idlclean (mutually-recursive guard)tests.run-eop-releaseclean includingquicksort(AllocHow guard)🤖 Generated with Claude Code