Skip to content

perf: avoid call_indirect for capturing closures with statically-known fi#5964

Draft
ggreif wants to merge 13 commits intogabor/forwardfrom
gabor/static-closure
Draft

perf: avoid call_indirect for capturing closures with statically-known fi#5964
ggreif wants to merge 13 commits intogabor/forwardfrom
gabor/static-closure

Conversation

@ggreif
Copy link
Copy Markdown
Contributor

@ggreif ggreif commented Mar 31, 2026

Summary

  • Introduces SR.StaticClosure of int32 in both compile_classical.ml and compile_enhanced.ml to carry the statically-known Wasm function index from FuncDec.closure to the CallPrim call site
  • A special case in compile_dec eagerly compiles LetD (VarP v, FuncE …) non-const expressions to extract fi, binding v as VarEnv.Local (SR.StaticClosure fi, …) so the call site sees the correct SR
  • CallPrim matches SR.StaticClosure fi, Type.Local before the generic _, Type.Local arm, emitting Call fi directly instead of loading funptr_field and doing call_indirect — the closure is still allocated and passed as $clos
  • test/run/fmodf-forward.mo: $bar.1 now emits call $quux.1 instead of call_indirect
  • Guards prevent the optimisation from firing unsafely: AllocHow.(LocalImmut _) (skips variables needing heap indirection for forward references) and VarEnv.all_in_scope (skips mutually-recursive groups where captures aren't yet in scope)

Resolves the optimization opportunity sketched in #5961.

What is NOT done

  • Subcase 3 (function passed as argument to higher-order callee) — requires devirtualization, out of scope

Test plan

  • test/run/fmodf-forward.mo passes all phases: [tc] [run] [comp] [valid] [FileCheck] [wasm-run]
  • tests.mo-idl clean (mutually-recursive guard)
  • tests.run-eop-release clean including quicksort (AllocHow guard)

🤖 Generated with Claude Code

@ggreif ggreif requested a review from a team as a code owner March 31, 2026 21:07
@ggreif ggreif self-assigned this Mar 31, 2026
@ggreif ggreif marked this pull request as draft March 31, 2026 21:13
@ggreif ggreif changed the title feat(enhanced): avoid call_indirect for capturing closures with statically-known fi perf: avoid call_indirect for capturing closures with statically-known fi Mar 31, 2026
Comment thread src/codegen/compile_enhanced.ml Outdated
Comment on lines +11605 to +11606
(match fi_opt with
| Some fi ->
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not

Suggested change
(match fi_opt with
| Some fi ->
(match fun_sr with
| SR.StaticClosure fi ->

Comment thread src/codegen/compile_classical.ml Outdated
(fun _ae -> fun_code ^^ G.i (LocalSet (nr local_i))),
unmodified )
| _ ->
let (pre_ae1, alloc_code, pre_code, sr, fill_code) = compile_unboxed_pat env pre_ae how (Source.({it = VarP v; at = e.at; note = typ})) in
Copy link
Copy Markdown
Contributor Author

@ggreif ggreif Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let (pre_ae1, alloc_code, pre_code, sr, fill_code) = compile_unboxed_pat env pre_ae how (Source.({it = VarP v; at = e.at; note = typ})) in
let (pre_ae1, alloc_code, pre_code, sr, fill_code) = compile_unboxed_pat env pre_ae how Source.{e with it = VarP v; note = typ} in

Comment thread src/codegen/compile_classical.ml Outdated
let fun_sr, fun_code = compile_exp env pre_ae e in
(match fun_sr with
| SR.StaticClosure fi ->
let (pre_ae1, local_i) = VarEnv.add_direct_local env pre_ae v (SR.StaticClosure fi) typ in
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let (pre_ae1, local_i) = VarEnv.add_direct_local env pre_ae v (SR.StaticClosure fi) typ in
let pre_ae1, local_i = VarEnv.add_direct_local env pre_ae v fun_sr typ in

Comment thread src/codegen/compile_classical.ml Outdated
(fun _ae -> fun_code ^^ G.i (LocalSet (nr local_i))),
unmodified )
| _ ->
let (pre_ae1, alloc_code, pre_code, sr, fill_code) = compile_unboxed_pat env pre_ae how pat in
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let (pre_ae1, alloc_code, pre_code, sr, fill_code) = compile_unboxed_pat env pre_ae how pat in
let pre_ae1, alloc_code, pre_code, sr, fill_code = compile_unboxed_pat env pre_ae how pat in

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 31, 2026

Comparing from 4dfc891 to f6f4f1e:
In terms of gas, 4 tests improved and the mean change is -0.0%.
In terms of size, 5 tests improved and the mean change is -0.0%.

ggreif and others added 13 commits April 19, 2026 03:28
…cally-known fi

Introduces SR.StaticClosure (fi : int32) to compile_enhanced.ml.
FuncDec.closure now returns SR.StaticClosure fi instead of SR.Vanilla,
threading the statically-known function index to the call site.

A special case in compile_dec eagerly compiles LetD (VarP v, FuncE …)
non-const expressions with pre_ae, extracts fi from SR.StaticClosure fi,
and binds v as VarEnv.Local (SR.StaticClosure fi, local_i).  This is safe
for non-recursive LetD since the FuncE does not reference v and all its
captures are already in pre_ae.

In CallPrim's _, Type.Local branch, fi_opt = Some fi causes a direct
G.i (Call (nr fi)) instead of loading funptr_field and calling call_indirect.
The closure is still allocated and passed as $clos (captures are needed).

StackRep.adjust (SR.StaticClosure _) SR.Vanilla is a no-op (same i64 on stack).
SR.StaticClosure is wired into to_block_type, to_var_type, to_string,
drop, join, and adjust.

test/run/fmodf-forward.mo: $bar.1 now emits call $quux.1 instead of
call_indirect; CHECK updated accordingly.

Classical backend (compile_classical.ml) not yet ported.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the three subcases of the missed optimisation, the
SR.StaticClosure approach (Option B), and current status.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port the capturing-closure direct-call optimization (subcase 2) from
compile_enhanced.ml to compile_classical.ml:

- Add SR.StaticClosure of int32 variant (closure ptr on stack, fi known)
- Wire into to_var_type/to_block_type (I32Type), to_string, drop, join, adjust
- FuncDec.closure returns SR.StaticClosure fi instead of SR.Vanilla
- compile_dec special-cases non-const LetD(VarP v, FuncE) to eagerly
  compile the FuncE and bind v as Local(SR.StaticClosure fi, …)
- CallPrim's _, Type.Local branch matches fun_sr directly for
  SR.StaticClosure fi and emits Call fi + FakeMultiVal.load instead of
  call_indirect

Also refactors compile_enhanced.ml to match fun_sr directly (dropping
the intermediate fi_opt) as suggested in PR review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Split the `_, Type.Local` arm into `SR.StaticClosure fi, Type.Local`
(direct call, fi bound from pattern) and `_, Type.Local` (call_indirect
fallback), eliminating the inner `match fun_sr` in both backends.

Also bind the LetD pattern as `pat` in compile_dec's FuncE special case
to pass it directly to compile_unboxed_pat.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Per review: use `let a, b = ...` instead of `let (a, b) = ...`,
and pass fun_sr to add_direct_local instead of re-wrapping as
SR.StaticClosure fi.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In mutually-recursive local function groups AllocHow may promote a variable
to StoreHeap (MutBox) during its fixed-point iteration so that forward
references get a stable heap location.  The SR.StaticClosure fast path in
compile_dec bypasses compile_unboxed_pat, so it must not fire when AllocHow
has decided the variable needs heap indirection.

Add an AllocHow.M.find_opt guard (Some (AllocHow.LocalImmut _)) to the
special case in both compile_classical.ml and compile_enhanced.ml.
Without this the quicksort test aborts at runtime because the captured
closure ends up being an unset local (0) instead of a valid MutBox pointer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sistency

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verifies that anonymous capturing lambdas also get direct calls through
both closure levels ($foo_lam → $@anon-func-N → $@anon-func-M directly).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n details

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ggreif ggreif force-pushed the gabor/static-closure branch from 5d3cb5f to 5fac707 Compare April 19, 2026 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Affects only gas usage or code size

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant