bench(wam-haskell): IntSet 3-way macro — algorithmic win didn't materialise at depth=10 #1698

Merged

s243a merged 2 commits into main from feat/wam-haskell-intset-macro-bench-and-memory on Apr 29, 2026
Conversation

s243a (Owner) commented Apr 29, 2026

Summary

Closes the IntSet visited arc with measured numbers: the design's predicted speedup did not materialise at the workload's actual recursion depth. That is a real finding worth reporting.

  • Extends tests/benchmarks/wam_effective_distance_macro_bench.pl to a 3-way comparison (unlowered / Phase G lowered / Phase H intset).
  • Adds WAM_EFF_DIST_BENCH_SCALE env var so the bench can run against 1k or 10k facts.
  • Updates WAM_PERF_OPTIMIZATION_LOG.md Phase H final entry with the measured numbers and an honest reading of why the algorithmic improvement didn't pay off.
  • Memory files (outside repo) refreshed with the IntSet arc completion notes including this finding.

Measured numbers

10k scale (462 tuples, max_depth=10, 6 trials with rotating order):

| variant | mean query_ms |
| --- | --- |
| unlowered (no directives) | 931.5 |
| lowered (Phase G mode only) | 861.0 |
| intset (Phase G + Phase H) | 957.5 |

| comparison | speedup |
| --- | --- |
| lowered vs unlowered | 1.082× (Phase G constant-factor win, as expected) |
| intset vs lowered | 0.899× (IntSet ~10% SLOWER than list) |
| intset vs unlowered | 0.973× (IntSet barely matches the slow path) |

tuple_count=462 matches across all three — correctness preserved across both the Phase G lowering and the Phase H IntSet path.
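The three ratios follow directly from the mean timings; a quick sanity check (values hard-coded from the table above):

```python
# Mean query times in ms, copied from the 10k-scale table above.
unlowered_ms = 931.5
lowered_ms = 861.0
intset_ms = 957.5

def speedup(baseline_ms, variant_ms):
    """Ratio > 1 means the variant is faster than the baseline."""
    return round(baseline_ms / variant_ms, 3)

print(speedup(unlowered_ms, lowered_ms))  # → 1.082 (lowered vs unlowered)
print(speedup(lowered_ms, intset_ms))     # → 0.899 (intset vs lowered)
print(speedup(unlowered_ms, intset_ms))   # → 0.973 (intset vs unlowered)
```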

Why the algorithmic O(log N) didn't pay off

The design predicted ~1.5–3× speedup from O(N) → O(log N) at max_depth=10. Reality: the Patricia-trie constant factors exceed the linear walk cost on a ~10-element list. Specifically:

  1. Allocation per insert: IS.insert allocates fresh trie nodes on every insert (the structure is purely functional), whereas extending the list with [X|V] allocates a single cons cell. For shallow visited sets, this allocation cost dominates.
  2. Cache locality: a small, freshly consed list packs into one or two cache lines; a 4-element IntSet trie scatters across multiple heap nodes.
  3. Node traversal overhead: even IS.member on a 10-element IntSet does ~3-4 tag-checks and comparisons in a Patricia trie, vs at most 10 simple == checks in the list. Constant factors are surprisingly close at this size.
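A back-of-the-envelope cost model makes point 3 concrete. The unit costs below are assumptions for illustration only, not measurements of the Haskell runtime:

```python
import math

# Assumed unit costs (illustrative, not measured):
#   list membership: ~1 unit per element walked (pointer follow + ==)
#   trie membership: ~3 units per branch node (tag check + prefix test + branch)
LIST_UNIT = 1
TRIE_UNIT = 3

def list_member_cost(n):
    return n * LIST_UNIT  # worst case: walk the whole visited list

def trie_member_cost(n):
    return max(1, math.ceil(math.log2(n))) * TRIE_UNIT  # ~log2(n) nodes

print(list_member_cost(10), trie_member_cost(10))  # → 10 12: trie loses at n=10

# First n where the trie model is strictly cheaper than the list model.
n = 1
while trie_member_cost(n) >= list_member_cost(n):
    n += 1
print(n)  # → 13 under these assumed constants
```

Under these toy constants the crossover sits in the low tens; the real crossover also pays the allocation and cache costs of points 1-2, which is why the crossover is hedged at much deeper visited sets.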

The IntSet's win likely materialises only at deeper visited sets (max_depth ≥ 50); the canonical effective-distance workload doesn't reach there. Phase G's not_member_list lowering (skipping the put_structure + builtin_call dispatch + heap term allocation) is the actual macro win on this workload.

Honest takeaways

  1. Algorithmic wins aren't free at small N. The design's "1.5-3× expected" was asymptotic reasoning that didn't account for IntSet's constant factors (allocation, cache locality, node traversal) at this workload's small visited sets.
  2. The infrastructure is reusable. VSet, the directive, and the codegen paths can host other set representations (sorted array, small bitmap) that might beat IntSet at small N. Filed as future exploration.
  3. Phase G is the real macro win. The constant-factor dispatch reduction is what speeds up the workload. Phase H's algorithmic pivot was the wrong move for max_depth=10. The implementation remains correct and opt-in: users at typical depth should leave the directive off.

Changes

tests/benchmarks/wam_effective_distance_macro_bench.pl (~90 LOC delta)

  • New intset variant in generate_project/2 that asserts both the mode declarations AND :- visited_set(category_ancestor/4, 4).
  • Reworked main/0 to run all three variants twice with rotating order, report 3-way comparison with three speedup ratios.
  • New WAM_EFF_DIST_BENCH_SCALE env var routes facts_path/1 to data/benchmark/<scale>/facts.pl. Defaults to 1k for fast smoke runs; set WAM_EFF_DIST_BENCH_SCALE=10k for the macro measurement.
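The rotating-order scheme above can be sketched as follows (the real bench is Prolog; `trial_order` is a hypothetical name used only for illustration):

```python
# The three bench variants, in the order listed above.
VARIANTS = ["unlowered", "lowered", "intset"]

def trial_order(n_rounds):
    """Rotate the starting variant each round so no single variant
    always runs first and soaks up warm-up / cache effects."""
    return [
        VARIANTS[r % len(VARIANTS):] + VARIANTS[:r % len(VARIANTS)]
        for r in range(n_rounds)
    ]

orders = trial_order(2)  # 2 rounds x 3 variants = the 6 trials above
print(orders)
# → [['unlowered', 'lowered', 'intset'], ['lowered', 'intset', 'unlowered']]
```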

docs/design/WAM_PERF_OPTIMIZATION_LOG.md

Phase H final entry's "Macro benchmark — TODO" replaced with the measured 10k results, the per-trial numbers, the constant-factor analysis, and the three honest takeaways.

Memory updates (outside repo)

Verification

  • All 5 lowering / runtime / state-analysis test suites stay green.
  • Cabal e2e test still passes.
  • The benchmark itself runs end-to-end at both 1k and 10k.

Test plan

  • Re-run the benchmark at multiple scales to confirm the directionality holds: WAM_EFF_DIST_BENCH_SCALE=10k swipl -t halt tests/benchmarks/wam_effective_distance_macro_bench.pl
  • (Future) Benchmark at max_depth=50 or higher to find the IntSet crossover point.
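The depth sweep in the second bullet can be prototyped with a toy model before touching the Haskell runtime. Everything here (graph shape, helper names) is illustrative, and only correctness is asserted, since timings are machine-dependent:

```python
def build_chain(n):
    # A parent -> child chain 0 -> 1 -> ... -> n: the worst case for a
    # linear visited list, which grows one entry per recursion level.
    return {i: [i + 1] for i in range(n)}

def count_reachable(graph, start, max_depth, make_visited, member, add):
    """DFS with a pluggable visited-set representation."""
    visited = make_visited()
    def go(node, depth):
        if depth > max_depth or member(visited, node):
            return 0
        add(visited, node)
        return 1 + sum(go(child, depth + 1) for child in graph.get(node, []))
    return go(start, 0)

graph = build_chain(100)
# list-backed visited (linear member) vs set-backed visited (hashed member)
n_list = count_reachable(graph, 0, 50, list, lambda v, x: x in v, list.append)
n_set = count_reachable(graph, 0, 50, set, lambda v, x: x in v, set.add)
print(n_list, n_set)  # → 51 51: both representations agree on the answer
```

Timing the two closures over a range of max_depth values would locate the crossover for this model; the real crossover still has to be measured on the WAM runtime itself.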

Refs

🤖 Generated with Claude Code

s243a and others added 2 commits April 28, 2026 20:46
bench(wam-haskell): IntSet 3-way macro — algorithmic win didn't materialise at depth=10

Extends wam_effective_distance_macro_bench.pl to a 3-way comparison
(unlowered / Phase G lowered / Phase H IntSet) and measures all
three on the effective-distance workload. Adds WAM_EFF_DIST_BENCH_SCALE
env var so the bench can run against 1k or 10k facts.

Measured at 10k scale (462 tuples, max_depth=10, 6 trials with
rotating order):

  unlowered (no directives)       mean 931.5 ms
  lowered (Phase G mode only)     mean 861.0 ms
  intset (Phase G + Phase H)      mean 957.5 ms

  lowered vs unlowered:  1.082x  (Phase G constant-factor win, as expected)
  intset vs lowered:     0.899x  (IntSet ~10% SLOWER than list)
  intset vs unlowered:   0.973x  (IntSet barely matches the slow path)

tuple_count=462 matches across all three (correctness preserved).

The design predicted 1.5-3x speedup from O(N) -> O(log N) at
max_depth=10. Reality: the Patricia-trie constant factors (per-insert
allocation, cache scattering, node traversal) exceed the linear walk
cost on a ~10-element list. The algorithmic improvement doesn't
amortise its constant factors at this size.

Phase H appendix in WAM_PERF_OPTIMIZATION_LOG.md captures the
measurement, the per-trial numbers, and an honest "the algorithmic
win didn't materialise here" reading. Three takeaways:

1. Algorithmic wins aren't free at small N — the design's "1.5-3x
   expected" was asymptotic reasoning that didn't account for IntSet
   constant factors.
2. The infrastructure is reusable — VSet, the directive, and the
   codegen paths can host other set representations (sorted array,
   bitmap) that might win at small N.
3. Phase G is the real macro win on this workload. Phase H's
   algorithmic pivot was the wrong move for max_depth=10.

The IntSet implementation remains correct and opt-in: users with
deep visited sets may benefit (untested, but plausible at
max_depth>=50); users at typical depth should leave the directive
off.

Closes task #194 (IntSet macro bench + memory cleanup). Memory
files updated outside the repo (project_wam_haskell_intset_visited.md
gets the honest finding; project_wam_haskell_mode_analysis.md
extended with the cross-arc summary; MEMORY.md and todo.md
refreshed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…et-macro-bench-and-memory

# Conflicts:
#	docs/design/WAM_PERF_OPTIMIZATION_LOG.md
s243a merged commit 0871172 into main on Apr 29, 2026. 4 checks passed.