…ialise at depth=10

Extends wam_effective_distance_macro_bench.pl to a 3-way comparison (unlowered / Phase G lowered / Phase H IntSet) and measures all three on the effective-distance workload. Adds WAM_EFF_DIST_BENCH_SCALE env var so the bench can run against 1k or 10k facts.

Measured at 10k scale (462 tuples, max_depth=10, 6 trials with rotating order):
- unlowered (no directives): mean 931.5 ms
- lowered (Phase G mode only): mean 861.0 ms
- intset (Phase G + Phase H): mean 957.5 ms
- lowered vs unlowered: 1.082x (Phase G constant-factor win, as expected)
- intset vs lowered: 0.899x (IntSet ~10% SLOWER than the list)
- intset vs unlowered: 0.973x (IntSet barely matches the slow path)

tuple_count=462 matches across all three (correctness preserved).

The design predicted a 1.5-3x speedup from O(N) -> O(log N) at max_depth=10. Reality: the Patricia-trie constant factors (per-insert allocation, cache scattering, node traversal) exceed the linear walk cost on a ~10-element list. The algorithmic improvement doesn't amortise its constant factors at this size. The Phase H appendix in WAM_PERF_OPTIMIZATION_LOG.md captures the measurement, the per-trial numbers, and an honest "the algorithmic win didn't materialise here" reading.

Three takeaways:
1. Algorithmic wins aren't free at small N — the design's "1.5-3x expected" was asymptotic reasoning that didn't account for IntSet constant factors.
2. The infrastructure is reusable — VSet, the directive, and the codegen paths can host other set representations (sorted array, bitmap) that might win at small N.
3. Phase G is the real macro win on this workload. Phase H's algorithmic pivot was the wrong move for max_depth=10.

The IntSet implementation remains correct and opt-in: users with deep visited sets may benefit (untested, but plausible at max_depth>=50); users at typical depth should leave the directive off.

Closes task #194 (IntSet macro bench + memory cleanup).

Memory files updated outside the repo (project_wam_haskell_intset_visited.md gets the honest finding; project_wam_haskell_mode_analysis.md extended with the cross-arc summary; MEMORY.md and todo.md refreshed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…et-macro-bench-and-memory

# Conflicts:
#	docs/design/WAM_PERF_OPTIMIZATION_LOG.md
Summary
Closes the IntSet visited arc with measured numbers — and the design's predicted speedup did not materialise at the workload's actual recursion depth. This is a real finding worth reporting.
- Extends `tests/benchmarks/wam_effective_distance_macro_bench.pl` to a 3-way comparison (unlowered / Phase G lowered / Phase H `intset`).
- Adds a `WAM_EFF_DIST_BENCH_SCALE` env var so the bench can run against 1k or 10k facts.
- Updates the `WAM_PERF_OPTIMIZATION_LOG.md` Phase H final entry with the measured numbers and an honest reading of why the algorithmic improvement didn't pay off.

Measured numbers
10k scale (462 tuples, `max_depth=10`, 6 trials with rotating order):

- unlowered (no directives): mean 931.5 ms
- lowered (Phase G `mode` only): mean 861.0 ms
- intset (Phase G + Phase H): mean 957.5 ms
- lowered vs unlowered: 1.082x
- intset vs lowered: 0.899x (IntSet ~10% slower than the list)
- intset vs unlowered: 0.973x
- `tuple_count=462` matches across all three — correctness preserved across both the Phase G lowering and the Phase H IntSet path.

Why the algorithmic O(log N) didn't pay off
The design predicted a ~1.5–3× speedup from O(N) → O(log N) at `max_depth=10`. Reality: the Patricia-trie constant factors exceed the linear walk cost on a ~10-element list. Specifically:

- `IS.insert` allocates new tree nodes per extension (purely functional), while the list extension `[X|V]` allocates one cons cell. For shallow visited sets, the allocation cost dominates.
- `IS.member` on a 10-element IntSet does ~3–4 tag-checks and comparisons in a Patricia trie, vs at most 10 simple `==` checks in the list. Constant factors are surprisingly close at this size.

The IntSet win likely materialises at deeper visited sets (`max_depth >= 50`); the canonical effective-distance workload doesn't reach there. Phase G's `not_member_list` lowering (skipping the `put_structure` + `builtin_call` dispatch + heap term allocation) is the actual macro win on this workload.

Honest takeaways
- Algorithmic wins aren't free at small N — the design's "1.5–3× expected" was asymptotic reasoning that didn't account for IntSet constant factors.
- The infrastructure is reusable — VSet, the directive, and the codegen paths can host other set representations (sorted array, small bitmap) that might beat IntSet at small N. Filed as future exploration.
- Phase G is the real macro win on this workload; Phase H's algorithmic pivot was the wrong move for `max_depth=10`. The implementation remains correct and opt-in: users at typical depth should leave the directive off.

Changes
`tests/benchmarks/wam_effective_distance_macro_bench.pl` (~90 LOC delta)
- New `intset` variant in `generate_project/2` that asserts both the mode declarations AND `:- visited_set(category_ancestor/4, 4).`
- `main/0` now runs all three variants twice with rotating order and reports a 3-way comparison with three speedup ratios.
- `WAM_EFF_DIST_BENCH_SCALE` env var routes `facts_path/1` to `data/benchmark/<scale>/facts.pl`. Defaults to 1k for fast smoke runs; set `WAM_EFF_DIST_BENCH_SCALE=10k` for the macro measurement.

`docs/design/WAM_PERF_OPTIMIZATION_LOG.md`
- Phase H final entry's "Macro benchmark — TODO" replaced with the measured 10k results, the per-trial numbers, the constant-factor analysis, and the three honest takeaways.
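For intuition about the constant-factor tradeoff the bench exposed, the two visited-set paths can be sketched in Haskell. This is a minimal, hypothetical illustration against `Data.IntSet` from containers; `visitList` and `visitSet` are invented names for this sketch, not the WAM runtime's actual code.

```haskell
import qualified Data.IntSet as IS

-- List path (Phase G lowering): membership is a linear walk, but
-- extending the set allocates exactly one cons cell, as [X|V] does.
visitList :: Int -> [Int] -> Maybe [Int]
visitList x vs
  | x `elem` vs = Nothing        -- cycle: already visited
  | otherwise   = Just (x : vs)  -- one cons cell

-- IntSet path (Phase H): membership is a short Patricia-trie descent,
-- but each insert rebuilds the trie nodes along the path (purely
-- functional), so shallow sets pay more allocation per step.
visitSet :: Int -> IS.IntSet -> Maybe IS.IntSet
visitSet x vs
  | x `IS.member` vs = Nothing
  | otherwise        = Just (IS.insert x vs)

main :: IO ()
main = do
  -- Both paths agree on a depth-10 walk, mirroring the matching
  -- tuple_count across the bench variants.
  let walkL = foldr (\x acc -> acc >>= visitList x) (Just []) [1 .. 10]
      walkS = foldr (\x acc -> acc >>= visitSet x) (Just IS.empty) [1 .. 10]
  print (fmap length walkL, fmap IS.size walkS)
```

At depth 10 the list path does at most 10 `==` checks per lookup and one allocation per extension, which is why the asymptotically better trie loses here; the trie's per-insert node allocation only amortises once the visited set gets deep.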
Memory updates (outside repo)
- `project_wam_haskell_intset_visited.md` — updated "what's not measured yet" → measured, with the honest finding.
- `project_wam_haskell_mode_analysis.md` — extended with cross-arc summary and per-arc benchmark numbers.
- `MEMORY.md` — added IntSet visited entry, updated mode-analysis description with measured speedups.
- `todo.md` — marked tasks #186/#187/#188/#191/#192/#193 as completed; new pending entry for `max_depth` crossover-point exploration.

Verification
Test plan
- `WAM_EFF_DIST_BENCH_SCALE=10k swipl -t halt tests/benchmarks/wam_effective_distance_macro_bench.pl`
- Future: re-run at `max_depth=50` or higher to find the IntSet crossover point.

Refs
- Design doc (`WAM_HASKELL_INTSET_VISITED_DESIGN.md`).
- Phase G (`\+ member` lowering).

🤖 Generated with Claude Code