Conversation
Replace optics-based ix/zoom swap with direct splitAt-based list swap. The old code used two ix traversals + zoom + two .= assignments. The new code uses a single splitAt + list construction. bench-perf shows a ∼10% reduction in primes and loop.
Have you considered the more compact version? I am getting inconclusive results compared to your version.
I hadn't tried that one, but I just benchmarked it against the one in the PR, and the results are indeed inconclusive between the two. I'm OK with either of them; the resulting code seems to be quite similar, see below.
Claude's interpretation from -ddump-simpl
OpSwap GHC Core Comparison: 3 Variants
Source Code
Variant 0 - Old (Optics:
| Aspect | V0 (Optics) | V1 (Current) | V2 (Proposed) |
|---|---|---|---|
| List traversals | 4 (2 read + 2 write) | 1 (splitAt) | 1 (splitAt) |
| VM+FrameState rebuilds | 3 (next + 2× zoom) | 1 (assign') | 1 (assign') |
| Functor overhead | Yes (fmap per cons) | None | None |
| Maybe wrapping | Yes (2× Just/Nothing) | None | None |
| Subtraction | No | Yes (idx-1) | No |
| Join point | No | Yes | No |
| Short-circuit swap(1) | No | Yes (skips splitAt) | No (goes to underrun) |
| Closures per call | ~4 × idx | ~0 | ~0 |
Verdict
Variant 1 (current) is the best. It performs 4× fewer list traversals than V0 and builds
the VM record only once instead of three times. Compared to V2, it gains a join point
optimization and a short-circuit path for the common swap(1) case. The idx-1
subtraction is a single machine instruction — negligible cost. V2's only advantage
(avoiding that subtraction) does not compensate for losing the join point and short-circuit.
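For readers without the diff at hand, here is a minimal sketch of two splitAt shapes that would match the "Subtraction" row of the table. It is a guess at the general form, not the code from the PR: the names swapWithSub and swapNoSub are hypothetical, the stack is modelled as a plain list with the top at the head, and the swap(1) short-circuit mentioned above is not modelled.

```haskell
-- Hypothetical shape with the subtraction: peel off the top element first,
-- then split the remainder at idx - 1.
swapWithSub :: Int -> [a] -> Maybe [a]
swapWithSub idx (top : rest)
  | idx >= 1
  , (before, target : after) <- splitAt (idx - 1) rest
  = Just (target : before ++ top : after)
swapWithSub _ _ = Nothing  -- empty stack, idx < 1, or stack underrun

-- Hypothetical shape without the subtraction: split the whole stack at idx,
-- so the top element is the head of the first half.
swapNoSub :: Int -> [a] -> Maybe [a]
swapNoSub idx stack =
  case splitAt idx stack of
    (top : before, target : after) -> Just (target : before ++ top : after)
    _                              -> Nothing  -- idx < 1 or stack underrun
```

Both functions return the same result for every index and stack under these assumptions, which is consistent with the benchmarks being inconclusive between the two.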
I have a very slight preference for my version, as it expresses the intent more concisely.
From @blishko's suggestion
Sounds good, pushed that instead. Feel free to squash if you'd like 👍
We can keep both versions in the history! |
Description
Replace optics-based ix/zoom swap with direct splitAt-based list swap. The old code used two ix traversals + zoom + two .= assignments. The new code uses a single splitAt + list construction.
bench-perf shows a ∼10% reduction in primes and loop. The optimization was suggested by Claude Code.
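To illustrate why a single splitAt can be cheaper than positional reads and writes, the following sketch contrasts the two approaches on a plain list. It uses no optics and is not the project's code: at, setAt, swapByIndex and swapBySplit are hypothetical names, the stack top is assumed to be the list head, and the index is assumed to be at least 1.

```haskell
-- Hypothetical helper: safe positional read (one walk down the list).
at :: Int -> [a] -> Maybe a
at i xs
  | i < 0     = Nothing
  | otherwise = case drop i xs of
      (x : _) -> Just x
      []      -> Nothing

-- Hypothetical helper: positional write (another walk down the list).
-- Leaves the list unchanged if the index is out of range.
setAt :: Int -> a -> [a] -> [a]
setAt i v xs = case splitAt i xs of
  (before, _ : after) -> before ++ v : after
  _                   -> xs

-- Index-based swap: two reads followed by two writes.
swapByIndex :: Int -> [a] -> Maybe [a]
swapByIndex i xs = do
  top    <- at 0 xs
  target <- at i xs
  pure (setAt i top (setAt 0 target xs))

-- splitAt-based swap: one split, then the result is rebuilt directly.
swapBySplit :: Int -> [a] -> Maybe [a]
swapBySplit i xs = case splitAt i xs of
  (top : before, target : after) -> Just (target : before ++ top : after)
  _                              -> Nothing  -- i < 1 or stack underrun
```

For example, swapBySplit 2 [10, 20, 30, 40] evaluates to Just [30, 20, 10, 40], the same as swapByIndex 2 on that list.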
Checklist