
EVM: enable -O2 -fworker-wrapper-cbv for main EVM module #1045

Draft
elopez wants to merge 1 commit into main from perf-o2

elopez (Collaborator) commented Mar 7, 2026

Description

bench-perf shows a >10% reduction in run time for the primes, loop, and swapOperations benchmarks. The optimization was suggested by Claude Code.
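For reference, one way to scope flags like these to a single module is an `OPTIONS_GHC` pragma at the top of that module. The sketch below assumes that is roughly how the change is applied to the EVM module (the diff itself is not shown here), and the `sumTo` loop is a hypothetical stand-in for hot interpreter code, not hevm code. As we understand the GHC user's guide, `-fworker-wrapper-cbv` is off by default; turning it on lets the worker/wrapper pass split a function even when the only benefit is passing some strict arguments call-by-value.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Per-module flags: GHC applies these on top of the package-wide
-- ghc-options, for this module only.
{-# OPTIONS_GHC -O2 -fworker-wrapper-cbv #-}
module Main (main) where

-- | Hypothetical strict accumulator loop, standing in for the kind of
-- tight code where worker/wrapper splitting on strict arguments can help.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go :: Int -> Int -> Int
    go !acc !i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 100)
```

Scoping the flags per module keeps the extra compile time of `-O2` limited to the module that benefits, rather than the whole package.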

Checklist

  • tested locally
  • added automated tests
  • updated the docs
  • updated the changelog

msooseth (Collaborator) commented Mar 9, 2026

I'm not against it, provided it really is faster. Could you post a performance measurement in a comment, so we can look it up later?

elopez marked this pull request as draft on March 9, 2026, 14:19

elopez (Collaborator, Author) commented Mar 9, 2026

I rebased and re-ran the benchmark after the recent changes; unfortunately, the improvement is no longer very noticeable.

bench-perf 144c081 vs 2baa264 (main)
[3 of 3] Linking dist-newstyle/build/aarch64-osx/ghc-9.8.4/hevm-0.57.0/b/bench-perf/build/bench-perf/bench-perf
All
  loop
    2:     OK
      24.7 μs ± 2.0 μs,  9% less than baseline
    4:     OK
      35.9 μs ± 2.2 μs,       same as baseline
    8:     OK
      55.2 μs ± 3.9 μs,       same as baseline
    16:    OK
      100  μs ± 9.3 μs,       same as baseline
    32:    OK
      189  μs ±  16 μs,       same as baseline
    64:    OK
      344  μs ± 6.7 μs,       same as baseline
    128:   OK
      670  μs ±  26 μs,       same as baseline
    256:   OK
      1.34 ms ±  55 μs,       same as baseline
    512:   OK
      2.68 ms ± 235 μs,       same as baseline
    1024:  OK
      5.28 ms ± 498 μs,       same as baseline
    2048:  OK
      10.5 ms ± 990 μs,       same as baseline
    4096:  OK
      20.7 ms ± 879 μs,       same as baseline
    8192:  OK
      41.5 ms ± 3.8 ms,       same as baseline
    16384: OK
      82.4 ms ± 7.1 ms,       same as baseline
  primes
    2:     OK
      70.5 μs ± 2.0 μs,       same as baseline
    4:     OK
      106  μs ± 8.0 μs,       same as baseline
    8:     OK
      167  μs ±  12 μs,       same as baseline
    16:    OK
      329  μs ±  20 μs,       same as baseline
    32:    OK
      700  μs ±  33 μs,       same as baseline
    64:    OK
      1.54 ms ± 123 μs,       same as baseline
    128:   OK
      3.53 ms ± 253 μs,       same as baseline
    256:   OK
      8.43 ms ± 680 μs,       same as baseline
    512:   OK
      20.1 ms ± 860 μs,       same as baseline
    1024:  OK
      49.0 ms ± 3.8 ms,       same as baseline
    2048:  OK
      120  ms ± 8.1 ms,       same as baseline
    4096:  OK
      298  ms ±  15 ms,       same as baseline
    8192:  OK
      747  ms ±  73 ms,       same as baseline
    16384: OK
      1.892 s ±  78 ms,       same as baseline
  hashes
    2:     OK
      30.5 μs ± 2.2 μs,       same as baseline
    4:     OK
      49.1 μs ± 1.9 μs,       same as baseline
    8:     OK
      87.2 μs ± 5.8 μs,       same as baseline
    16:    OK
      161  μs ±  12 μs,       same as baseline
    32:    OK
      309  μs ±  26 μs,       same as baseline
    64:    OK
      608  μs ±  42 μs,       same as baseline
    128:   OK
      1.22 ms ±  58 μs,       same as baseline
    256:   OK
      2.40 ms ± 193 μs,       same as baseline
    512:   OK
      4.80 ms ± 286 μs,       same as baseline
    1024:  OK
      9.73 ms ± 677 μs,       same as baseline
    2048:  OK
      19.5 ms ± 1.4 ms,       same as baseline
    4096:  OK
      39.1 ms ± 2.7 ms,       same as baseline
    8192:  OK
      77.9 ms ± 6.0 ms,       same as baseline
    16384: OK
      155  ms ±  15 ms,       same as baseline
  hashmem
    2:     OK
      47.8 μs ± 4.6 μs,       same as baseline
    4:     OK
      75.3 μs ± 5.2 μs,       same as baseline
    8:     OK
      131  μs ±  11 μs,       same as baseline
    16:    OK
      243  μs ±  22 μs,       same as baseline
    32:    OK
      463  μs ±  45 μs,       same as baseline
    64:    OK
      914  μs ±  80 μs,       same as baseline
    128:   OK
      1.83 ms ± 164 μs,       same as baseline
    256:   OK
      3.58 ms ± 185 μs,       same as baseline
    512:   OK
      7.33 ms ± 634 μs,       same as baseline
    1024:  OK
      15.3 ms ± 1.1 ms,       same as baseline
    2048:  OK
      31.7 ms ± 2.4 ms,       same as baseline
    4096:  OK
      63.5 ms ± 4.2 ms,       same as baseline
    8192:  OK
      129  ms ± 9.6 ms,       same as baseline
    16384: OK
      261  ms ±  15 ms,       same as baseline
  balanceTransfer
    2:     OK
      3.57 ms ± 330 μs, 17% more than baseline
    4:     OK
      3.57 ms ± 307 μs, 17% more than baseline
    8:     OK
      3.65 ms ± 126 μs, 19% more than baseline
    16:    OK
      3.69 ms ± 291 μs, 18% more than baseline
    32:    OK
      3.85 ms ± 358 μs, 18% more than baseline
    64:    OK
      4.06 ms ± 333 μs, 16% more than baseline
    128:   OK
      4.41 ms ± 435 μs, 13% more than baseline
    256:   OK
      5.20 ms ± 514 μs,       same as baseline
    512:   OK
      6.96 ms ± 498 μs,       same as baseline
    1024:  OK
      10.5 ms ± 912 μs,       same as baseline
    2048:  OK
      19.3 ms ± 1.3 ms,       same as baseline
    4096:  OK
      38.1 ms ± 3.7 ms,       same as baseline
    8192:  OK
      76.9 ms ± 5.1 ms,       same as baseline
    16384: OK
      153  ms ± 4.9 ms,       same as baseline
  funcCall
    2:     OK
      35.6 μs ± 2.4 μs,       same as baseline
    4:     OK
      50.2 μs ± 4.3 μs,       same as baseline
    8:     OK
      78.5 μs ± 3.9 μs,       same as baseline
    16:    OK
      134  μs ± 7.9 μs,       same as baseline
    32:    OK
      246  μs ±  16 μs,       same as baseline
    64:    OK
      472  μs ±  27 μs,       same as baseline
    128:   OK
      929  μs ±  55 μs,       same as baseline
    256:   OK
      1.83 ms ± 162 μs,       same as baseline
    512:   OK
      3.61 ms ± 213 μs,       same as baseline
    1024:  OK
      7.17 ms ± 479 μs,       same as baseline
    2048:  OK
      14.2 ms ± 1.1 ms,       same as baseline
    4096:  OK
      28.2 ms ± 1.9 ms,       same as baseline
    8192:  OK
      56.2 ms ± 3.7 ms,       same as baseline
    16384: OK
      114  ms ±  10 ms,       same as baseline
  contractCreation
    2:     OK
      65.3 μs ± 2.6 μs,       same as baseline
    4:     OK
      100  μs ± 5.1 μs,       same as baseline
    8:     OK
      175  μs ± 9.1 μs,       same as baseline
    16:    OK
      331  μs ±  20 μs,       same as baseline
    32:    OK
      667  μs ±  49 μs,       same as baseline
    64:    OK
      1.30 ms ±  52 μs,       same as baseline
    128:   OK
      2.68 ms ± 112 μs,       same as baseline
    256:   OK
      5.63 ms ± 346 μs,       same as baseline
    512:   OK
      14.0 ms ± 1.3 ms,       same as baseline
    1024:  OK
      32.6 ms ± 2.0 ms,       same as baseline
    2048:  OK
      72.3 ms ± 3.5 ms,       same as baseline
    4096:  OK
      158  ms ± 8.9 ms,       same as baseline
    8192:  OK
      323  ms ±  27 ms,       same as baseline
    16384: OK
      654  ms ±  34 ms,       same as baseline
  contractCreationMem
    2:     OK
      228  μs ±  16 μs,       same as baseline
    4:     OK
      396  μs ±  39 μs,       same as baseline
    8:     OK
      743  μs ±  36 μs,  5% less than baseline
    16:    OK
      1.51 ms ± 129 μs,       same as baseline
    32:    OK
      3.11 ms ± 205 μs,       same as baseline
    64:    OK
      7.05 ms ± 586 μs,       same as baseline
    128:   OK
      18.6 ms ± 1.2 ms,       same as baseline
    256:   OK
      43.5 ms ± 4.2 ms,       same as baseline
    512:   OK
      97.0 ms ± 8.5 ms,       same as baseline
    1024:  OK
      207  ms ±  20 ms,       same as baseline
    2048:  OK
      438  ms ±  42 ms,       same as baseline
    4096:  OK
      894  ms ±  32 ms,       same as baseline
    8192:  OK
      1.849 s ± 112 ms,       same as baseline
    16384: OK
      3.782 s ±  64 ms,       same as baseline
  arrayCreationMem
    2:     OK
      123  μs ±  12 μs,       same as baseline
    4:     OK
      326  μs ±  32 μs,       same as baseline
    8:     OK
      1.06 ms ±  61 μs,  6% more than baseline
    16:    OK
      3.91 ms ± 288 μs,       same as baseline
    32:    OK
      15.2 ms ± 928 μs,       same as baseline
    64:    OK
      61.2 ms ± 5.4 ms,       same as baseline
    128:   OK
      253  ms ±  21 ms, 11% more than baseline
    256:   OK
      963  ms ±  63 ms,  7% more than baseline
    512:   OK
      3.845 s ± 222 ms,  8% more than baseline
  mapStorage
    2:     OK
      52.7 μs ± 3.7 μs,  7% more than baseline
    4:     OK
      88.8 μs ± 7.9 μs,       same as baseline
    8:     OK
      154  μs ± 8.6 μs,       same as baseline
    16:    OK
      291  μs ±  19 μs,       same as baseline
    32:    OK
      566  μs ±  32 μs,       same as baseline
    64:    OK
      1.13 ms ±  84 μs,       same as baseline
    128:   OK
      2.27 ms ± 124 μs,       same as baseline
    256:   OK
      4.63 ms ± 220 μs,       same as baseline
    512:   OK
      9.51 ms ± 442 μs,       same as baseline
    1024:  OK
      19.9 ms ± 920 μs,       same as baseline
    2048:  OK
      40.5 ms ± 1.8 ms,       same as baseline
    4096:  OK
      82.5 ms ± 4.5 ms,       same as baseline
    8192:  OK
      170  ms ± 9.8 ms,       same as baseline
    16384: OK
      351  ms ±  22 ms,       same as baseline
  swapOperations
    2:     OK
      138  μs ±  13 μs,       same as baseline
    4:     OK
      162  μs ± 8.1 μs,  5% more than baseline
    8:     OK
      209  μs ±  16 μs,       same as baseline
    16:    OK
      297  μs ±  26 μs,       same as baseline
    32:    OK
      481  μs ±  27 μs,       same as baseline
    64:    OK
      847  μs ±  62 μs,       same as baseline
    128:   OK
      1.57 ms ± 125 μs,       same as baseline
    256:   OK
      3.02 ms ± 242 μs,       same as baseline
    512:   OK
      5.97 ms ± 406 μs,       same as baseline
    1024:  OK
      11.7 ms ± 1.0 ms,       same as baseline
    2048:  OK
      22.8 ms ± 854 μs,       same as baseline
    4096:  OK
      45.4 ms ± 3.4 ms,       same as baseline
    8192:  OK
      90.7 ms ± 6.8 ms,       same as baseline
    16384: OK
      180  ms ±  15 ms,       same as baseline

All 149 tests passed (650.75s)
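Since the point of posting these numbers is to track them later, it can help to reduce a report like this to the entries that actually moved. The wording "same as baseline" / "N% more than baseline" matches tasty-bench's comparison output, so the sketch below assumes bench-perf uses that library. The parser is a hypothetical helper, not part of hevm, and it relies only on the indentation of this particular report (groups at 2 spaces, benchmarks at 4, results at 6).

```haskell
module Main (main) where

import Data.Char (isSpace)
import Data.List (dropWhileEnd, isInfixOf)

-- | One result reported as different from the baseline:
-- group name, benchmark name, and the verdict text.
data Flag = Flag
  { fGroup   :: String
  , fBench   :: String
  , fVerdict :: String
  } deriving (Show, Eq)

-- | Number of leading spaces; the report nests entries by indentation.
indent :: String -> Int
indent = length . takeWhile (== ' ')

trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Text after the last comma, e.g. "17% more than baseline".
verdict :: String -> String
verdict = trim . reverse . takeWhile (/= ',') . reverse

-- | Track the current group (indent 2) and benchmark (indent 4), and keep
-- result lines (indent 6+) saying "N% more/less than baseline".
-- Lines saying "same as baseline" do not contain "than baseline".
flags :: [String] -> [Flag]
flags = go "" ""
  where
    go _ _ [] = []
    go grp bch (l : ls)
      | indent l == 2 = go (trim l) bch ls
      | indent l == 4 = go grp (takeWhile (/= ':') (trim l)) ls
      | indent l >= 6, "than baseline" `isInfixOf` l =
          Flag grp bch (verdict l) : go grp bch ls
      | otherwise = go grp bch ls

isRegression :: Flag -> Bool
isRegression = ("more than baseline" `isInfixOf`) . fVerdict

-- | Excerpt of the report above.
sample :: String
sample = unlines
  [ "All"
  , "  loop"
  , "    2:     OK"
  , "      24.7 μs ± 2.0 μs,  9% less than baseline"
  , "    4:     OK"
  , "      35.9 μs ± 2.2 μs,       same as baseline"
  , "  balanceTransfer"
  , "    2:     OK"
  , "      3.57 ms ± 330 μs, 17% more than baseline"
  ]

main :: IO ()
main = do
  let fs = flags (lines sample)
  mapM_ (\(Flag g b v) -> putStrLn (g ++ "/" ++ b ++ ": " ++ v)) fs
  putStrLn ("regressions: " ++ show (length (filter isRegression fs)))
```

On the full report above, this would list loop/2 and contractCreationMem/8 as faster, and balanceTransfer (sizes 2 to 128), arrayCreationMem (8, 128, 256, 512), mapStorage/2 and swapOperations/4 as slower. If bench-perf does use tasty-bench, its `--csv FILE` and `--baseline FILE` options (passed via `cabal bench bench-perf --benchmark-options=...`) are the usual way to save a run on main and compare a branch against it, and `--fail-if-slower` can turn regressions like these into a failing run.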
