
EVM: inline getOp for execution speedup #1043

Merged
msooseth merged 2 commits into main from perf on Mar 12, 2026



elopez (Collaborator) commented Mar 7, 2026

Description

This PR inlines getOp. I see a measurable speed improvement with this change locally. The optimization was suggested by Claude Code.

Checklist

  • tested locally
  • added automated tests
  • updated the docs
  • updated the changelog

msooseth (Collaborator) left a comment


In general, I'm OK with such optimizations as long as they benefit Echidna's speed. Symbolic execution speed is mostly governed by SMT solver speed, so as long as this works for you, I'm good with it. However, it would be nice to post benchmark measurements in the PR description, so that if we want to understand this change in the future, we can look them up.

msooseth (Collaborator) commented Mar 9, 2026

Perhaps @blishko wants to benchmark this too? Either way, it would be important to have some benchmark numbers here as part of the documentation for this change.

elopez (Collaborator, Author) commented Mar 9, 2026

Hm, interesting! I just re-benchmarked this on top of #1047 and it tells a very different story: every benchmark takes 125% to 264% more time than baseline. I suppose GC involvement makes it hard to measure accurately 🤔 Let's hold off on merging for now. cc @blishko

bench-perf
All
  loop
    4096:  OK
      80.7 ms ± 4.7 ms, 234% more than baseline
    8192:  OK
      144  ms ±  13 ms, 199% more than baseline
    16384: OK
      286  ms ±  27 ms, 208% more than baseline
  primes
    4096:  OK
      977  ms ±  75 ms, 197% more than baseline
    8192:  OK
      2.646 s ±  30 ms, 222% more than baseline
    16384: OK
      6.481 s ± 530 ms, 209% more than baseline
  hashes
    4096:  OK
      131  ms ± 8.2 ms, 217% more than baseline
    8192:  OK
      255  ms ±  17 ms, 205% more than baseline
    16384: OK
      460  ms ±  18 ms, 178% more than baseline
  hashmem
    4096:  OK
      174  ms ±  17 ms, 154% more than baseline
    8192:  OK
      391  ms ±  23 ms, 180% more than baseline
    16384: OK
      731  ms ±  35 ms, 160% more than baseline
  balanceTransfer
    4096:  OK
      109  ms ±  11 ms, 162% more than baseline
    8192:  OK
      193  ms ±  13 ms, 132% more than baseline
    16384: OK
      409  ms ±  20 ms, 145% more than baseline
  funcCall
    4096:  OK
      94.8 ms ± 9.1 ms, 203% more than baseline
    8192:  OK
      189  ms ±  12 ms, 203% more than baseline
    16384: OK
      408  ms ±  39 ms, 215% more than baseline
  contractCreation
    4096:  OK
      573  ms ±  21 ms, 249% more than baseline
    8192:  OK
      1.148 s ±  21 ms, 240% more than baseline
    16384: OK
      2.265 s ± 154 ms, 226% more than baseline
  contractCreationMem
    4096:  OK
      2.255 s ±  63 ms, 134% more than baseline
    8192:  OK
      4.527 s ± 340 ms, 137% more than baseline
    16384: OK
      8.954 s ± 696 ms, 125% more than baseline
  mapStorage
    4096:  OK
      212  ms ±  14 ms, 149% more than baseline
    8192:  OK
      427  ms ±  41 ms, 149% more than baseline
    16384: OK
      907  ms ±  78 ms, 159% more than baseline
  swapOperations
    4096:  OK
      151  ms ±  11 ms, 261% more than baseline
    8192:  OK
      270  ms ±  11 ms, 223% more than baseline
    16384: OK
      598  ms ±  47 ms, 264% more than baseline

All 30 tests passed (108.15s)
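One way to test the guess that GC involvement skews these numbers is to record RTS counters around the measured work and compare them between runs. Below is a minimal sketch using base's GHC.Stats and System.Mem; the names withGCDelta and GCDelta are hypothetical and are not part of hevm or bench-perf. It assumes the program is compiled with -rtsopts (or -with-rtsopts=-T) and run with +RTS -T; without that, it only returns the result.

```haskell
import Control.Exception (evaluate)
import Data.Int (Int64)
import Data.Word (Word32)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performMajorGC)

-- GC activity observed while one action ran (hypothetical helper type).
data GCDelta = GCDelta
  { deltaGcs         :: Word32  -- collections of any generation
  , deltaMajorGcs    :: Word32  -- major (oldest-generation) collections
  , deltaGcElapsedNs :: Int64   -- wall-clock nanoseconds spent in GC
  } deriving Show

-- Run an action, force its result to WHNF, and report GC counters.
-- The delta is Nothing unless the program runs with +RTS -T, because
-- getRTSStats throws when RTS statistics are disabled.
withGCDelta :: IO a -> IO (a, Maybe GCDelta)
withGCDelta act = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then do
      r <- act >>= evaluate
      pure (r, Nothing)
    else do
      performMajorGC            -- start from a freshly collected heap
      before <- getRTSStats
      r <- act >>= evaluate
      after <- getRTSStats
      let delta = GCDelta
            { deltaGcs         = gcs after - gcs before
            , deltaMajorGcs    = major_gcs after - major_gcs before
            , deltaGcElapsedNs = gc_elapsed_ns after - gc_elapsed_ns before
            }
      pure (r, Just delta)
```

Note that evaluate only forces the result to weak head normal form, so work hidden in lazy fields of the result would be measured later, outside the window.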

elopez marked this pull request as draft March 9, 2026 14:00
elopez added 2 commits March 9, 2026 18:32
Adding {-# INLINE getOp #-} enables GHC's case-of-case transformation,
eliminating the intermediate GenericOp constructor allocation in opcode
dispatch. This also lets GHC sink expensive thunks (contract lookups,
fee schedule fields) into only the opcode branches that use them.

Add doBenchmark = true to shellFor so that benchmark dependencies
(tasty-bench) are available in nix develop without cabal update.
Replace deprecated testTarget with testTargets.
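The first commit's reasoning can be illustrated with a cut-down model. Everything below (this GenericOp's constructors, Fees, step) is a hypothetical toy, not hevm's code. If step called an out-of-line getOp, any GenericOp value carrying fields (OpPush, OpUnknown here) would be heap-allocated and then immediately scrutinised. Once getOp is inlined, GHC sees case (guards producing constructors) of alternatives; case-of-case pushes the outer case into each guard, and the known-constructor rule then jumps straight to the matching branch, so no intermediate constructor is built. GHC may inline a function this small on its own; the pragma presumably matters for hevm's much larger getOp.

```haskell
import Data.Word (Word8)

-- Toy opcode type: a hypothetical, cut-down stand-in for hevm's GenericOp.
data GenericOp
  = OpStop
  | OpAdd
  | OpPush Int       -- number of immediate bytes that follow
  | OpUnknown Word8
  deriving (Show, Eq)

-- Decode one byte. The INLINE pragma lets GHC paste this body into every
-- call site, so a caller's `case getOp b of ...` becomes a case-of-case
-- that the simplifier can flatten into direct jumps on the byte value.
getOp :: Word8 -> GenericOp
getOp b
  | b == 0x00              = OpStop
  | b == 0x01              = OpAdd
  | b >= 0x60 && b <= 0x7f = OpPush (fromIntegral b - 0x5f)
  | otherwise              = OpUnknown b
{-# INLINE getOp #-}

-- Hypothetical fee table; only one field is modelled here.
newtype Fees = Fees { gVeryLow :: Int }

-- One step of a toy interpreter: (gas charged, new stack, pc advance).
step :: Fees -> Word8 -> [Integer] -> Maybe (Int, [Integer], Int)
step fees b stack =
  -- `cost` stands in for an expensive lazy lookup that only the ADD and
  -- PUSH branches demand; GHC may float it into just those branches.
  let cost = gVeryLow fees
  in case getOp b of
       OpStop      -> Nothing
       OpAdd       -> case stack of
         x : y : rest -> Just (cost, x + y : rest, 1)
         _            -> Nothing
       OpPush n    -> Just (cost, stack, 1 + n)  -- immediates not read in this toy
       OpUnknown _ -> Nothing
```

Inspecting the optimised Core (for example with -ddump-simpl) is the usual way to confirm that the intermediate constructor really disappears.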
elopez (Collaborator, Author) commented Mar 9, 2026

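Hm, I don't know what was going on with my system before 🤔 I rebased and re-ran the benchmark here, and it no longer gives me the horrible numbers from before. But it's still not a clear win: loop at sizes 2 and 4 and primes at sizes 32, 128 and 8192 are 5-13% faster, while balanceTransfer is 10-23% slower at sizes 2 through 128; everything else is the same as baseline.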

bench-perf 4c1013f vs 2baa264 (main)
[3 of 3] Linking dist-newstyle/build/aarch64-osx/ghc-9.8.4/hevm-0.57.0/b/bench-perf/build/bench-perf/bench-perf [Objects changed]
All
  loop
    2:     OK
      23.3 μs ± 1.8 μs, 13% less than baseline
    4:     OK
      33.1 μs ± 2.2 μs,  9% less than baseline
    8:     OK
      53.2 μs ± 3.5 μs,       same as baseline
    16:    OK
      93.4 μs ± 6.7 μs,       same as baseline
    32:    OK
      172  μs ±  16 μs,       same as baseline
    64:    OK
      332  μs ±  29 μs,       same as baseline
    128:   OK
      646  μs ±  33 μs,       same as baseline
    256:   OK
      1.29 ms ±  59 μs,       same as baseline
    512:   OK
      2.55 ms ± 215 μs,       same as baseline
    1024:  OK
      5.05 ms ± 451 μs,       same as baseline
    2048:  OK
      10.0 ms ± 994 μs,       same as baseline
    4096:  OK
      19.9 ms ± 1.9 ms,       same as baseline
    8192:  OK
      39.8 ms ± 3.8 ms,       same as baseline
    16384: OK
      79.7 ms ± 7.1 ms,       same as baseline
  primes
    2:     OK
      73.1 μs ± 4.4 μs,       same as baseline
    4:     OK
      106  μs ± 8.5 μs,       same as baseline
    8:     OK
      165  μs ± 9.6 μs,       same as baseline
    16:    OK
      314  μs ±  24 μs,       same as baseline
    32:    OK
      673  μs ±  38 μs,  7% less than baseline
    64:    OK
      1.47 ms ± 132 μs,       same as baseline
    128:   OK
      3.42 ms ± 212 μs,  6% less than baseline
    256:   OK
      8.09 ms ± 519 μs,       same as baseline
    512:   OK
      19.5 ms ± 1.7 ms,       same as baseline
    1024:  OK
      47.0 ms ± 4.2 ms,       same as baseline
    2048:  OK
      115  ms ± 8.8 ms,       same as baseline
    4096:  OK
      287  ms ±  18 ms,       same as baseline
    8192:  OK
      715  ms ±  14 ms,  5% less than baseline
    16384: OK
      1.820 s ± 125 ms,       same as baseline
  hashes
    2:     OK
      30.4 μs ± 2.6 μs,       same as baseline
    4:     OK
      48.4 μs ± 2.2 μs,       same as baseline
    8:     OK
      84.4 μs ± 5.4 μs,       same as baseline
    16:    OK
      156  μs ±  10 μs,       same as baseline
    32:    OK
      307  μs ±  30 μs,       same as baseline
    64:    OK
      600  μs ±  26 μs,       same as baseline
    128:   OK
      1.16 ms ±  98 μs,       same as baseline
    256:   OK
      2.32 ms ± 139 μs,       same as baseline
    512:   OK
      4.58 ms ± 240 μs,       same as baseline
    1024:  OK
      9.27 ms ± 505 μs,       same as baseline
    2048:  OK
      18.9 ms ± 1.4 ms,       same as baseline
    4096:  OK
      38.4 ms ± 2.5 ms,       same as baseline
    8192:  OK
      75.7 ms ± 5.3 ms,       same as baseline
    16384: OK
      152  ms ± 6.7 ms,       same as baseline
  hashmem
    2:     OK
      47.9 μs ± 2.4 μs,       same as baseline
    4:     OK
      74.5 μs ± 4.1 μs,       same as baseline
    8:     OK
      127  μs ±  12 μs,       same as baseline
    16:    OK
      234  μs ±  22 μs,       same as baseline
    32:    OK
      448  μs ±  43 μs,       same as baseline
    64:    OK
      877  μs ±  60 μs,       same as baseline
    128:   OK
      1.73 ms ± 153 μs,       same as baseline
    256:   OK
      3.51 ms ± 297 μs,       same as baseline
    512:   OK
      7.09 ms ± 613 μs,       same as baseline
    1024:  OK
      15.0 ms ± 1.4 ms,       same as baseline
    2048:  OK
      30.7 ms ± 861 μs,       same as baseline
    4096:  OK
      61.5 ms ± 5.9 ms,       same as baseline
    8192:  OK
      124  ms ± 5.8 ms,       same as baseline
    16384: OK
      254  ms ±  14 ms,       same as baseline
  balanceTransfer
    2:     OK
      3.75 ms ± 208 μs, 23% more than baseline
    4:     OK
      3.72 ms ± 304 μs, 22% more than baseline
    8:     OK
      3.46 ms ± 281 μs, 12% more than baseline
    16:    OK
      3.53 ms ± 274 μs, 13% more than baseline
    32:    OK
      3.71 ms ± 153 μs, 13% more than baseline
    64:    OK
      3.85 ms ± 322 μs, 10% more than baseline
    128:   OK
      4.29 ms ± 324 μs, 10% more than baseline
    256:   OK
      5.02 ms ± 169 μs,       same as baseline
    512:   OK
      6.97 ms ± 541 μs,       same as baseline
    1024:  OK
      10.0 ms ± 931 μs,       same as baseline
    2048:  OK
      18.5 ms ± 1.8 ms,       same as baseline
    4096:  OK
      36.8 ms ± 3.0 ms,       same as baseline
    8192:  OK
      73.3 ms ± 3.5 ms,       same as baseline
    16384: OK
      149  ms ±  14 ms,       same as baseline
  funcCall
    2:     OK
      36.8 μs ± 1.7 μs,       same as baseline
    4:     OK
      50.7 μs ± 3.5 μs,       same as baseline
    8:     OK
      77.7 μs ± 3.5 μs,       same as baseline
    16:    OK
      132  μs ± 8.8 μs,       same as baseline
    32:    OK
      241  μs ±  15 μs,       same as baseline
    64:    OK
      461  μs ±  31 μs,       same as baseline
    128:   OK
      897  μs ±  69 μs,       same as baseline
    256:   OK
      1.81 ms ± 123 μs,       same as baseline
    512:   OK
      3.52 ms ± 282 μs,       same as baseline
    1024:  OK
      6.99 ms ± 447 μs,       same as baseline
    2048:  OK
      14.0 ms ± 1.2 ms,       same as baseline
    4096:  OK
      27.4 ms ± 2.1 ms,       same as baseline
    8192:  OK
      54.7 ms ± 4.3 ms,       same as baseline
    16384: OK
      109  ms ± 8.5 ms,       same as baseline
  contractCreation
    2:     OK
      62.3 μs ± 3.4 μs,       same as baseline
    4:     OK
      101  μs ± 3.5 μs,       same as baseline
    8:     OK
      177  μs ± 9.8 μs,       same as baseline
    16:    OK
      333  μs ±  18 μs,       same as baseline
    32:    OK
      653  μs ±  40 μs,       same as baseline
    64:    OK
      1.27 ms ±  92 μs,       same as baseline
    128:   OK
      2.69 ms ± 218 μs,       same as baseline
    256:   OK
      5.87 ms ± 471 μs,       same as baseline
    512:   OK
      14.0 ms ± 1.3 ms,       same as baseline
    1024:  OK
      32.7 ms ± 3.2 ms,       same as baseline
    2048:  OK
      70.2 ms ± 4.9 ms,       same as baseline
    4096:  OK
      153  ms ±  11 ms,       same as baseline
    8192:  OK
      322  ms ±  26 ms,       same as baseline
    16384: OK
      675  ms ±  53 ms,       same as baseline
  contractCreationMem
    2:     OK
      237  μs ±  20 μs,       same as baseline
    4:     OK
      409  μs ±  22 μs,       same as baseline
    8:     OK
      777  μs ±  70 μs,       same as baseline
    16:    OK
      1.53 ms ±  75 μs,       same as baseline
    32:    OK
      3.20 ms ± 122 μs,       same as baseline
    64:    OK
      7.31 ms ± 456 μs,       same as baseline
    128:   OK
      19.3 ms ± 866 μs,       same as baseline
    256:   OK
      44.1 ms ± 2.8 ms,       same as baseline
    512:   OK
      97.2 ms ± 4.7 ms,       same as baseline
    1024:  OK
      215  ms ±  17 ms,       same as baseline
    2048:  OK
      440  ms ±  37 ms,       same as baseline
    4096:  OK
      922  ms ±  25 ms,       same as baseline
    8192:  OK
      1.846 s ±  62 ms,       same as baseline
    16384: OK
      3.669 s ± 187 ms,       same as baseline
  arrayCreationMem
    2:     OK
      108  μs ± 6.8 μs,       same as baseline
    4:     OK
      282  μs ±  14 μs,       same as baseline
    8:     OK
      1.04 ms ±  48 μs,       same as baseline
    16:    OK
      3.63 ms ± 313 μs,       same as baseline
    32:    OK
      13.7 ms ± 1.1 ms,       same as baseline
    64:    OK
      54.3 ms ± 4.5 ms,       same as baseline
    128:   OK
      216  ms ±  16 ms,       same as baseline
    256:   OK
      855  ms ±  61 ms,       same as baseline
    512:   OK
      3.490 s ± 101 ms,       same as baseline
  mapStorage
    2:     OK
      51.3 μs ± 2.3 μs,       same as baseline
    4:     OK
      78.7 μs ± 4.2 μs,       same as baseline
    8:     OK
      143  μs ± 9.1 μs,       same as baseline
    16:    OK
      271  μs ±  23 μs,       same as baseline
    32:    OK
      528  μs ±  38 μs,       same as baseline
    64:    OK
      1.05 ms ±  77 μs,       same as baseline
    128:   OK
      2.10 ms ± 167 μs,       same as baseline
    256:   OK
      4.24 ms ± 381 μs,       same as baseline
    512:   OK
      8.76 ms ± 625 μs,       same as baseline
    1024:  OK
      18.2 ms ± 1.6 ms,       same as baseline
    2048:  OK
      37.5 ms ± 2.8 ms,       same as baseline
    4096:  OK
      76.1 ms ± 4.2 ms,       same as baseline
    8192:  OK
      156  ms ± 9.6 ms,       same as baseline
    16384: OK
      314  ms ±  18 ms,       same as baseline
  swapOperations
    2:     OK
      128  μs ± 7.4 μs,       same as baseline
    4:     OK
      150  μs ± 8.0 μs,       same as baseline
    8:     OK
      192  μs ±  16 μs,       same as baseline
    16:    OK
      274  μs ±  15 μs,       same as baseline
    32:    OK
      443  μs ±  37 μs,       same as baseline
    64:    OK
      770  μs ±  52 μs,       same as baseline
    128:   OK
      1.42 ms ± 137 μs,       same as baseline
    256:   OK
      2.74 ms ± 240 μs,       same as baseline
    512:   OK
      5.36 ms ± 491 μs,       same as baseline
    1024:  OK
      10.6 ms ± 983 μs,       same as baseline
    2048:  OK
      20.8 ms ± 2.0 ms,       same as baseline
    4096:  OK
      41.2 ms ± 3.7 ms,       same as baseline
    8192:  OK
      82.1 ms ± 7.2 ms,       same as baseline
    16384: OK
      163  ms ±  14 ms,       same as baseline

All 149 tests passed (605.59s)

blishko (Collaborator) left a comment


On my computer I see a very slight improvement on bench-perf.
I think it is OK to include this change.

blishko (Collaborator) commented Mar 10, 2026

@elopez @msooseth, how do you feel about this now? Benchmarking on my computer showed a very small improvement. I am OK with merging it.

elopez marked this pull request as ready for review March 10, 2026 17:56
elopez (Collaborator, Author) commented Mar 10, 2026

@blishko I'm OK if you want to merge it (or pick just 4c1013f). What system did you test on? (Linux or Mac? x86 or ARM?)

blishko (Collaborator) commented Mar 10, 2026

I have an ARM Mac.
I would merge both commits.

blishko (Collaborator) commented Mar 12, 2026

For completeness, I am adding numbers from my machine:

arm-mac
All
  loop
    2:     OK
      27.8 μs ± 2.4 μs,       same as baseline
    4:     OK
      39.5 μs ± 3.4 μs,       same as baseline
    8:     OK
      63.7 μs ± 4.2 μs,       same as baseline
    16:    OK
      110  μs ± 7.3 μs,       same as baseline
    32:    OK
      204  μs ±  14 μs,       same as baseline
    64:    OK
      389  μs ±  32 μs,       same as baseline
    128:   OK
      764  μs ±  58 μs,       same as baseline
    256:   OK
      1.52 ms ± 109 μs,       same as baseline
    512:   OK
      3.02 ms ± 256 μs,       same as baseline
    1024:  OK
      6.02 ms ± 467 μs,       same as baseline
    2048:  OK
      11.9 ms ± 1.1 ms,       same as baseline
    4096:  OK
      23.6 ms ± 1.7 ms,       same as baseline
    8192:  OK
      47.0 ms ± 3.9 ms,       same as baseline
    16384: OK
      93.7 ms ± 7.5 ms,       same as baseline
  primes
    2:     OK
      85.3 μs ± 8.2 μs,       same as baseline
    4:     OK
      122  μs ± 6.6 μs,       same as baseline
    8:     OK
      191  μs ±  16 μs,       same as baseline
    16:    OK
      360  μs ±  29 μs,       same as baseline
    32:    OK
      765  μs ±  69 μs,       same as baseline
    64:    OK
      1.68 ms ± 107 μs,       same as baseline
    128:   OK
      3.84 ms ± 256 μs,       same as baseline
    256:   OK
      9.21 ms ± 519 μs,       same as baseline
    512:   OK
      22.1 ms ± 2.0 ms,       same as baseline
    1024:  OK
      54.1 ms ± 3.9 ms,       same as baseline
    2048:  OK
      131  ms ± 8.2 ms,       same as baseline
    4096:  OK
      328  ms ±  21 ms,       same as baseline
    8192:  OK
      834  ms ±  72 ms,       same as baseline
    16384: OK
      2.087 s ±  72 ms,       same as baseline
  hashes
    2:     OK
      35.0 μs ± 3.0 μs,       same as baseline
    4:     OK
      55.5 μs ± 3.6 μs,       same as baseline
    8:     OK
      96.0 μs ± 7.8 μs,       same as baseline
    16:    OK
      176  μs ± 6.9 μs,       same as baseline
    32:    OK
      337  μs ±  18 μs,       same as baseline
    64:    OK
      665  μs ±  57 μs,       same as baseline
    128:   OK
      1.30 ms ±  55 μs,       same as baseline
    256:   OK
      2.59 ms ± 116 μs,       same as baseline
    512:   OK
      5.23 ms ± 489 μs,       same as baseline
    1024:  OK
      10.6 ms ± 918 μs,       same as baseline
    2048:  OK
      21.7 ms ± 2.0 ms,       same as baseline
    4096:  OK
      42.6 ms ± 2.2 ms,       same as baseline
    8192:  OK
      85.2 ms ± 5.8 ms,       same as baseline
    16384: OK
      170  ms ±  16 ms,       same as baseline
  hashmem
    2:     OK
      55.2 μs ± 4.4 μs,       same as baseline
    4:     OK
      85.8 μs ± 7.0 μs,       same as baseline
    8:     OK
      143  μs ± 9.1 μs,       same as baseline
    16:    OK
      260  μs ±  16 μs,       same as baseline
    32:    OK
      492  μs ±  27 μs,       same as baseline
    64:    OK
      967  μs ±  84 μs,       same as baseline
    128:   OK
      1.93 ms ± 146 μs,       same as baseline
    256:   OK
      3.85 ms ± 288 μs,       same as baseline
    512:   OK
      7.80 ms ± 715 μs,       same as baseline
    1024:  OK
      16.3 ms ± 1.1 ms,       same as baseline
    2048:  OK
      33.9 ms ± 2.6 ms,       same as baseline
    4096:  OK
      67.7 ms ± 4.5 ms,       same as baseline
    8192:  OK
      139  ms ±  10 ms,       same as baseline
    16384: OK
      283  ms ±  19 ms,       same as baseline
  balanceTransfer
    2:     OK
      2.68 ms ± 243 μs,       same as baseline
    4:     OK
      2.70 ms ± 217 μs,       same as baseline
    8:     OK
      2.73 ms ± 222 μs,       same as baseline
    16:    OK
      2.79 ms ± 276 μs,       same as baseline
    32:    OK
      2.89 ms ± 264 μs,       same as baseline
    64:    OK
      3.11 ms ± 217 μs,       same as baseline
    128:   OK
      3.54 ms ± 215 μs,       same as baseline
    256:   OK
      4.42 ms ± 296 μs,       same as baseline
    512:   OK
      6.08 ms ± 435 μs,       same as baseline
    1024:  OK
      10.3 ms ± 879 μs,       same as baseline
    2048:  OK
      20.0 ms ± 1.9 ms,       same as baseline
    4096:  OK
      40.6 ms ± 3.7 ms,       same as baseline
    8192:  OK
      81.9 ms ± 8.1 ms,       same as baseline
    16384: OK
      170  ms ±  16 ms,       same as baseline
  funcCall
    2:     OK
      44.0 μs ± 4.0 μs,       same as baseline
    4:     OK
      60.4 μs ± 3.7 μs,       same as baseline
    8:     OK
      91.9 μs ± 8.1 μs,       same as baseline
    16:    OK
      154  μs ± 7.3 μs,       same as baseline
    32:    OK
      280  μs ±  15 μs,       same as baseline
    64:    OK
      531  μs ±  31 μs,       same as baseline
    128:   OK
      1.04 ms ±  59 μs,       same as baseline
    256:   OK
      2.07 ms ± 125 μs,       same as baseline
    512:   OK
      4.09 ms ± 273 μs,       same as baseline
    1024:  OK
      8.14 ms ± 458 μs,       same as baseline
    2048:  OK
      16.2 ms ± 1.0 ms,       same as baseline
    4096:  OK
      31.9 ms ± 2.6 ms,       same as baseline
    8192:  OK
      63.9 ms ± 4.5 ms,       same as baseline
    16384: OK
      128  ms ± 8.2 ms,       same as baseline
  contractCreation
    2:     OK
      71.2 μs ± 3.4 μs,       same as baseline
    4:     OK
      114  μs ± 9.3 μs,       same as baseline
    8:     OK
      197  μs ±  17 μs,       same as baseline
    16:    OK
      365  μs ±  16 μs,       same as baseline
    32:    OK
      720  μs ±  35 μs,       same as baseline
    64:    OK
      1.40 ms ±  59 μs,       same as baseline
    128:   OK
      2.95 ms ± 288 μs,       same as baseline
    256:   OK
      6.14 ms ± 344 μs,       same as baseline
    512:   OK
      14.9 ms ± 498 μs,       same as baseline
    1024:  OK
      34.8 ms ± 2.5 ms,       same as baseline
    2048:  OK
      75.8 ms ± 3.5 ms,       same as baseline
    4096:  OK
      165  ms ±  14 ms,       same as baseline
    8192:  OK
      336  ms ±  30 ms,       same as baseline
    16384: OK
      717  ms ±  58 ms,       same as baseline
  contractCreationMem
    2:     OK
      268  μs ±  16 μs,       same as baseline
    4:     OK
      468  μs ±  32 μs,       same as baseline
    8:     OK
      865  μs ±  61 μs,       same as baseline
    16:    OK
      1.75 ms ± 112 μs,       same as baseline
    32:    OK
      3.58 ms ± 341 μs,       same as baseline
    64:    OK
      8.28 ms ± 544 μs,       same as baseline
    128:   OK
      21.5 ms ± 1.5 ms,       same as baseline
    256:   OK
      48.1 ms ± 2.4 ms,       same as baseline
    512:   OK
      107  ms ± 7.0 ms,       same as baseline
    1024:  OK
      229  ms ± 7.6 ms,       same as baseline
    2048:  OK
      472  ms ±  32 ms,       same as baseline
    4096:  OK
      990  ms ±  32 ms,       same as baseline
    8192:  OK
      1.986 s ± 109 ms,       same as baseline
    16384: OK
      3.939 s ± 184 ms,       same as baseline
  arrayCreationMem
    2:     OK
      123  μs ± 7.3 μs,       same as baseline
    4:     OK
      318  μs ±  22 μs,       same as baseline
    8:     OK
      1.06 ms ±  67 μs,       same as baseline
    16:    OK
      3.89 ms ± 256 μs,       same as baseline
    32:    OK
      15.1 ms ± 928 μs,       same as baseline
    64:    OK
      60.4 ms ± 3.8 ms,       same as baseline
    128:   OK
      237  ms ±  14 ms,       same as baseline
    256:   OK
      950  ms ±  51 ms,  5% less than baseline
    512:   OK
      3.759 s ±  53 ms,  4% less than baseline
  mapStorage
    2:     OK
      54.8 μs ± 3.3 μs,       same as baseline
    4:     OK
      89.7 μs ± 7.8 μs,       same as baseline
    8:     OK
      160  μs ±  11 μs,       same as baseline
    16:    OK
      300  μs ±  21 μs,       same as baseline
    32:    OK
      591  μs ±  29 μs,       same as baseline
    64:    OK
      1.16 ms ±  85 μs,       same as baseline
    128:   OK
      2.39 ms ± 217 μs,       same as baseline
    256:   OK
      4.81 ms ± 450 μs,       same as baseline
    512:   OK
      9.73 ms ± 546 μs,       same as baseline
    1024:  OK
      20.6 ms ± 963 μs,       same as baseline
    2048:  OK
      41.8 ms ± 2.1 ms,       same as baseline
    4096:  OK
      86.4 ms ± 7.7 ms,       same as baseline
    8192:  OK
      172  ms ± 9.5 ms,       same as baseline
    16384: OK
      352  ms ±  14 ms,       same as baseline
  swapOperations
    2:     OK
      149  μs ± 6.6 μs,       same as baseline
    4:     OK
      174  μs ±  13 μs,       same as baseline
    8:     OK
      224  μs ±  17 μs,       same as baseline
    16:    OK
      317  μs ±  13 μs,       same as baseline
    32:    OK
      511  μs ±  32 μs,       same as baseline
    64:    OK
      893  μs ±  58 μs,       same as baseline
    128:   OK
      1.66 ms ± 140 μs,       same as baseline
    256:   OK
      3.20 ms ± 216 μs,       same as baseline
    512:   OK
      6.26 ms ± 447 μs,       same as baseline
    1024:  OK
      12.2 ms ± 1.0 ms,       same as baseline
    2048:  OK
      24.2 ms ± 1.9 ms,       same as baseline
    4096:  OK
      47.9 ms ± 3.8 ms,       same as baseline
    8192:  OK
      94.8 ms ± 8.6 ms,       same as baseline
    16384: OK
      190  ms ±  17 ms,       same as baseline

I also eyeballed the numbers from main, and on average the numbers from this branch tended to be slightly better, even though the framework reports them as "same as baseline".
So I believe this change should not have a negative effect and we can merge it.
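An "on average slightly better" impression like this could be condensed into one number: the geometric mean of per-benchmark time ratios (branch over main), which treats a 2x slowdown and a 2x speedup symmetrically. Below is a minimal sketch; geoMeanRatio, fasterCount and Timing are hypothetical helpers, not part of bench-perf, and the inputs would have to be the mean times from both a main run and a branch run (the output pasted here only shows the branch side).

```haskell
-- Each pair is (mean time on main, mean time on the branch) for one
-- benchmark, in the same unit. Hypothetical helper type.
type Timing = (Double, Double)

-- Geometric mean of branch/main time ratios: below 1.0 means the branch
-- is faster overall. Nothing for empty input or non-positive timings.
geoMeanRatio :: [Timing] -> Maybe Double
geoMeanRatio pairs
  | null pairs                                      = Nothing
  | any (\(old, new) -> old <= 0 || new <= 0) pairs = Nothing
  | otherwise = Just (exp (sum logRatios / fromIntegral (length logRatios)))
  where
    logRatios = [log (new / old) | (old, new) <- pairs]

-- (benchmarks where the branch was faster, total benchmarks)
fasterCount :: [Timing] -> (Int, Int)
fasterCount pairs = (length (filter (\(old, new) -> new < old) pairs), length pairs)
```

Averaging in log space is what makes the summary symmetric; an arithmetic mean of percentages would let a few large slowdowns dominate.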

blishko (Collaborator) commented Mar 12, 2026

@msooseth, are you OK with merging? If so, can you please dismiss the "Requested changes" review?

msooseth (Collaborator) left a comment


LGTM

msooseth merged commit 8da7ea4 into main Mar 12, 2026
9 of 10 checks passed
msooseth deleted the perf branch March 12, 2026 16:51

Labels

None yet

Projects

None yet


3 participants