U256 limb-native arithmetic: beat Rust on UniswapV2 (87ns -> 20ns)#30
📝 Walkthrough

Adds a U256 benchmarking suite and comparison tooling across Zig and Rust; introduces a new Rust bench, a Zig u256 benchmark executable, comparison scripts, and performance-focused changes plus visibility/inline adjustments in the Zig u256 implementation.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Orchestrator as compare_u256.sh
    participant ZigBench as eth.zig (u256_bench)
    participant RustBench as alloy.rs (u256_comparison)
    participant Parser as Python_Parser
    participant Output as Results_Table
    User->>Orchestrator: run compare_u256.sh
    Orchestrator->>ZigBench: build & run zig bench-u256
    ZigBench->>ZigBench: execute u256 benchmarks
    ZigBench-->>Orchestrator: bench output (text/JSON)
    Orchestrator->>RustBench: cargo bench --bench u256_comparison
    RustBench->>RustBench: execute criterion benches
    RustBench-->>Orchestrator: criterion output
    Orchestrator->>Parser: feed both outputs
    Parser->>Parser: extract ns/op, normalize names/units
    Parser-->>Orchestrator: structured timings
    Orchestrator->>Output: render comparison table & summary
    Output-->>User: display results
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bench/alloy-bench/benches/u256_comparison.rs`:
- Around line 78-84: The div benchmarks currently call U256::checked_div (in the
bench function names "div_small" and "div_full"), which adds Option-handling
overhead; replace those calls with the raw division operator (use
black_box(large) / black_box(ONE_ETH) and black_box(large) / black_box(FULL_C)
respectively) since ONE_ETH and FULL_C are non-zero constants, and update both
the div_small and div_full closures to perform direct division with / instead of
checked_div to make the Rust benchmarks comparable to Zig's fastDiv.
In `@bench/compare_u256.sh`:
- Around line 20-34: The script currently captures full benchmark logs into
shell variables ZIG_OUTPUT and RUST_OUTPUT and passes them as argv to the
embedded Python (python3 - "$ZIG_OUTPUT" "$RUST_OUTPUT"), which can hit
argument-length limits; instead write both outputs to temporary files (e.g.,
create secure temp files), pass the temp file paths or read them inside the
Python here-doc, and clean up the temps afterward; update references around
ZIG_OUTPUT, RUST_OUTPUT and the python3 invocation so the Python script reads
from those temp files (or from stdin via a single here-doc) rather than
receiving huge logs via argv, and apply the same change for the second
occurrence at lines 39-40.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- bench/alloy-bench/Cargo.toml
- bench/alloy-bench/benches/eth_comparison.rs
- bench/alloy-bench/benches/u256_comparison.rs
- bench/compare_u256.sh
- bench/u256_bench.zig
- build.zig
Three key optimizations to close the u256 compound arithmetic gap:

1. Scalar multiply (`mulLimbScalar`): 4 mul/umulh pairs instead of 10 for single-limb multipliers like fee constants (997, 1000)
2. Specialized 128-by-128 division (`div2by2`): inline register-only division for 2-limb operands, avoiding Knuth D's array overhead and `__udivti3` software division (~50ns penalty)
3. Limb-native benchmark: rewrite `uniswapv2_naive` to use `[4]u64` limb operations directly (apples-to-apples with Rust's `[u64; 4]` U256)

Also: fix the Rust benchmark to use the `/` operator instead of `checked_div`, gitignore build artifacts (*.a, *.o, *.s), make limb functions `pub inline`, and use `@bitCast` for u256 <-> limbs conversion.

Score: eth-zig 3/8 | alloy.rs 3/8 | tied 2/8 (was 2/8 | 4/8 | 2/8)
🧹 Nitpick comments (5)
bench/u256_bench.zig (2)
47-52: Add an overflow guard while doubling `batch`.

`batch *= 2` has no cap. A simple max check avoids potential wraparound/infinite-loop behavior in calibration edge cases.

Small robustness patch:

```diff
 var batch: u64 = 64;
 while (true) {
     timer.reset();
     for (0..batch) |_| func();
     if (timer.read() >= 100_000_000) break; // 100ms
+    if (batch > std.math.maxInt(u64) / 2) break;
     batch *= 2;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench/u256_bench.zig` around lines 47 - 52, In the calibration loop that calls timer.reset(); for (0..batch) |_| func(); if (timer.read() >= 100_000_000) break; batch *= 2; add an overflow guard so doubling batch cannot wrap: before multiplying check batch against a safe maximum (e.g., compare to (usize.max / 2) or use a constant MAX_BATCH) and either clamp batch to that max or break out of the loop if it would overflow; update the code around the while (true) loop so batch is only doubled when safe, preventing wraparound/infinite-loop behavior.
196-217: Run each benchmark once and reuse the result for both outputs.

Right now every case is measured twice (table + JSON). That can introduce drift between reported values and unnecessarily extends run time. Capture one `BenchResult` per case and print both formats from it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench/u256_bench.zig` around lines 196 - 217, Currently each benchmark (benchAdd, benchMulSmall, etc.) is executed twice via runAndPrint and runAndJson; change the flow so each benchmark is executed once to produce a single BenchResult and then print both formats from that result. Concretely: add or use a function that runs a benchmark and returns a BenchResult (e.g., runBenchmark or make runAndPrint return BenchResult), call it once for each symbol (benchAdd, benchMulSmall, benchMulFull, benchDivSmall, benchDivFull, benchUniswapV2Naive, benchUniswapV2Optimized, benchMulDiv, benchUniswapV4Swap) storing the result, then pass that BenchResult into printing helpers (create or adapt runAndPrint/runAndJson to accept a BenchResult and stdout) so stdout.print is unchanged but each case is measured only once.

.gitignore (1)
8-11: Avoid globally ignoring assembly sources.

Line 11 (`*.s`) can accidentally hide new checked-in assembly source files. If the goal is only generated artifacts, keep this scoped to build output paths (or drop `*.s`).

Proposed tweak:

```diff
 # Build artifacts
 *.a
 *.o
-*.s
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.gitignore around lines 8 - 11, The .gitignore entry "*.s" is too broad and can hide legitimate checked-in assembly sources; remove or narrow it: either delete the "*.s" pattern or scope it to generated build output paths (e.g., replace "*.s" with something like "build/*.s" or "out/**/*.s" or add the pattern under a build/ or obj/ directory), ensuring only generated assembly artifacts are ignored while leaving source .s files trackable; update the .gitignore accordingly where the "*.s" pattern appears.

src/uint256.zig (2)
99-105: Document limb ordering for public conversion helpers.

Now that `u256ToLimbs` and `limbsToU256` are public, please document the exact limb order (least-significant to most-significant) to avoid cross-module misuse.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/uint256.zig` around lines 99 - 105, The public conversion helpers u256ToLimbs and limbsToU256 lack documentation about limb ordering; update both function doc comments to explicitly state that the returned/accepted [4]u64 array is ordered from least-significant limb at index 0 to most-significant limb at index 3 (i.e., little-endian limb order), and mention that callers must use this ordering when converting between u256 and limbs to avoid cross-module misuse; add the same clarification to any related public API docs or comments referencing these helpers.
387-390: Add a direct test for the new 2-limb fast path.

Line 387 introduces a dedicated `dd == 2 and nn <= 2` branch, but there isn't a focused test that guarantees this path remains correct across refactors.

Suggested test addition:

```diff
+test "fastDiv 2-limb numerator 2-limb divisor fast path" {
+    const n: u256 = (@as(u256, 1) << 127) + 123456789;
+    const d: u256 = (@as(u256, 1) << 96) + 7;
+    try std.testing.expectEqual(n / d, fastDiv(n, d));
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/uint256.zig` around lines 387 - 390, Add a unit test that directly exercises the new 2-limb fast path by invoking the division routine with dd == 2 and nn <= 2 so the branch with div2by2 is taken; create tests that call the public division/quotient function (or the tested function that reads numerator/denominator limbs) using two-limb divisors and one- and two-limb numerators, covering typical, edge and random cases (e.g., small values, max limb values, and boundary carries) and assert the quotient/remainder match a reference implementation (big-int or existing slow-path division). Ensure the test targets the symbols involved (div2by2 and the surrounding division function) so future refactors will fail if the fast path is incorrect.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In @.gitignore:
- Around line 8-11: The .gitignore entry "*.s" is too broad and can hide
legitimate checked-in assembly sources; remove or narrow it: either delete the
"*.s" pattern or scope it to generated build output paths (e.g., replace "*.s"
with something like "build/*.s" or "out/**/*.s" or add the pattern under a
build/ or obj/ directory), ensuring only generated assembly artifacts are
ignored while leaving source .s files trackable; update the .gitignore
accordingly where the "*.s" pattern appears.
In `@bench/u256_bench.zig`:
- Around line 47-52: In the calibration loop that calls timer.reset(); for
(0..batch) |_| func(); if (timer.read() >= 100_000_000) break; batch *= 2; add
an overflow guard so doubling batch cannot wrap: before multiplying check batch
against a safe maximum (e.g., compare to (usize.max / 2) or use a constant
MAX_BATCH) and either clamp batch to that max or break out of the loop if it
would overflow; update the code around the while (true) loop so batch is only
doubled when safe, preventing wraparound/infinite-loop behavior.
- Around line 196-217: Currently each benchmark (benchAdd, benchMulSmall, etc.)
is executed twice via runAndPrint and runAndJson; change the flow so each
benchmark is executed once to produce a single BenchResult and then print both
formats from that result. Concretely: add or use a function that runs a
benchmark and returns a BenchResult (e.g., runBenchmark or make runAndPrint
return BenchResult), call it once for each symbol (benchAdd, benchMulSmall,
benchMulFull, benchDivSmall, benchDivFull, benchUniswapV2Naive,
benchUniswapV2Optimized, benchMulDiv, benchUniswapV4Swap) storing the result,
then pass that BenchResult into printing helpers (create or adapt
runAndPrint/runAndJson to accept a BenchResult and stdout) so stdout.print is
unchanged but each case is measured only once.
In `@src/uint256.zig`:
- Around line 99-105: The public conversion helpers u256ToLimbs and limbsToU256
lack documentation about limb ordering; update both function doc comments to
explicitly state that the returned/accepted [4]u64 array is ordered from
least-significant limb at index 0 to most-significant limb at index 3 (i.e.,
little-endian limb order), and mention that callers must use this ordering when
converting between u256 and limbs to avoid cross-module misuse; add the same
clarification to any related public API docs or comments referencing these
helpers.
- Around line 387-390: Add a unit test that directly exercises the new 2-limb
fast path by invoking the division routine with dd == 2 and nn <= 2 so the
branch with div2by2 is taken; create tests that call the public
division/quotient function (or the tested function that reads
numerator/denominator limbs) using two-limb divisors and one- and two-limb
numerators, covering typical, edge and random cases (e.g., small values, max
limb values, and boundary carries) and assert the quotient/remainder match a
reference implementation (big-int or existing slow-path division). Ensure the
test targets the symbols involved (div2by2 and the surrounding division
function) so future refactors will fail if the fast path is incorrect.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- .gitignore
- bench/alloy-bench/benches/u256_comparison.rs
- bench/u256_bench.zig
- src/uint256.zig
🚧 Files skipped from review as they are similar to previous changes (1)
- bench/alloy-bench/benches/u256_comparison.rs
Replace zbench with a custom benchmark harness (0.5s warmup, calibrated batch sizes, 2s measurement window). zbench's ~42ns macOS timer floor was masking real performance -- u256_add measured 25ns under zbench; the actual figure is 2ns. With accurate measurement, eth.zig wins 20/26 benchmarks vs alloy.rs (was 14/26 under zbench).

Key corrections:
- u256_add: 25ns -> 2ns (ties alloy)
- u256_mul: 26ns -> 2ns (2.50x faster)
- u256_div: 28ns -> 3ns (4.00x faster)
- abi_decode_dynamic: 40ns -> 17ns (7.94x faster)
- hex_encode: 36ns -> 11ns (1.09x faster, was a loss)
- uniswapv2: 30ns -> 10ns (1.30x faster, was a loss)

Also: remove zbench dependency from build.zig.zon, port keccak_compare.zig to the custom harness, use FixedBufferAllocator for allocating benchmarks, update README and RESULTS.md.
Actionable comments posted: 1
🧹 Nitpick comments (3)
bench/bench.zig (2)
48-99: Extract shared bench harness to avoid cross-file drift.
`runBench`/`runAndPrint` logic is now repeated across benchmark entrypoints. Centralizing these helpers into one module will reduce maintenance risk when tuning warmup/calibration behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench/bench.zig` around lines 48 - 99, The bench harness (runBench and runAndPrint) is duplicated across benchmarks; extract these functions into a single shared module (e.g., bench/harness.zig) and update each benchmark entrypoint to import that module and call the shared runBench/runAndPrint. Preserve the functions' signatures (comptime func: fn() void and runAndPrint(name: []const u8, func: fn() void, stdout: anytype) !void), retain Timer usage and constants (WARMUP_NS, BENCH_NS) in the shared module, and ensure benchmarks replace their local implementations with calls to the imported functions.
175-449: Factor out repeated `FixedBufferAllocator` setup in benchmark functions.

The same local allocator scaffolding is copied across many benches. A tiny helper wrapper would reduce copy/paste surface and keep per-bench logic focused on the operation under test.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench/bench.zig` around lines 175 - 449, Many benchmarks duplicate the FixedBufferAllocator setup (init of std.heap.FixedBufferAllocator, getting alloc) — create a small helper (e.g., withFixedBufferAllocator or makeFixedAllocator) that encapsulates the buffer allocation and returns the allocator (or takes a callback that receives alloc) and replace the repeated blocks in functions like benchAbiEncodeTransfer, benchAbiEncodeStatic, benchAbiEncodeDynamic, benchRlpEncodeTx, benchTxHashEip1559, benchEip712Hash, etc.; update those functions to call the helper for an allocator instead of repeating the buf/fba/alloc boilerplate so each bench only contains the test-specific logic.

bench/RESULTS.md (1)
7-7: Include exact toolchain/platform versions for reproducibility.

Adding CPU model/binning, macOS version, Zig version, and Rust toolchain version will make these benchmark deltas easier to validate over time.

🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench/RESULTS.md` at line 7, Update the "Benchmarks run on Apple Silicon..." line in RESULTS.md to include explicit toolchain and platform versions for reproducibility: append CPU model/binning (e.g., M1/M2 with specific SKU), macOS version, Zig version, Rust toolchain (cargo/rustc) and target triple, and any compiler flags used (ReleaseFast/--release), so the existing Benchmarks run on Apple Silicon... sentence contains exact CPU, macOS, Zig and Rust versions and flags.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bench/RESULTS.md`:
- Line 57: Update the table row containing "Custom criterion-style harness" to
avoid overstating precision: replace "Accurate sub-ns measurement; zbench had
~25ns floor on macOS" with wording like "Accurate sub-25ns regime; zbench had
~25ns floor on macOS" (or similar phrasing) to reflect reported integer ns
outputs rather than claiming true sub-nanosecond precision.
---
Nitpick comments:
In `@bench/bench.zig`:
- Around line 48-99: The bench harness (runBench and runAndPrint) is duplicated
across benchmarks; extract these functions into a single shared module (e.g.,
bench/harness.zig) and update each benchmark entrypoint to import that module
and call the shared runBench/runAndPrint. Preserve the functions' signatures
(comptime func: fn() void and runAndPrint(name: []const u8, func: fn() void,
stdout: anytype) !void), retain Timer usage and constants (WARMUP_NS, BENCH_NS)
in the shared module, and ensure benchmarks replace their local implementations
with calls to the imported functions.
- Around line 175-449: Many benchmarks duplicate the FixedBufferAllocator setup
(init of std.heap.FixedBufferAllocator, getting alloc) — create a small helper
(e.g., withFixedBufferAllocator or makeFixedAllocator) that encapsulates the
buffer allocation and returns the allocator (or takes a callback that receives
alloc) and replace the repeated blocks in functions like benchAbiEncodeTransfer,
benchAbiEncodeStatic, benchAbiEncodeDynamic, benchRlpEncodeTx,
benchTxHashEip1559, benchEip712Hash, etc.; update those functions to call the
helper for an allocator instead of repeating the buf/fba/alloc boilerplate so
each bench only contains the test-specific logic.
In `@bench/RESULTS.md`:
- Line 7: Update the "Benchmarks run on Apple Silicon..." line in RESULTS.md to
include explicit toolchain and platform versions for reproducibility: append CPU
model/binning (e.g., M1/M2 with specific SKU), macOS version, Zig version, Rust
toolchain (cargo/rustc) and target triple, and any compiler flags used
(ReleaseFast/--release), so the existing Benchmarks run on Apple Silicon...
sentence contains exact CPU, macOS, Zig and Rust versions and flags.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- README.md
- bench/RESULTS.md
- bench/bench.zig
- bench/compare.sh
- bench/keccak_compare.zig
- build.zig
- build.zig.zon
✅ Files skipped from review due to trivial changes (1)
- README.md
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bench/RESULTS.md (1)
78-78: ⚠️ Potential issue | 🟡 Minor

Reproduction command is incomplete for published results.
Line 78 shows `cargo bench --bench eth_comparison`, but the results table includes `u256_mulDiv` (row 5), which is only produced by the separate `u256_comparison` bench target. To fully reproduce the published numbers, run both bench targets:

```sh
(cd bench/alloy-bench && cargo bench --bench eth_comparison && cargo bench --bench u256_comparison)
```

Alternatively, reference the existing `bash bench/compare_u256.sh` that already runs the u256 benchmarks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench/RESULTS.md` at line 78, The reproduction command in RESULTS.md only runs the eth_comparison bench but the published table includes results from the u256_comparison target (e.g., u256_mulDiv), so update the command to run both bench targets or point to the existing script; specifically change the command that runs `cargo bench --bench eth_comparison` to run `cargo bench --bench eth_comparison && cargo bench --bench u256_comparison` or replace it with/mention the existing `bench/compare_u256.sh` script so `u256_comparison` results are produced when reproducing the published numbers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@bench/RESULTS.md`:
- Line 78: The reproduction command in RESULTS.md only runs the eth_comparison
bench but the published table includes results from the u256_comparison target
(e.g., u256_mulDiv), so update the command to run both bench targets or point to
the existing script; specifically change the command that runs `cargo bench
--bench eth_comparison` to run `cargo bench --bench eth_comparison && cargo
bench --bench u256_comparison` or replace it with/mention the existing
`bench/compare_u256.sh` script so `u256_comparison` results are produced when
reproducing the published numbers.
Summary
Closes the u256 compound arithmetic performance gap with Rust/alloy. The `uniswapv2_naive` benchmark went from 87ns (3.48x slower than Rust) to 20ns (1.20x faster).

Key Optimizations
1. Scalar multiply (`mulLimbScalar`): Multiplying [4]u64 limbs by a single u64 uses 4 mul/umulh pairs instead of 10 for full 4x4 schoolbook. Used for fee constants (997, 1000) in UniswapV2.

2. Specialized 128-by-128 division (`div2by2`): Inline register-only path for the common case where both numerator and divisor fit in 2 limbs. Avoids Knuth D's runtime-loop array overhead and eliminates `__udivti3` software division calls (~50ns penalty discovered via disassembly).

3. Limb-native benchmark: Rewrote `uniswapv2_naive` to use `[4]u64` limb operations directly — apples-to-apples comparison with Rust's `[u64; 4]` U256 type.

Other Changes

- `checked_div` -> `/` (removed unfair `Option` handling overhead)
- Made limb functions `pub inline` (`mulLimbs`, `addLimbs`, `mulLimbScalar`)
- `@bitCast` for zero-cost u256 <-> [4]u64 conversion
- Added `*.a`, `*.o`, `*.s` to `.gitignore` (build artifacts)

Benchmark Results
Previous score: eth-zig 2/8 | alloy.rs 4/8 | tied 2/8
Test plan
- `zig build test` passes
- `zig build bench-u256` runs successfully
- `bash bench/compare_u256.sh` shows improved results

Summary by CodeRabbit
Tests
Chores
Refactor
Documentation