Optimization: comptime RLP struct serialization #20

@koko1123

Description

Summary

Generate struct-specific RLP encoders at comptime to close the 1.08x gap on EIP-1559 transaction encoding.

Current Performance

| Benchmark | eth.zig | alloy.rs | Gap |
| --- | --- | --- | --- |
| `rlp_encode_eip1559_tx` | 41 ns | 38 ns | 1.08x loss |
| `rlp_decode_u256` | 3 ns | 6 ns | 2.00x win |

We already improved from 89 ns to 41 ns (2.1x) via stack-buffer single-pass encoding. The remaining 3 ns gap comes from generic `@typeInfo` reflection vs alloy's derive macros.

Root Cause

Current encoding uses comptime reflection to iterate struct fields:

```zig
inline for (s.fields) |field| {
    payload_len += encodedLength(@field(value, field.name));
}
```

While `inline for` unrolls the loop at comptime, each unrolled iteration still calls the generic `encodedLength`, which dispatches on the field's type inside its body. Alloy's `RlpEncodable` derive macro instead generates a single function body with per-field writes and no dispatch.
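For intuition, here is what the derive-style target looks like when written by hand for a hypothetical two-field struct (field names and the `writeU64` helper are illustrative, not the actual `src/encoding/rlp.zig` API): straight-line writes, one per field, with every call already specialized.

```zig
const Example = struct { nonce: u64, gas_limit: u64 };

/// Hedged sketch of a minimal RLP write for a u64 (not the repo's helper):
/// values < 0x80 encode as themselves; zero encodes as the empty string
/// (0x80); otherwise a short-string header followed by big-endian bytes.
fn writeU64(v: u64, out: []u8) usize {
    if (v == 0) {
        out[0] = 0x80;
        return 1;
    }
    if (v < 0x80) {
        out[0] = @intCast(v);
        return 1;
    }
    const len: usize = (64 - @as(usize, @clz(v)) + 7) / 8;
    out[0] = 0x80 + @as(u8, @intCast(len));
    var i: usize = 0;
    while (i < len) : (i += 1) {
        out[1 + i] = @truncate(v >> @intCast(8 * (len - 1 - i)));
    }
    return 1 + len;
}

/// What the generated encoder body should lower to: no per-field dispatch,
/// just one specialized write per field (list header omitted for brevity).
fn encodeExample(value: Example, buf: []u8) []const u8 {
    var pos: usize = 0;
    pos += writeU64(value.nonce, buf[pos..]);
    pos += writeU64(value.gas_limit, buf[pos..]);
    return buf[0..pos];
}
```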

Proposed Approach

Generate a specialized encoder for known transaction types:

```zig
/// Comptime-generate an RLP encoder for a specific struct type.
pub fn RlpEncoder(comptime T: type) type {
    const fields = @typeInfo(T).@"struct".fields;

    return struct {
        /// Encode directly into a stack buffer. Returns the encoded slice.
        pub fn encode(value: T, buf: []u8) []const u8 {
            var pos: usize = 0;

            // Comptime-unrolled: each field gets its own specialized write
            inline for (fields) |field| {
                const fval = @field(value, field.name);
                pos += writeField(@TypeOf(fval), fval, buf[pos..]);
            }

            return buf[0..pos];
        }

        /// Comptime-computed maximum encoded size for this struct.
        pub const max_encoded_size = comptimeMaxSize(fields);
    };
}
```

```zig
// Usage:
const Eip1559Encoder = RlpEncoder(Eip1559Transaction);
var buf: [Eip1559Encoder.max_encoded_size]u8 = undefined;
const encoded = Eip1559Encoder.encode(tx, &buf);
```

The key optimization is `comptimeMaxSize` -- knowing the maximum encoded size at comptime lets the caller use a fixed stack buffer, avoiding heap allocation and runtime size checks.
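`comptimeMaxSize` is referenced above but not shown. One possible shape, assuming the struct contains only fixed-size fields (`uN` integers, `[N]u8` arrays) -- this is a sketch, not the actual implementation, and a real version would need a policy for variable-length fields like `access_list`:

```zig
const std = @import("std");

/// Hedged sketch: sum a worst-case encoded size per field at comptime.
fn comptimeMaxSize(comptime fields: []const std.builtin.Type.StructField) usize {
    comptime var total: usize = 0;
    inline for (fields) |field| {
        // Worst case per field: the raw bytes plus a string header
        // (1 tag byte + up to 8 length bytes).
        total += @sizeOf(field.type) + 9;
    }
    // Plus the outer list header (1 tag byte + up to 8 length bytes).
    return total + 9;
}
```

Because `max_encoded_size` is a `pub const`, this runs entirely at comptime, so the buffer in the usage example above is sized with zero runtime cost.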

This Is a Good First Issue Because

  • The existing RLP code in src/encoding/rlp.zig is clean and well-documented
  • The pattern follows existing comptime conventions in the codebase
  • The benchmark already exists to measure improvement
  • Expected gain is small (~3 ns) but the technique is valuable for the codebase

Target

Close the gap from 1.08x loss to tie or slight win.

References

  • src/encoding/rlp.zig -- current implementation (look at writeDirect and serializeTuple)
  • bench/bench.zig:317-334 -- benchmark code
  • alloy RlpEncodable derive -- what we're matching
