feat(l1): add snap-sync offline profiling CLI with verify, compare, and JSON reports#64
Conversation
Add a `snapsync profile` CLI subcommand that calls the ethrex-p2p profile API to benchmark snap sync compute phases (insert-accounts, insert-storages) against a captured dataset. Includes warmup/repeat controls, phase filtering, state root consistency checks, and statistical output (median, mean, stddev, min, max) via a new RunStats utility.

- src/cli.rs: SnapSync variant, SnapSyncSubcommand, SnapSyncProfileOptions
- src/snapsync.rs: run_profile runner
- src/profiling.rs: RunStats for duration statistics
- src/lib.rs: register new modules
- Cargo.toml: point ethrex-p2p to feat/snap-profile branch
- Make expected-root mismatch a hard error (non-zero exit) instead of logging MISMATCH
- Remove --phases flag since run_once() always runs both phases; both InsertAccounts and InsertStorages stats are always printed
- Add --repeat validation: must be >= 1 (rejects 0)
- Add p95/p99 percentiles to RunStats output
- Point all ethrex deps to feat/snap-profile branch for consistency
…nd into snapsync profiler

- Fix ProgramInput import (guest_program → ethrex_guest_program) and remove deleted fields (elasticity_multiplier, fee_configs) from L1 ProgramInput
- Fix RpcExecutionWitness import (now in ethrex_common, no longer public from ethrex_rpc)
- Add missing get_code_metadata impl for RpcDB (new LevmDatabase trait method)
- Add missing slot_number field to BuildPayloadArgs
- Add missing execution_witness_from_rpc_chain_config import in cli.rs
- Enforce --repeat >= 1 via custom value_parser
- Wire run_once_with_opts() into snapsync.rs with --backend, --db-dir, --keep-db CLI flags
- Add rocksdb feature flag to ethrex-replay
- Update all ethrex deps from 79971ba → be3f7910 (feat/snap-profile tip)
Each run now gets a fresh DB directory (temp dir or run-{i} subdir) so timing
stats are independent. Default backend is rocksdb when the feature is compiled
in, matching snap_profile_replay behavior and avoiding OOM on large datasets.
…s, and 20 tests

New commands:
- snap-sync verify-dataset: base mode (manifest/dir/naming) + strict mode (RLP decode)
- snap-sync compare: loads two JSON reports, computes median/p95 deltas, regression gating
- snap-sync profile: now emits SnapProfileReportV1 JSON via --json-out/--json-stdout

New modules:
- snapsync_report.rs: SnapProfileReportV1 schema, PhaseStats, manifest SHA-256
- snapsync_compare.rs: ComparisonReport, PhaseDelta, compatibility checks (7 tests)
- snapsync_verify.rs: VerifyResult, DatasetStats, chunk index validation (8 tests)
- snapsync_fixtures.rs: tiny dataset generator + 4 corrupt variants (5 tests)

Also includes committed tiny fixture at fixtures/snapsync/v1/tiny/ and .gitignore exception for fixture JSON files.
Pull request overview
This PR adds an offline snap-sync profiling CLI to ethrex-replay, enabling performance analysis and regression testing of snap-sync dataset processing. The implementation introduces three new subcommands under snap-sync: profile for benchmarking, verify-dataset for data integrity checking, and compare for automated regression detection.
Changes:
- Adds offline snap-sync profiling with configurable repeat/warmup runs, backend selection (rocksdb/inmemory), and JSON report generation
- Implements dataset verification with base and strict modes (RLP decoding validation)
- Provides automated comparison tool with regression threshold detection and gating
- Includes test fixture generators and 20 comprehensive tests across 4 modules
- Updates dependency branches to `feat/snap-profile` and fixes 9 compile errors from upstream API changes
- Reorganizes imports related to `RpcExecutionWitness` and `ProgramInput` package renaming
Reviewed changes
Copilot reviewed 13 out of 17 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/snapsync.rs | Main profiling logic with DB isolation, timing stats, and state root validation |
| src/snapsync_verify.rs | Dataset verification with manifest checking, chunk validation, and RLP decoding |
| src/snapsync_report.rs | JSON report schema (SnapProfileReportV1) with statistics calculation |
| src/snapsync_compare.rs | Report comparison with delta computation and regression detection |
| src/snapsync_fixtures.rs | Test fixture generators for valid and corrupt datasets |
| src/profiling.rs | RunStats helper for computing median, mean, stddev, and percentiles |
| src/cli.rs | CLI integration with 3 new subcommands and option parsing |
| src/lib.rs | Module exports for new snap-sync functionality |
| src/run.rs | Import reorganization for ProgramInput and RpcExecutionWitness |
| src/cache.rs | Import update for RpcExecutionWitness location |
| src/rpc/db.rs | Import update and new get_code_metadata method implementation |
| Cargo.toml | Branch updates to feat/snap-profile, adds sha2 and tempfile dependencies, adds rocksdb feature |
| .gitignore | Allows fixtures/**/*.json to be committed |
| fixtures/snapsync/v1/tiny/* | Committed test fixture with 3 accounts and 2 storage slots |
```rust
let percentile = |p: f64| -> f64 {
    let idx = ((p / 100.0) * (n - 1) as f64).round() as usize;
```
The percentile calculation uses round() which can produce an index equal to n when the percentage is 100. Although the expression idx.min(n - 1) prevents out-of-bounds access, using round() for percentile calculation is unconventional. The standard approach is to use linear interpolation or use floor() with fractional index handling. Consider changing to floor() or clearly documenting why round() is preferred here.
Suggested change:

```diff
-let idx = ((p / 100.0) * (n - 1) as f64).round() as usize;
+let idx = ((p / 100.0) * (n - 1) as f64).floor() as usize;
```
```rust
let regression_detected = opts
    .regression_threshold_pct
    .is_some_and(|threshold| deltas.total.median_delta_pct > threshold);
```
The regression detection uses a direct floating-point comparison deltas.total.median_delta_pct > threshold without any epsilon tolerance. While this is acceptable for percentage-based thresholds, be aware that floating-point arithmetic can produce values very close to the threshold (e.g., 4.999999999 vs 5.0). Consider documenting this behavior or adding a small epsilon if exact threshold matching is critical.
```rust
let mut buf = [0u8; 8192];
loop {
    let n = file.read(&mut buf)?;
    if n == 0 {
        break;
    }
    hasher.update(&buf[..n]);
}
```
The compute_manifest_sha256 function uses a manual buffered read loop instead of using `std::io::copy` or `Read::read_to_end`. While the manual approach is fine, using `std::io::copy(&mut file, &mut hasher)` would be more idiomatic and equally efficient since `Sha256` implements `Write`. Consider simplifying to:

```rust
let mut file = fs::File::open(manifest_path)?;
let mut hasher = Sha256::new();
std::io::copy(&mut file, &mut hasher)?;
```
Suggested change:

```diff
-let mut buf = [0u8; 8192];
-loop {
-    let n = file.read(&mut buf)?;
-    if n == 0 {
-        break;
-    }
-    hasher.update(&buf[..n]);
-}
+std::io::copy(&mut file, &mut hasher)?;
```
```rust
#[arg(long, default_value = "10", value_parser = parse_positive_usize, help = "Number of measured runs (must be >= 1)")]
pub repeat: usize,

#[arg(long, default_value = "1", help = "Number of warmup runs (not measured)")]
```
The warmup parameter does not have validation to ensure it's not excessively large. A user could accidentally pass --warmup 1000000 which would make the tool run for an extremely long time before any measurements are taken. Consider adding a value_parser similar to repeat to enforce a reasonable maximum (e.g., 100) or at least document expected ranges.
Suggested change:

```diff
-#[arg(long, default_value = "1", help = "Number of warmup runs (not measured)")]
+#[arg(long, default_value = "1", value_parser = clap::value_parser!(usize).range(0..=100), help = "Number of warmup runs (not measured, 0-100)")]
```
```rust
    for chunk_path in &acc_chunks {
        match std::fs::read(chunk_path) {
            Ok(bytes) => match <Vec<(H256, AccountState)>>::decode(&bytes) {
                Ok(accounts) => stats.total_accounts += accounts.len(),
                Err(e) => errors.push(VerifyError {
                    file: chunk_path.display().to_string(),
                    message: format!("Failed to decode account RLP: {e}"),
                }),
            },
            Err(e) => errors.push(VerifyError {
                file: chunk_path.display().to_string(),
                message: format!("Failed to read file: {e}"),
            }),
        }
    }

    for chunk_path in &storage_chunks {
        match std::fs::read(chunk_path) {
            Ok(bytes) => match <Vec<(Vec<H256>, Vec<(H256, U256)>)>>::decode(&bytes) {
                Ok(entries) => {
                    for (_, slots) in &entries {
                        stats.total_storage_slots += slots.len();
                    }
                }
                Err(e) => errors.push(VerifyError {
                    file: chunk_path.display().to_string(),
                    message: format!("Failed to decode storage RLP: {e}"),
                }),
            },
            Err(e) => errors.push(VerifyError {
                file: chunk_path.display().to_string(),
                message: format!("Failed to read file: {e}"),
            }),
        }
    }
}

let valid = errors.is_empty();
VerifyResult {
    schema_version: 1,
    valid,
    strict: opts.strict,
    errors,
    stats,
}
```
There is significant code duplication between run_verify (lines 40-182) and the test helper verify_dataset (lines 231-324). The logic is nearly identical except that run_verify prints to terminal and returns an error on failure, while verify_dataset just returns the result. This duplication makes maintenance harder and increases the risk of bugs if one function is updated but not the other. Consider refactoring to extract the shared validation logic into a helper function that both run_verify and verify_dataset call, with run_verify handling the terminal output and error reporting as a thin wrapper.
Suggested change:

```diff
-    for chunk_path in &acc_chunks {
-        match std::fs::read(chunk_path) {
-            Ok(bytes) => match <Vec<(H256, AccountState)>>::decode(&bytes) {
-                Ok(accounts) => stats.total_accounts += accounts.len(),
-                Err(e) => errors.push(VerifyError {
-                    file: chunk_path.display().to_string(),
-                    message: format!("Failed to decode account RLP: {e}"),
-                }),
-            },
-            Err(e) => errors.push(VerifyError {
-                file: chunk_path.display().to_string(),
-                message: format!("Failed to read file: {e}"),
-            }),
-        }
-    }
-    for chunk_path in &storage_chunks {
-        match std::fs::read(chunk_path) {
-            Ok(bytes) => match <Vec<(Vec<H256>, Vec<(H256, U256)>)>>::decode(&bytes) {
-                Ok(entries) => {
-                    for (_, slots) in &entries {
-                        stats.total_storage_slots += slots.len();
-                    }
-                }
-                Err(e) => errors.push(VerifyError {
-                    file: chunk_path.display().to_string(),
-                    message: format!("Failed to decode storage RLP: {e}"),
-                }),
-            },
-            Err(e) => errors.push(VerifyError {
-                file: chunk_path.display().to_string(),
-                message: format!("Failed to read file: {e}"),
-            }),
-        }
-    }
-}
-let valid = errors.is_empty();
-VerifyResult {
-    schema_version: 1,
-    valid,
-    strict: opts.strict,
-    errors,
-    stats,
-}
+        verify_strict_chunks(&acc_chunks, &storage_chunks, &mut stats, &mut errors);
+    }
+    let valid = errors.is_empty();
+    VerifyResult {
+        schema_version: 1,
+        valid,
+        strict: opts.strict,
+        errors,
+        stats,
+    }
+}
+
+/// Perform strict verification by decoding account and storage chunks.
+fn verify_strict_chunks(
+    acc_chunks: &[PathBuf],
+    storage_chunks: &[PathBuf],
+    stats: &mut VerifyStats,
+    errors: &mut Vec<VerifyError>,
+) {
+    for chunk_path in acc_chunks {
+        match std::fs::read(chunk_path) {
+            Ok(bytes) => match <Vec<(H256, AccountState)>>::decode(&bytes) {
+                Ok(accounts) => stats.total_accounts += accounts.len(),
+                Err(e) => errors.push(VerifyError {
+                    file: chunk_path.display().to_string(),
+                    message: format!("Failed to decode account RLP: {e}"),
+                }),
+            },
+            Err(e) => errors.push(VerifyError {
+                file: chunk_path.display().to_string(),
+                message: format!("Failed to read file: {e}"),
+            }),
+        }
+    }
+    for chunk_path in storage_chunks {
+        match std::fs::read(chunk_path) {
+            Ok(bytes) => match <Vec<(Vec<H256>, Vec<(H256, U256)>)>>::decode(&bytes) {
+                Ok(entries) => {
+                    for (_, slots) in &entries {
+                        stats.total_storage_slots += slots.len();
+                    }
+                }
+                Err(e) => errors.push(VerifyError {
+                    file: chunk_path.display().to_string(),
+                    message: format!("Failed to decode storage RLP: {e}"),
+                }),
+            },
+            Err(e) => errors.push(VerifyError {
+                file: chunk_path.display().to_string(),
+                message: format!("Failed to read file: {e}"),
+            }),
+        }
+    }
```
```rust
pub fn len(&self) -> usize {
    self.durations.len()
}
```
The RunStats struct has a len() method but no corresponding is_empty() method. This violates the Rust convention that types implementing len() should also implement is_empty() for consistency and clarity. The clippy lint len_without_is_empty would flag this. Add an is_empty() method that returns self.durations.is_empty().
Suggested change:

```rust
pub fn is_empty(&self) -> bool {
    self.durations.is_empty()
}
```
```rust
impl PhaseStats {
    pub fn from_durations(durations: &[Duration]) -> Self {
        let mut sorted: Vec<f64> = durations.iter().map(|d| d.as_secs_f64()).collect();
        sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
```
The sort_by with partial_cmp can panic if the input contains NaN values, even though this is unlikely with Duration-derived values. For robustness, consider using unwrap_or with a default comparison or filtering out NaN values before sorting. The same pattern appears in the profiling module at line 9 where durations are sorted without checking for potential issues.
Suggested change:

```diff
-sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
+sorted.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
```
…ethrex-p2p ProfileBackend and run_once_with_opts now live in the snapsync_profile crate (tooling/snapsync_profile in ethrex). Added snapsync_profile git dep and wired its rocksdb feature. Manifest types (load_manifest, SnapProfileManifest, etc.) remain in ethrex_p2p::sync::profile.
The parsing logic now lives in the snapsync_profile crate via a FromStr impl. This eliminates the duplicated feature-gated match arms that were identical between ethrex-replay and the tooling binary.
The merge of main into feat/snap-profile (f6f5946) introduced calls to print_individual_runs and RunStats::median() in cli.rs, but the function was missing and median was private. Add the missing function and bump median visibility.
Move the 5 snapsync_*.rs files into src/snapsync/ as submodules:

- snapsync.rs -> snapsync/profile.rs
- snapsync_report.rs -> snapsync/report.rs
- snapsync_compare.rs -> snapsync/compare.rs
- snapsync_verify.rs -> snapsync/verify.rs
- snapsync_fixtures.rs -> snapsync/fixtures.rs

Update all imports: crate::snapsync_* -> crate::snapsync::* and super::* for sibling module references.
Motivation
Wraps the offline snap sync profiling from lambdaclass/ethrex#6208 into the ethrex-replay CLI, adding dataset verification, JSON report output, and automated regression comparison.
Description
Phase 1 — Core profiling CLI (commits `83f2fc9..6b38340`):

- `snap-sync profile` command with `--repeat`, `--warmup`, `--backend rocksdb|inmemory`, `--db-dir`, `--keep-db`
- Default backend `rocksdb` when the feature is compiled in
- `RunStats` (median, mean, stddev, p95, p99, min, max) for terminal output

Phase 1.5 — Hardening (commit `4650734`):

- `snap-sync verify-dataset` — base mode (manifest/dir/naming checks) + strict mode (RLP decode validation)
- `snap-sync compare` — loads two JSON reports, computes median/p95 deltas, regression gating with `--fail-on-regression`
- `SnapProfileReportV1` JSON schema with `--json-out`/`--json-stdout` on the profile command
- Committed tiny fixture at `fixtures/snapsync/v1/tiny/` (3 accounts, 2 storage slots)

Also fixes 9 compile errors from upstream API changes and updates Cargo deps to the `feat/snap-profile` branch.

Depends on: lambdaclass/ethrex#6208
Test plan
- `cargo check --features rocksdb` and `cargo check` — both compile clean
- `cargo test snapsync_` — 20 passed, 1 ignored
- `snap-sync verify-dataset --dataset fixtures/snapsync/v1/tiny --strict` — exits 0, reports 3 accounts / 2 storage slots
- `snap-sync profile --dataset fixtures/snapsync/v1/tiny --repeat 2 --warmup 1 --backend rocksdb --json-out /tmp/profile.json` — exits 1 (expected: fixture has placeholder root), JSON written with `root_validation.matches=false`
- `snap-sync compare --baseline /tmp/a.json --candidate /tmp/b.json --regression-threshold-pct 5 --fail-on-regression` — exits 1 when regression exceeds threshold