feat(l1): add snap-sync offline profiling CLI with verify, compare, and JSON reports#64
Conversation
Add a `snapsync profile` CLI subcommand that calls the ethrex-p2p profile API to benchmark snap sync compute phases (insert-accounts, insert-storages) against a captured dataset. Includes warmup/repeat controls, phase filtering, state root consistency checks, and statistical output (median, mean, stddev, min, max) via a new RunStats utility.

- src/cli.rs: SnapSync variant, SnapSyncSubcommand, SnapSyncProfileOptions
- src/snapsync.rs: run_profile runner
- src/profiling.rs: RunStats for duration statistics
- src/lib.rs: register new modules
- Cargo.toml: point ethrex-p2p to feat/snap-profile branch
- Make expected-root mismatch a hard error (non-zero exit) instead of logging MISMATCH
- Remove --phases flag since run_once() always runs both phases; both InsertAccounts and InsertStorages stats are always printed
- Add --repeat validation: must be >= 1 (rejects 0)
- Add p95/p99 percentiles to RunStats output
- Point all ethrex deps to feat/snap-profile branch for consistency
…nd into snapsync profiler

- Fix ProgramInput import (guest_program → ethrex_guest_program) and remove deleted fields (elasticity_multiplier, fee_configs) from L1 ProgramInput
- Fix RpcExecutionWitness import (now in ethrex_common, no longer public from ethrex_rpc)
- Add missing get_code_metadata impl for RpcDB (new LevmDatabase trait method)
- Add missing slot_number field to BuildPayloadArgs
- Add missing execution_witness_from_rpc_chain_config import in cli.rs
- Enforce --repeat >= 1 via custom value_parser
- Wire run_once_with_opts() into snapsync.rs with --backend, --db-dir, --keep-db CLI flags
- Add rocksdb feature flag to ethrex-replay
- Update all ethrex deps from 79971ba → be3f7910 (feat/snap-profile tip)
Each run now gets a fresh DB directory (temp dir or run-{i} subdir) so timing
stats are independent. Default backend is rocksdb when the feature is compiled
in, matching snap_profile_replay behavior and avoiding OOM on large datasets.
…s, and 20 tests

New commands:
- snap-sync verify-dataset: base mode (manifest/dir/naming) + strict mode (RLP decode)
- snap-sync compare: loads two JSON reports, computes median/p95 deltas, regression gating
- snap-sync profile: now emits SnapProfileReportV1 JSON via --json-out/--json-stdout

New modules:
- snapsync_report.rs: SnapProfileReportV1 schema, PhaseStats, manifest SHA-256
- snapsync_compare.rs: ComparisonReport, PhaseDelta, compatibility checks (7 tests)
- snapsync_verify.rs: VerifyResult, DatasetStats, chunk index validation (8 tests)
- snapsync_fixtures.rs: tiny dataset generator + 4 corrupt variants (5 tests)

Also includes committed tiny fixture at fixtures/snapsync/v1/tiny/ and .gitignore exception for fixture JSON files.
Pull request overview
This PR adds an offline snap-sync profiling CLI to ethrex-replay, enabling performance analysis and regression testing of snap-sync dataset processing. The implementation introduces three new subcommands under snap-sync: profile for benchmarking, verify-dataset for data integrity checking, and compare for automated regression detection.
Changes:
- Adds offline snap-sync profiling with configurable repeat/warmup runs, backend selection (rocksdb/inmemory), and JSON report generation
- Implements dataset verification with base and strict modes (RLP decoding validation)
- Provides automated comparison tool with regression threshold detection and gating
- Includes test fixture generators and 20 comprehensive tests across 4 modules
- Updates dependency branches to `feat/snap-profile` and fixes 9 compile errors from upstream API changes
- Reorganizes imports related to `RpcExecutionWitness` and `ProgramInput` package renaming
Reviewed changes
Copilot reviewed 13 out of 17 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/snapsync.rs | Main profiling logic with DB isolation, timing stats, and state root validation |
| src/snapsync_verify.rs | Dataset verification with manifest checking, chunk validation, and RLP decoding |
| src/snapsync_report.rs | JSON report schema (SnapProfileReportV1) with statistics calculation |
| src/snapsync_compare.rs | Report comparison with delta computation and regression detection |
| src/snapsync_fixtures.rs | Test fixture generators for valid and corrupt datasets |
| src/profiling.rs | RunStats helper for computing median, mean, stddev, and percentiles |
| src/cli.rs | CLI integration with 3 new subcommands and option parsing |
| src/lib.rs | Module exports for new snap-sync functionality |
| src/run.rs | Import reorganization for ProgramInput and RpcExecutionWitness |
| src/cache.rs | Import update for RpcExecutionWitness location |
| src/rpc/db.rs | Import update and new get_code_metadata method implementation |
| Cargo.toml | Branch updates to feat/snap-profile, adds sha2 and tempfile dependencies, adds rocksdb feature |
| .gitignore | Allows fixtures/**/*.json to be committed |
| fixtures/snapsync/v1/tiny/* | Committed test fixture with 3 accounts and 2 storage slots |
```rust
let percentile = |p: f64| -> f64 {
    let idx = ((p / 100.0) * (n - 1) as f64).round() as usize;
```
The percentile calculation uses round() which can produce an index equal to n when the percentage is 100. Although the expression idx.min(n - 1) prevents out-of-bounds access, using round() for percentile calculation is unconventional. The standard approach is to use linear interpolation or use floor() with fractional index handling. Consider changing to floor() or clearly documenting why round() is preferred here.
Suggested change:

```diff
-let idx = ((p / 100.0) * (n - 1) as f64).round() as usize;
+let idx = ((p / 100.0) * (n - 1) as f64).floor() as usize;
```
```rust
let regression_detected = opts
    .regression_threshold_pct
    .is_some_and(|threshold| deltas.total.median_delta_pct > threshold);
```
The regression detection uses a direct floating-point comparison deltas.total.median_delta_pct > threshold without any epsilon tolerance. While this is acceptable for percentage-based thresholds, be aware that floating-point arithmetic can produce values very close to the threshold (e.g., 4.999999999 vs 5.0). Consider documenting this behavior or adding a small epsilon if exact threshold matching is critical.
```rust
let mut buf = [0u8; 8192];
loop {
    let n = file.read(&mut buf)?;
    if n == 0 {
        break;
    }
    hasher.update(&buf[..n]);
}
```
The compute_manifest_sha256 function uses a manual buffered read loop instead of using `std::io::copy` or `Read::read_to_end`. While the manual approach is fine, using `std::io::copy(&mut file, &mut hasher)` would be more idiomatic and equally efficient since `Sha256` implements `Write`. Consider simplifying to:

```rust
let mut file = fs::File::open(manifest_path)?;
let mut hasher = Sha256::new();
std::io::copy(&mut file, &mut hasher)?;
```
Suggested change:

```diff
-let mut buf = [0u8; 8192];
-loop {
-    let n = file.read(&mut buf)?;
-    if n == 0 {
-        break;
-    }
-    hasher.update(&buf[..n]);
-}
+std::io::copy(&mut file, &mut hasher)?;
```
```rust
#[arg(long, default_value = "10", value_parser = parse_positive_usize, help = "Number of measured runs (must be >= 1)")]
pub repeat: usize,

#[arg(long, default_value = "1", help = "Number of warmup runs (not measured)")]
```
The warmup parameter does not have validation to ensure it's not excessively large. A user could accidentally pass --warmup 1000000 which would make the tool run for an extremely long time before any measurements are taken. Consider adding a value_parser similar to repeat to enforce a reasonable maximum (e.g., 100) or at least document expected ranges.
Suggested change:

```diff
-#[arg(long, default_value = "1", help = "Number of warmup runs (not measured)")]
+#[arg(long, default_value = "1", value_parser = clap::value_parser!(usize).range(0..=100), help = "Number of warmup runs (not measured, 0-100)")]
```
```rust
    for chunk_path in &acc_chunks {
        match std::fs::read(chunk_path) {
            Ok(bytes) => match <Vec<(H256, AccountState)>>::decode(&bytes) {
                Ok(accounts) => stats.total_accounts += accounts.len(),
                Err(e) => errors.push(VerifyError {
                    file: chunk_path.display().to_string(),
                    message: format!("Failed to decode account RLP: {e}"),
                }),
            },
            Err(e) => errors.push(VerifyError {
                file: chunk_path.display().to_string(),
                message: format!("Failed to read file: {e}"),
            }),
        }
    }

    for chunk_path in &storage_chunks {
        match std::fs::read(chunk_path) {
            Ok(bytes) => match <Vec<(Vec<H256>, Vec<(H256, U256)>)>>::decode(&bytes) {
                Ok(entries) => {
                    for (_, slots) in &entries {
                        stats.total_storage_slots += slots.len();
                    }
                }
                Err(e) => errors.push(VerifyError {
                    file: chunk_path.display().to_string(),
                    message: format!("Failed to decode storage RLP: {e}"),
                }),
            },
            Err(e) => errors.push(VerifyError {
                file: chunk_path.display().to_string(),
                message: format!("Failed to read file: {e}"),
            }),
        }
    }
}

let valid = errors.is_empty();
VerifyResult {
    schema_version: 1,
    valid,
    strict: opts.strict,
    errors,
    stats,
}
```
There is significant code duplication between run_verify (lines 40-182) and the test helper verify_dataset (lines 231-324). The logic is nearly identical except that run_verify prints to terminal and returns an error on failure, while verify_dataset just returns the result. This duplication makes maintenance harder and increases the risk of bugs if one function is updated but not the other. Consider refactoring to extract the shared validation logic into a helper function that both run_verify and verify_dataset call, with run_verify handling the terminal output and error reporting as a thin wrapper.
Suggested change:

```diff
-    for chunk_path in &acc_chunks {
-        match std::fs::read(chunk_path) {
-            Ok(bytes) => match <Vec<(H256, AccountState)>>::decode(&bytes) {
-                Ok(accounts) => stats.total_accounts += accounts.len(),
-                Err(e) => errors.push(VerifyError {
-                    file: chunk_path.display().to_string(),
-                    message: format!("Failed to decode account RLP: {e}"),
-                }),
-            },
-            Err(e) => errors.push(VerifyError {
-                file: chunk_path.display().to_string(),
-                message: format!("Failed to read file: {e}"),
-            }),
-        }
-    }
-    for chunk_path in &storage_chunks {
-        match std::fs::read(chunk_path) {
-            Ok(bytes) => match <Vec<(Vec<H256>, Vec<(H256, U256)>)>>::decode(&bytes) {
-                Ok(entries) => {
-                    for (_, slots) in &entries {
-                        stats.total_storage_slots += slots.len();
-                    }
-                }
-                Err(e) => errors.push(VerifyError {
-                    file: chunk_path.display().to_string(),
-                    message: format!("Failed to decode storage RLP: {e}"),
-                }),
-            },
-            Err(e) => errors.push(VerifyError {
-                file: chunk_path.display().to_string(),
-                message: format!("Failed to read file: {e}"),
-            }),
-        }
-    }
-}
-let valid = errors.is_empty();
-VerifyResult {
-    schema_version: 1,
-    valid,
-    strict: opts.strict,
-    errors,
-    stats,
-}
+        verify_strict_chunks(&acc_chunks, &storage_chunks, &mut stats, &mut errors);
+    }
+    let valid = errors.is_empty();
+    VerifyResult {
+        schema_version: 1,
+        valid,
+        strict: opts.strict,
+        errors,
+        stats,
+    }
+}
+
+/// Perform strict verification by decoding account and storage chunks.
+fn verify_strict_chunks(
+    acc_chunks: &[PathBuf],
+    storage_chunks: &[PathBuf],
+    stats: &mut VerifyStats,
+    errors: &mut Vec<VerifyError>,
+) {
+    for chunk_path in acc_chunks {
+        match std::fs::read(chunk_path) {
+            Ok(bytes) => match <Vec<(H256, AccountState)>>::decode(&bytes) {
+                Ok(accounts) => stats.total_accounts += accounts.len(),
+                Err(e) => errors.push(VerifyError {
+                    file: chunk_path.display().to_string(),
+                    message: format!("Failed to decode account RLP: {e}"),
+                }),
+            },
+            Err(e) => errors.push(VerifyError {
+                file: chunk_path.display().to_string(),
+                message: format!("Failed to read file: {e}"),
+            }),
+        }
+    }
+    for chunk_path in storage_chunks {
+        match std::fs::read(chunk_path) {
+            Ok(bytes) => match <Vec<(Vec<H256>, Vec<(H256, U256)>)>>::decode(&bytes) {
+                Ok(entries) => {
+                    for (_, slots) in &entries {
+                        stats.total_storage_slots += slots.len();
+                    }
+                }
+                Err(e) => errors.push(VerifyError {
+                    file: chunk_path.display().to_string(),
+                    message: format!("Failed to decode storage RLP: {e}"),
+                }),
+            },
+            Err(e) => errors.push(VerifyError {
+                file: chunk_path.display().to_string(),
+                message: format!("Failed to read file: {e}"),
+            }),
+        }
+    }
```
```rust
pub fn len(&self) -> usize {
    self.durations.len()
}
```
The RunStats struct has a len() method but no corresponding is_empty() method. This violates the Rust convention that types implementing len() should also implement is_empty() for consistency and clarity. The clippy lint len_without_is_empty would flag this. Add an is_empty() method that returns self.durations.is_empty().
Suggested change:

```rust
pub fn is_empty(&self) -> bool {
    self.durations.is_empty()
}
```
```rust
impl PhaseStats {
    pub fn from_durations(durations: &[Duration]) -> Self {
        let mut sorted: Vec<f64> = durations.iter().map(|d| d.as_secs_f64()).collect();
        sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
```
The sort_by with partial_cmp can panic if the input contains NaN values, even though this is unlikely with Duration-derived values. For robustness, consider using unwrap_or with a default comparison or filtering out NaN values before sorting. The same pattern appears in the profiling module at line 9 where durations are sorted without checking for potential issues.
Suggested change:

```diff
-sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
+sorted.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
```
…ethrex-p2p ProfileBackend and run_once_with_opts now live in the snapsync_profile crate (tooling/snapsync_profile in ethrex). Added snapsync_profile git dep and wired its rocksdb feature. Manifest types (load_manifest, SnapProfileManifest, etc.) remain in ethrex_p2p::sync::profile.
The parsing logic now lives in the snapsync_profile crate via a FromStr impl. This eliminates the duplicated feature-gated match arms that were identical between ethrex-replay and the tooling binary.
The merge of main into feat/snap-profile (f6f5946) introduced calls to print_individual_runs and RunStats::median() in cli.rs, but the function was missing and median was private. Add the missing function and bump median visibility.
Move the 5 snapsync_*.rs files into src/snapsync/ as submodules:

- snapsync.rs -> snapsync/profile.rs
- snapsync_report.rs -> snapsync/report.rs
- snapsync_compare.rs -> snapsync/compare.rs
- snapsync_verify.rs -> snapsync/verify.rs
- snapsync_fixtures.rs -> snapsync/fixtures.rs

Update all imports: crate::snapsync_* -> crate::snapsync::* and super::* for sibling module references.
Motivation
Wraps the offline snap sync profiling from lambdaclass/ethrex#6208 into the ethrex-replay CLI, adding dataset verification, JSON report output, and automated regression comparison.
Description
Phase 1 — Core profiling CLI (commits `83f2fc9..6b38340`):

- `snap-sync profile` command with `--repeat`, `--warmup`, `--backend rocksdb|inmemory`, `--db-dir`, `--keep-db`
- Default backend `rocksdb` when the feature is compiled in
- `RunStats` (median, mean, stddev, p95, p99, min, max) for terminal output

Phase 1.5 — Hardening (commit `4650734`):

- `snap-sync verify-dataset` — base mode (manifest/dir/naming checks) + strict mode (RLP decode validation)
- `snap-sync compare` — loads two JSON reports, computes median/p95 deltas, regression gating with `--fail-on-regression`
- `SnapProfileReportV1` JSON schema with `--json-out`/`--json-stdout` on the profile command
- Committed tiny fixture at `fixtures/snapsync/v1/tiny/` (3 accounts, 2 storage slots)

Also fixes 9 compile errors from upstream API changes and updates Cargo deps to the `feat/snap-profile` branch.

Depends on: lambdaclass/ethrex#6208
Test plan
- `cargo check --features rocksdb` and `cargo check` — both compile clean
- `cargo test snapsync_` — 20 passed, 1 ignored
- `snap-sync verify-dataset --dataset fixtures/snapsync/v1/tiny --strict` — exits 0, reports 3 accounts / 2 storage slots
- `snap-sync profile --dataset fixtures/snapsync/v1/tiny --repeat 2 --warmup 1 --backend rocksdb --json-out /tmp/profile.json` — exits 1 (expected: fixture has placeholder root), JSON written with `root_validation.matches=false`
- `snap-sync compare --baseline /tmp/a.json --candidate /tmp/b.json --regression-threshold-pct 5 --fail-on-regression` — exits 1 when regression exceeds threshold