
feat(l1): add snap-sync offline profiling CLI with verify, compare, and JSON reports #64

Open

ilitteri wants to merge 10 commits into main from feat/snap-profile
Conversation

@ilitteri (Collaborator)

Motivation

Wraps the offline snap sync profiling from lambdaclass/ethrex#6208 into the ethrex-replay CLI, adding dataset verification, JSON report output, and automated regression comparison.

Description

Phase 1 — Core profiling CLI (commits 83f2fc9..6b38340):

  • snap-sync profile command with --repeat, --warmup, --backend rocksdb|inmemory, --db-dir, --keep-db
  • Fresh DB per run for unbiased timing stats
  • Default backend is rocksdb when feature is compiled in
  • RunStats (median, mean, stddev, p95, p99, min, max) for terminal output
  • Hard failure on state root mismatch

Phase 1.5 — Hardening (commit 4650734):

  • snap-sync verify-dataset — base mode (manifest/dir/naming checks) + strict mode (RLP decode validation)
  • snap-sync compare — loads two JSON reports, computes median/p95 deltas, regression gating with --fail-on-regression
  • SnapProfileReportV1 JSON schema with --json-out/--json-stdout on profile command
  • Fixture dataset at fixtures/snapsync/v1/tiny/ (3 accounts, 2 storage slots)
  • 4 corrupt dataset generators for testing
  • 20 tests across 4 modules (all pass)

Also fixes 9 compile errors caused by upstream API changes and points the Cargo dependencies at the feat/snap-profile branch.

Depends on: lambdaclass/ethrex#6208

Test plan

  • cargo check --features rocksdb and cargo check — both compile clean
  • cargo test snapsync_ — 20 passed, 1 ignored
  • snap-sync verify-dataset --dataset fixtures/snapsync/v1/tiny --strict — exits 0, reports 3 accounts / 2 storage slots
  • snap-sync profile --dataset fixtures/snapsync/v1/tiny --repeat 2 --warmup 1 --backend rocksdb --json-out /tmp/profile.json — exits 1 (expected: fixture has placeholder root), JSON written with root_validation.matches=false
  • snap-sync compare --baseline /tmp/a.json --candidate /tmp/b.json --regression-threshold-pct 5 --fail-on-regression — exits 1 when regression exceeds threshold

Add a `snapsync profile` CLI subcommand that calls the ethrex-p2p profile
API to benchmark snap sync compute phases (insert-accounts, insert-storages)
against a captured dataset. Includes warmup/repeat controls, phase filtering,
state root consistency checks, and statistical output (median, mean, stddev,
min, max) via a new RunStats utility.

- src/cli.rs: SnapSync variant, SnapSyncSubcommand, SnapSyncProfileOptions
- src/snapsync.rs: run_profile runner
- src/profiling.rs: RunStats for duration statistics
- src/lib.rs: register new modules
- Cargo.toml: point ethrex-p2p to feat/snap-profile branch
- Make expected-root mismatch a hard error (non-zero exit) instead of
  logging MISMATCH
- Remove --phases flag since run_once() always runs both phases; both
  InsertAccounts and InsertStorages stats are always printed
- Add --repeat validation: must be >= 1 (rejects 0)
- Add p95/p99 percentiles to RunStats output
- Point all ethrex deps to feat/snap-profile branch for consistency
…nd into snapsync profiler

- Fix ProgramInput import (guest_program → ethrex_guest_program) and remove
  deleted fields (elasticity_multiplier, fee_configs) from L1 ProgramInput
- Fix RpcExecutionWitness import (now in ethrex_common, no longer public from ethrex_rpc)
- Add missing get_code_metadata impl for RpcDB (new LevmDatabase trait method)
- Add missing slot_number field to BuildPayloadArgs
- Add missing execution_witness_from_rpc_chain_config import in cli.rs
- Enforce --repeat >= 1 via custom value_parser
- Wire run_once_with_opts() into snapsync.rs with --backend, --db-dir, --keep-db CLI flags
- Add rocksdb feature flag to ethrex-replay
- Update all ethrex deps from 79971ba → be3f7910 (feat/snap-profile tip)
Each run now gets a fresh DB directory (temp dir or run-{i} subdir) so timing
stats are independent. Default backend is rocksdb when the feature is compiled
in, matching snap_profile_replay behavior and avoiding OOM on large datasets.
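
The per-run isolation might be arranged along these lines: a std-only sketch, with names purely illustrative (the real runner also supports temp dirs via tempfile and a --keep-db flag):

```rust
use std::path::PathBuf;

/// Sketch: an isolated DB directory per measured run, so each run starts
/// from a cold database and timings are independent. (Names illustrative;
/// the actual runner may clear stale contents or use tempfile instead.)
fn db_dir_for_run(base: &PathBuf, i: usize) -> std::io::Result<PathBuf> {
    let path = base.join(format!("run-{i}"));
    std::fs::create_dir_all(&path)?; // fresh subdir per run index
    Ok(path)
}

fn main() -> std::io::Result<()> {
    let base = std::env::temp_dir().join(format!("snap-profile-{}", std::process::id()));
    for i in 0..3 {
        let dir = db_dir_for_run(&base, i)?;
        assert!(dir.exists());
    }
    std::fs::remove_dir_all(&base)?; // analogous to not passing --keep-db
    println!("3 isolated run dirs created and cleaned up");
    Ok(())
}
```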
…s, and 20 tests

New commands:
- snap-sync verify-dataset: base mode (manifest/dir/naming) + strict mode (RLP decode)
- snap-sync compare: loads two JSON reports, computes median/p95 deltas, regression gating
- snap-sync profile: now emits SnapProfileReportV1 JSON via --json-out/--json-stdout

New modules:
- snapsync_report.rs: SnapProfileReportV1 schema, PhaseStats, manifest SHA-256
- snapsync_compare.rs: ComparisonReport, PhaseDelta, compatibility checks (7 tests)
- snapsync_verify.rs: VerifyResult, DatasetStats, chunk index validation (8 tests)
- snapsync_fixtures.rs: tiny dataset generator + 4 corrupt variants (5 tests)

Also includes committed tiny fixture at fixtures/snapsync/v1/tiny/ and .gitignore
exception for fixture JSON files.
Copilot AI review requested due to automatic review settings February 16, 2026 02:22
@github-actions github-actions bot added the L1 label Feb 16, 2026

Copilot AI left a comment


Pull request overview

This PR adds an offline snap-sync profiling CLI to ethrex-replay, enabling performance analysis and regression testing of snap-sync dataset processing. The implementation introduces three new subcommands under snap-sync: profile for benchmarking, verify-dataset for data integrity checking, and compare for automated regression detection.

Changes:

  • Adds offline snap-sync profiling with configurable repeat/warmup runs, backend selection (rocksdb/inmemory), and JSON report generation
  • Implements dataset verification with base and strict modes (RLP decoding validation)
  • Provides automated comparison tool with regression threshold detection and gating
  • Includes test fixture generators and 20 comprehensive tests across 4 modules
  • Updates dependency branches to feat/snap-profile and fixes 9 compile errors from upstream API changes
  • Reorganizes imports related to RpcExecutionWitness and ProgramInput package renaming

Reviewed changes

Copilot reviewed 13 out of 17 changed files in this pull request and generated 7 comments.

Summary per file:

  • src/snapsync.rs: Main profiling logic with DB isolation, timing stats, and state root validation
  • src/snapsync_verify.rs: Dataset verification with manifest checking, chunk validation, and RLP decoding
  • src/snapsync_report.rs: JSON report schema (SnapProfileReportV1) with statistics calculation
  • src/snapsync_compare.rs: Report comparison with delta computation and regression detection
  • src/snapsync_fixtures.rs: Test fixture generators for valid and corrupt datasets
  • src/profiling.rs: RunStats helper for computing median, mean, stddev, and percentiles
  • src/cli.rs: CLI integration with 3 new subcommands and option parsing
  • src/lib.rs: Module exports for new snap-sync functionality
  • src/run.rs: Import reorganization for ProgramInput and RpcExecutionWitness
  • src/cache.rs: Import update for RpcExecutionWitness location
  • src/rpc/db.rs: Import update and new get_code_metadata method implementation
  • Cargo.toml: Branch updates to feat/snap-profile, adds sha2 and tempfile dependencies, adds rocksdb feature
  • .gitignore: Allows fixtures/**/*.json to be committed
  • fixtures/snapsync/v1/tiny/*: Committed test fixture with 3 accounts and 2 storage slots


Code excerpt under review:

    let percentile = |p: f64| -> f64 {
        let idx = ((p / 100.0) * (n - 1) as f64).round() as usize;

Copilot AI Feb 16, 2026


The percentile calculation uses round() which can produce an index equal to n when the percentage is 100. Although the expression idx.min(n - 1) prevents out-of-bounds access, using round() for percentile calculation is unconventional. The standard approach is to use linear interpolation or use floor() with fractional index handling. Consider changing to floor() or clearly documenting why round() is preferred here.

Suggested change
let idx = ((p / 100.0) * (n - 1) as f64).round() as usize;
let idx = ((p / 100.0) * (n - 1) as f64).floor() as usize;

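For comparison, the linearly interpolated percentile the comment alludes to could look like this. Whether it is worth the extra complexity over nearest-rank for small sample counts is a judgment call:

```rust
/// Linearly interpolated percentile over an ascending sorted sample.
/// At p = 100 the rank is exactly n - 1, so there is no out-of-bounds risk.
fn percentile_interp(sorted: &[f64], p: f64) -> f64 {
    let n = sorted.len();
    assert!(n > 0 && (0.0..=100.0).contains(&p));
    let rank = (p / 100.0) * (n - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    // interpolate between the two neighbouring order statistics
    sorted[lo] + (sorted[hi] - sorted[lo]) * frac
}

fn main() {
    let sample = [1.0, 2.0, 3.0, 4.0, 5.0];
    assert_eq!(percentile_interp(&sample, 50.0), 3.0);
    assert_eq!(percentile_interp(&sample, 75.0), 4.0);
    assert_eq!(percentile_interp(&sample, 100.0), 5.0);
    println!("p50={} p75={}", percentile_interp(&sample, 50.0), percentile_interp(&sample, 75.0));
}
```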
Comment on lines +107 to +109
let regression_detected = opts
.regression_threshold_pct
.is_some_and(|threshold| deltas.total.median_delta_pct > threshold);

Copilot AI Feb 16, 2026


The regression detection uses a direct floating-point comparison deltas.total.median_delta_pct > threshold without any epsilon tolerance. While this is acceptable for percentage-based thresholds, be aware that floating-point arithmetic can produce values very close to the threshold (e.g., 4.999999999 vs 5.0). Consider documenting this behavior or adding a small epsilon if exact threshold matching is critical.

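If the epsilon tolerance the comment suggests were adopted, it might look like this sketch (the epsilon value and helper name are illustrative, not part of the PR):

```rust
/// Sketch: threshold check with a small epsilon so values within
/// floating-point noise of the threshold do not flip the gate.
/// EPS is an illustrative choice, not a value from the PR.
const EPS: f64 = 1e-9;

fn exceeds_threshold(delta_pct: f64, threshold_pct: f64) -> bool {
    delta_pct > threshold_pct + EPS
}

fn main() {
    // a delta that is "5.0" up to rounding noise
    let delta = 5.0_f64 + 1e-12;
    assert!(delta > 5.0);                    // naive comparison flags a regression
    assert!(!exceeds_threshold(delta, 5.0)); // epsilon-tolerant check does not
    assert!(exceeds_threshold(5.1, 5.0));    // a real regression still gates
    println!("naive: {}, tolerant: {}", delta > 5.0, exceeds_threshold(delta, 5.0));
}
```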
Comment on lines +132 to +139
let mut buf = [0u8; 8192];
loop {
let n = file.read(&mut buf)?;
if n == 0 {
break;
}
hasher.update(&buf[..n]);
}

Copilot AI Feb 16, 2026


The compute_manifest_sha256 function uses a manual buffered read loop instead of using std::io::copy or Read::read_to_end. While the manual approach is fine, using std::io::copy(&mut file, &mut hasher) would be more idiomatic and equally efficient since Sha256 implements Write. Consider simplifying to: let mut file = fs::File::open(manifest_path)?; let mut hasher = Sha256::new(); std::io::copy(&mut file, &mut hasher)?;.

Suggested change
let mut buf = [0u8; 8192];
loop {
let n = file.read(&mut buf)?;
if n == 0 {
break;
}
hasher.update(&buf[..n]);
}
std::io::copy(&mut file, &mut hasher)?;

#[arg(long, default_value = "10", value_parser = parse_positive_usize, help = "Number of measured runs (must be >= 1)")]
pub repeat: usize,

#[arg(long, default_value = "1", help = "Number of warmup runs (not measured)")]

Copilot AI Feb 16, 2026


The warmup parameter does not have validation to ensure it's not excessively large. A user could accidentally pass --warmup 1000000 which would make the tool run for an extremely long time before any measurements are taken. Consider adding a value_parser similar to repeat to enforce a reasonable maximum (e.g., 100) or at least document expected ranges.

Suggested change
#[arg(long, default_value = "1", help = "Number of warmup runs (not measured)")]
#[arg(long, default_value = "1", value_parser = clap::value_parser!(usize).range(0..=100), help = "Number of warmup runs (not measured, 0-100)")]

Comment on lines +279 to +323
for chunk_path in &acc_chunks {
match std::fs::read(chunk_path) {
Ok(bytes) => match <Vec<(H256, AccountState)>>::decode(&bytes) {
Ok(accounts) => stats.total_accounts += accounts.len(),
Err(e) => errors.push(VerifyError {
file: chunk_path.display().to_string(),
message: format!("Failed to decode account RLP: {e}"),
}),
},
Err(e) => errors.push(VerifyError {
file: chunk_path.display().to_string(),
message: format!("Failed to read file: {e}"),
}),
}
}

for chunk_path in &storage_chunks {
match std::fs::read(chunk_path) {
Ok(bytes) => match <Vec<(Vec<H256>, Vec<(H256, U256)>)>>::decode(&bytes) {
Ok(entries) => {
for (_, slots) in &entries {
stats.total_storage_slots += slots.len();
}
}
Err(e) => errors.push(VerifyError {
file: chunk_path.display().to_string(),
message: format!("Failed to decode storage RLP: {e}"),
}),
},
Err(e) => errors.push(VerifyError {
file: chunk_path.display().to_string(),
message: format!("Failed to read file: {e}"),
}),
}
}
}

let valid = errors.is_empty();
VerifyResult {
schema_version: 1,
valid,
strict: opts.strict,
errors,
stats,
}

Copilot AI Feb 16, 2026


There is significant code duplication between run_verify (lines 40-182) and the test helper verify_dataset (lines 231-324). The logic is nearly identical except that run_verify prints to terminal and returns an error on failure, while verify_dataset just returns the result. This duplication makes maintenance harder and increases the risk of bugs if one function is updated but not the other. Consider refactoring to extract the shared validation logic into a helper function that both run_verify and verify_dataset call, with run_verify handling the terminal output and error reporting as a thin wrapper.

Suggested change (extract the shared strict-validation loops into a helper that both run_verify and verify_dataset call):

    verify_strict_chunks(&acc_chunks, &storage_chunks, &mut stats, &mut errors);
    }

    let valid = errors.is_empty();
    VerifyResult {
        schema_version: 1,
        valid,
        strict: opts.strict,
        errors,
        stats,
    }
    }

    /// Perform strict verification by decoding account and storage chunks.
    fn verify_strict_chunks(
        acc_chunks: &[PathBuf],
        storage_chunks: &[PathBuf],
        stats: &mut VerifyStats,
        errors: &mut Vec<VerifyError>,
    ) {
        for chunk_path in acc_chunks {
            match std::fs::read(chunk_path) {
                Ok(bytes) => match <Vec<(H256, AccountState)>>::decode(&bytes) {
                    Ok(accounts) => stats.total_accounts += accounts.len(),
                    Err(e) => errors.push(VerifyError {
                        file: chunk_path.display().to_string(),
                        message: format!("Failed to decode account RLP: {e}"),
                    }),
                },
                Err(e) => errors.push(VerifyError {
                    file: chunk_path.display().to_string(),
                    message: format!("Failed to read file: {e}"),
                }),
            }
        }
        for chunk_path in storage_chunks {
            match std::fs::read(chunk_path) {
                Ok(bytes) => match <Vec<(Vec<H256>, Vec<(H256, U256)>)>>::decode(&bytes) {
                    Ok(entries) => {
                        for (_, slots) in &entries {
                            stats.total_storage_slots += slots.len();
                        }
                    }
                    Err(e) => errors.push(VerifyError {
                        file: chunk_path.display().to_string(),
                        message: format!("Failed to decode storage RLP: {e}"),
                    }),
                },
                Err(e) => errors.push(VerifyError {
                    file: chunk_path.display().to_string(),
                    message: format!("Failed to read file: {e}"),
                }),
            }
        }
    }
pub fn len(&self) -> usize {
self.durations.len()
}


Copilot AI Feb 16, 2026


The RunStats struct has a len() method but no corresponding is_empty() method. This violates the Rust convention that types implementing len() should also implement is_empty() for consistency and clarity. The clippy lint len_without_is_empty would flag this. Add an is_empty() method that returns self.durations.is_empty().

Suggested change
pub fn is_empty(&self) -> bool {
self.durations.is_empty()
}

impl PhaseStats {
pub fn from_durations(durations: &[Duration]) -> Self {
let mut sorted: Vec<f64> = durations.iter().map(|d| d.as_secs_f64()).collect();
sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());

Copilot AI Feb 16, 2026


The sort_by with partial_cmp can panic if the input contains NaN values, even though this is unlikely with Duration-derived values. For robustness, consider using unwrap_or with a default comparison or filtering out NaN values before sorting. The same pattern appears in the profiling module at line 9 where durations are sorted without checking for potential issues.

Suggested change
sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
sorted.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));

ilitteri and others added 5 commits February 16, 2026 20:29
…ethrex-p2p

ProfileBackend and run_once_with_opts now live in the snapsync_profile
crate (tooling/snapsync_profile in ethrex). Added snapsync_profile git
dep and wired its rocksdb feature. Manifest types (load_manifest,
SnapProfileManifest, etc.) remain in ethrex_p2p::sync::profile.
The parsing logic now lives in the snapsync_profile crate via a
FromStr impl. This eliminates the duplicated feature-gated match
arms that were identical between ethrex-replay and the tooling
binary.
The merge of main into feat/snap-profile (f6f5946) introduced calls
to print_individual_runs and RunStats::median() in cli.rs, but the
function was missing and median was private. Add the missing function
and bump median visibility.
Move the 5 snapsync_*.rs files into src/snapsync/ as submodules:
  snapsync.rs         -> snapsync/profile.rs
  snapsync_report.rs  -> snapsync/report.rs
  snapsync_compare.rs -> snapsync/compare.rs
  snapsync_verify.rs  -> snapsync/verify.rs
  snapsync_fixtures.rs -> snapsync/fixtures.rs

Update all imports: crate::snapsync_* -> crate::snapsync::* and
super::* for sibling module references.