
test(minibf): introduce endpoint benchmark via xtask #889

Open
scarmuega wants to merge 1 commit into main from dolos-minibf-bench

Conversation

@scarmuega scarmuega (Member) commented Feb 9, 2026

Summary by CodeRabbit

  • New Features

    • Added end-to-end benchmarking tool for minibf with support for concurrent request execution, latency analysis, pagination testing, and detailed report generation (JSON/Table formats).
    • Added preprod benchmark dataset for testing scenarios.
  • Chores

    • Added cargo alias for streamlined command execution.
    • Internal code formatting refinements.

@coderabbitai coderabbitai bot commented Feb 9, 2026

📝 Walkthrough

This PR introduces minibf_bench, a comprehensive HTTP benchmarking tool integrated into the xtask CLI. The tool loads test vectors from DBSync, generates HTTP requests against minibf endpoints, executes concurrent benchmarks with warmup phases, collects latency statistics, and generates detailed JSON reports with pagination analysis.
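
For orientation, below is a minimal, self-contained sketch of the HDR-histogram latency flow described above, using the hdrhistogram crate with the bounds quoted later in this review (1 µs to 60 s, 3 significant figures). The latency values are stand-ins, not output from the tool:

use hdrhistogram::Histogram;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Bounds and precision mirror the values used by the stats module.
    let mut h = Histogram::<u64>::new_with_bounds(1, 60_000_000, 3)?;

    // Stand-in latencies; the real tool records one value per request.
    for latency_micros in [850u64, 1_200, 950, 30_000] {
        h.record(latency_micros.max(1))?; // clamp to the histogram's lower bound
    }

    println!(
        "p50={}us p95={}us p99={}us",
        h.value_at_quantile(0.50),
        h.value_at_quantile(0.95),
        h.value_at_quantile(0.99)
    );
    Ok(())
}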

Changes

Cargo Configuration — .cargo/config.toml, xtask/Cargo.toml
    Added an xtask cargo alias and new dependencies (hdrhistogram, chrono, reqwest, tokio) for the benchmarking infrastructure.

Test Vector Data — src/minibf_bench/vectors/preprod.json
    Added a preprod network benchmark dataset with address vectors, stake address vectors, and reference block metadata (429 lines).

Minibf Benchmark Framework — xtask/src/minibf_bench.rs, xtask/src/minibf_bench/args.rs, xtask/src/minibf_bench/runner.rs, xtask/src/minibf_bench/sampler.rs, xtask/src/minibf_bench/stats.rs, xtask/src/minibf_bench/vectors.rs, xtask/src/minibf_bench/report.rs
    Comprehensive benchmarking suite: an orchestration module with request generation, CLI argument parsing with Clap, HTTP request execution with async concurrency control, historical chain sampling via the API, statistical analysis with HDR histograms, test vector management with DBSync integration and file caching, and JSON report generation with pagination analysis.

CLI Integration — xtask/src/main.rs
    Added a MinibfBench subcommand to the Commands enum and wired its execution path to the minibf_bench module.

Ground Truth Module Formatting — xtask/src/ground_truth/delegation.rs, xtask/src/ground_truth/epochs.rs, xtask/src/ground_truth/eras.rs, xtask/src/ground_truth/query.rs, xtask/src/ground_truth/rewards.rs, xtask/src/ground_truth/stake.rs
    Minor formatting refinements: line wrapping and macro invocation restructuring with no semantic changes.

Sequence Diagram

sequenceDiagram
    participant User
    participant xtask as xtask CLI
    participant VecMgr as Vector Manager
    participant DBSync
    participant Sampler
    participant ReqGen as Request Generator
    participant Runner as Benchmark Runner
    participant Endpoint as minibf Endpoints
    participant Stats as Stats Calculator
    participant Reporter as Report Generator

    User->>xtask: minibf-bench command
    xtask->>VecMgr: load_vectors()
    alt Cache hit & valid
        VecMgr-->>xtask: cached TestVectors
    else Cache miss or expired
        VecMgr->>DBSync: query addresses & stakes
        DBSync-->>VecMgr: vectors data
        VecMgr->>VecMgr: cache to preprod.json
        VecMgr-->>xtask: TestVectors
    end
    
    xtask->>Sampler: fetch historical samples
    Sampler->>Endpoint: GET /blocks/{height}
    Endpoint-->>Sampler: block data
    Sampler-->>xtask: HistoricalSamples
    
    xtask->>ReqGen: generate_requests(vectors, samples)
    ReqGen-->>xtask: Vec<TestRequest>
    
    xtask->>Runner: run_warmup(requests)
    Runner->>Endpoint: concurrent HTTP requests
    Endpoint-->>Runner: responses
    Runner-->>xtask: warmup complete
    
    xtask->>Runner: run_benchmark(requests)
    par Concurrent Execution
        Runner->>Endpoint: request 1
        Endpoint-->>Runner: response + latency
        Runner->>Endpoint: request 2
        Endpoint-->>Runner: response + latency
        Runner->>Endpoint: request N
        Endpoint-->>Runner: response + latency
    end
    Runner-->>xtask: Vec<RequestResult>, Histogram
    
    xtask->>Stats: calculate_endpoint_stats(results)
    Stats->>Stats: aggregate by endpoint, compute percentiles
    Stats-->>xtask: Vec<EndpointStats>
    
    xtask->>Reporter: generate_report(stats, vectors)
    Reporter->>Reporter: assemble metadata, summary, pagination analysis
    Reporter-->>xtask: BenchmarkReport
    
    xtask->>Reporter: write_report(output_path)
    Reporter->>Reporter: serialize to JSON
    Reporter-->>User: report written

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐰 Benchmarks bloom where data flows so free,
With vectors cached and endpoints spree,
Histograms dance in concurrent delight,
Reports emerge from tests run through the night! 📊✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 13.79%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title accurately describes the main change: introducing an endpoint benchmarking tool for minibf via xtask, reflected across multiple new modules and configurations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 12

🤖 Fix all issues with AI agents
In `@src/minibf_bench/vectors/preprod.json`:
- Around line 409-428: The reference block hashes in the generated vectors JSON
contain PostgreSQL bytea hex escapes (e.g., "\\x..."); update the code that
populates these hashes—specifically the get_block_at_height function (or the
code path that writes reference_blocks into the vectors JSON/sampler output)—to
normalize hashes by stripping the "\\x" prefix (e.g., call .strip_prefix("\\x")
or equivalent) before storing, or omit the hash field entirely if unused; ensure
the change is applied where reference_blocks are serialized so generated
preprod.json no longer contains Postgres-specific artifacts.

In `@xtask/src/minibf_bench.rs`:
- Around line 26-27: The benchmark start time is being captured too early
(start_time = Instant::now()) so duration includes config loading, vector
generation and sampling; move the Instant::now() call so timing begins
immediately before the actual benchmark run (e.g., just before calling
run_warmup or immediately after warmup finishes depending on whether warmup
should be excluded) and ensure the elapsed/duration calculation uses that moved
start_time variable; update any references to start_time and the duration
computation to use the new placement so reported RPS reflects only the measured
benchmark phase.
- Line 20: The import list includes an unused symbol "Network" causing a clippy
warning; remove "Network" from the use statement so it only imports
load_xtask_config (i.e., change the use of crate::config::{load_xtask_config,
Network} to only import load_xtask_config) and re-run cargo clippy --workspace
--all-targets --all-features to confirm the warning is resolved.
- Around line 47-52: The sampling client created for
HistoricalSamples::from_vectors currently uses reqwest::Client::new() without a
timeout and can hang; change the client construction in the rt.block_on block to
build a reqwest::Client with a request timeout (e.g., 30s) or reuse the same
configuration as BenchmarkRunner (use
reqwest::Client::builder().timeout(...).build()). Pass that timed client to
sampler::HistoricalSamples::from_vectors so sampling will error fast instead of
blocking indefinitely.

In `@xtask/src/minibf_bench/args.rs`:
- Around line 46-48: The CLI arg page_sizes is declared in args.rs but never
used by generate_requests in minibf_bench.rs which still hardcodes counts;
either remove page_sizes or wire it into request generation: change the args
struct to use pub page_sizes: Vec<usize> with #[arg(long, value_delimiter =
',')] (or parse the existing String into usize vec), then update
generate_requests (and any call sites) to iterate over args.page_sizes instead
of hardcoded 20/100 so the generated query strings use the configured page
sizes; ensure parsing errors are handled or validated before request creation.

In `@xtask/src/minibf_bench/report.rs`:
- Line 84: The computed rps = total_requests as f64 / duration.as_secs_f64() can
produce Infinity/NaN when duration is zero, causing serde_json to fail; update
the logic around rps (the variable computed in the report generation and
consumed by write_report) to guard against zero or non-finite results by
checking duration.as_secs_f64() and/or using f64::is_finite on the computed rps
and replacing any non-finite value with 0.0 (or another defined sentinel) before
serializing; ensure the change is applied where rps is computed so write_report
only ever sees finite floats.
- Around line 87-102: The code incorrectly averages per-endpoint percentiles
(endpoint_stats -> latency_p50_micros / p95 / p99) which is statistically
invalid; fix by computing true overall percentiles either by (preferred)
collecting all raw latency samples into one vector and computing percentiles
from that, or (if raw samples/histograms are unavailable) merging per-endpoint
histograms, or at minimum computing a request-count-weighted average using each
endpoint's request count (e.g., use endpoint_stats[*].request_count as weights)
when computing overall_p50/overall_p95/overall_p99; update the computation that
assigns overall_p50/overall_p95/overall_p99 to use one of these approaches and
reference endpoint_stats and the per-endpoint fields (latency_p50_micros,
latency_p95_micros, latency_p99_micros, and the endpoint request count or
histogram field) when making the change.

In `@xtask/src/minibf_bench/runner.rs`:
- Line 9: Remove the unused import `JoinHandle` from the `use
tokio::task::JoinHandle;` line in runner.rs to satisfy clippy; locate the `use`
statement that references `JoinHandle` (symbol: JoinHandle) and delete it (or
remove `JoinHandle` from the import list) so the file no longer imports an
unused symbol.
- Around line 45-68: run_warmup currently fire-and-forgets spawned tasks so the
function returns before warmup requests complete; fix it by waiting for all
tasks to finish — after spawning the tasks in run_warmup (where you create
semaphore, call acquire_owned, spawn with tokio::spawn and drop(permit)), drain
the semaphore or otherwise await completion before returning (e.g., acquire all
permits from the same Arc<Semaphore> or collect and await JoinHandles) so that
every permit is returned (ensuring all spawned requests have finished) prior to
returning Ok(()).

In `@xtask/src/minibf_bench/sampler.rs`:
- Around line 39-59: fetch_sample currently parses JSON without validating HTTP
status which lets 4xx/5xx responses silently produce default values; update
fetch_sample to call error_for_status() (or check resp.status().is_success()) on
both the initial block request and the txs request before calling .json(), so
any non-success HTTP response returns an Err and surfaces sampling failures
immediately (apply to the client.get(...).send().await? flow that binds resp and
txs_resp, and propagate or map the error instead of letting block_hash/txs
default).

In `@xtask/src/minibf_bench/stats.rs`:
- Around line 48-53: histogram.record() can return Err when given 0 because
Histogram::new_with_bounds(1, 60_000_000, 3) sets a lower bound of 1; when
iterating results and calling histogram.record(result.latency_micros)? you
should clamp or map zero latencies to the histogram's minimum (e.g., let v =
result.latency_micros.max(1); histogram.record(v)?), or skip recording zeros
explicitly, so update the loop that references result.latency_micros and the
histogram.record call to ensure no 0 is passed.

In `@xtask/src/minibf_bench/vectors.rs`:
- Around line 199-211: The cache-check currently calls
SystemTime::now().duration_since(modified)? which will return Err for future
mtimes and abort load_vectors; change the logic around vectors_path so the
duration_since result is handled non-panically (e.g., match or if let Ok(age) =
SystemTime::now().duration_since(modified) { if age <
Duration::from_secs(CACHE_TTL_HOURS * 3600) { read and return cached JSON } } ),
treating Err (future modification time) as cache-expired and allowing the
function to continue to regenerate vectors; update the code that references
vectors_path, modified, and CACHE_TTL_HOURS accordingly.
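
The two report.rs prompts above can be addressed as sketched below; the EndpointStats shape is inferred from the field names quoted in the prompts and is illustrative, not the actual struct:

struct EndpointStats {
    request_count: u64,
    latency_p50_micros: u64,
}

// Replace a non-finite RPS (e.g., from a zero duration) with 0.0 before serializing.
fn finite_rps(total_requests: u64, duration_secs: f64) -> f64 {
    let rps = total_requests as f64 / duration_secs;
    if rps.is_finite() { rps } else { 0.0 }
}

// Request-count-weighted average of per-endpoint p50s: the minimum acceptable
// fallback when raw samples or mergeable histograms are unavailable.
fn weighted_p50(stats: &[EndpointStats]) -> f64 {
    let total: u64 = stats.iter().map(|s| s.request_count).sum();
    if total == 0 {
        return 0.0;
    }
    stats
        .iter()
        .map(|s| s.latency_p50_micros as f64 * s.request_count as f64)
        .sum::<f64>()
        / total as f64
}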
🧹 Nitpick comments (5)
xtask/src/minibf_bench/vectors.rs (4)

82-120: Potentially expensive query on tx_out — consider whether an index exists on tx_out.address.

The LIKE 'addr1%' / LIKE 'addr_test1%' pattern with a leading prefix is index-friendly, so it should be fine if a B-tree index exists on tx_out.address. However, tx_out is one of the largest tables in DBSync, and joining with tx plus the GROUP BY can be very slow on a production-sized database. Since this is for benchmarking/xtask and cached with a 24h TTL, the impact is limited.

Also, the address_type categorization (lines 107-111) only distinguishes "shelley_payment_stake" vs "shelley_payment_only" — Byron addresses are excluded by the LIKE filter, which seems intentional.


147-170: Type inconsistency between tip_height: i32 and BlockRef.height: i64.

tip_height is retrieved as i32 (line 158), and the arithmetic on lines 161-163 is done in i32, but BlockRef.height is i64. This is fine for the as i64 cast in get_block_at_height (line 185), but the intermediate calculations use i32 arithmetic. While tip_height * 9 won't overflow for realistic Cardano block heights (~11M × 9 = ~99M, well within i32::MAX of ~2.1B), consider using i64 consistently throughout for clarity and future-proofing, since BlockRef already uses i64.

♻️ Optional: use i64 consistently
-        let tip_height: i32 = tip_row.get(0);
+        let tip_height: i64 = tip_row.get::<_, i32>(0) as i64;
 
         // Calculate positions for 10%, 50%, 90%
-        let early_height = tip_height / 10;
-        let mid_height = tip_height / 2;
-        let recent_height = (tip_height * 9) / 10;
+        let early_height = (tip_height / 10) as i32;
+        let mid_height = (tip_height / 2) as i32;
+        let recent_height = ((tip_height * 9) / 10) as i32;

228-234: unwrap() on vectors_path.parent() — prefer explicit error handling.

While parent() won't realistically return None here (the path always has a parent directory), using unwrap() is a clippy-unfriendly pattern for production code. As per coding guidelines, clippy warnings should be resolved.

♻️ Suggested fix
-    fs::create_dir_all(&vectors_path.parent().unwrap())?;
+    let parent = vectors_path
+        .parent()
+        .context("vectors path has no parent directory")?;
+    fs::create_dir_all(parent)?;

220-225: Rough URL parsing to extract dbsync_host — consider using the url crate.

The split-based approach works for typical PostgreSQL connection strings, but it can break for URLs with unusual formats (e.g., no @, query parameters with @, etc.). For a benchmarking tool this is acceptable, and the unwrap_or("unknown") fallback is reasonable. Just flagging for awareness.
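
A possible shape for that, assuming the url crate were added as a dependency (helper name illustrative):

use url::Url;

// Extract the host from a PostgreSQL connection string, falling back to
// "unknown" when the URL does not parse or has no host component.
fn dbsync_host(conn_str: &str) -> String {
    Url::parse(conn_str)
        .ok()
        .and_then(|u| u.host_str().map(str::to_string))
        .unwrap_or_else(|| "unknown".to_string())
}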

xtask/src/minibf_bench/report.rs (1)

151-181: Pagination error attribution is overly broad.

Line 168 assumes any success_rate < 1.0 on a paginated endpoint means "scan_limit_exceeded". In practice, failures could be network timeouts, 5xx errors, or other transient issues. This could mislead CI consumers into thinking there's a scan-limit problem when there isn't one.

Consider either inspecting actual error messages/status codes from the stats, or using a more neutral label like "partial_failures".

Comment on lines +409 to +428
"reference_blocks": {
"early": {
"height": 440647,
"hash": "\\xdd811d450847ac086901e6736a85509c2026f2f66ae0c4335ff85c7f2df7d66e",
"epoch": 41,
"slot": 16281734
},
"mid": {
"height": 2203237,
"hash": "\\x2944dd290435227f96340addb4b616a75035a55fc8119e7752fe4c05642d7889",
"epoch": 140,
"slot": 59122449
},
"recent": {
"height": 3965826,
"hash": "\\xdfc06de3d85ad3c8c7f223b4304a67a50377d1de7eb29b136f3e128f07c89778",
"epoch": 243,
"slot": 103708196
}
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how reference_blocks hashes are used across the codebase
rg -n "reference_blocks" --type=rs -C3
echo "---"
# Check what VECTORS_DIR points to, to verify this file's location is correct
rg -n "VECTORS_DIR" --type=rs -C2

Repository: txpipe/dolos

Length of output: 130


🏁 Script executed:

#!/bin/bash
# Check how reference_blocks hashes are used across the codebase
rg -n "reference_blocks" --type=rust -C3
echo "---"
# Check what VECTORS_DIR points to, to verify this file's location is correct
rg -n "VECTORS_DIR" --type=rust -C2

Repository: txpipe/dolos

Length of output: 2690


🏁 Script executed:

#!/bin/bash
# Find the sampler module and check how reference_blocks is used
rg -n "from_vectors" --type=rust -A10 -B2
echo "---"
# Also look for ReferenceBlocks struct definition
rg -n "struct ReferenceBlocks" --type=rust -A10

Repository: txpipe/dolos

Length of output: 2573


🏁 Script executed:

#!/bin/bash
# Search for any other usage of the hash field from reference_blocks or BlockRef
rg -n "\.hash" xtask/src/minibf_bench/ --type=rust -B2 -A2

Repository: txpipe/dolos

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Look at the query_reference_blocks method to see how hashes are fetched from the database
rg -n "query_reference_blocks" --type=rust -A30

Repository: txpipe/dolos

Length of output: 4570


🏁 Script executed:

#!/bin/bash
# Get the rest of get_block_at_height method
rg -n "get_block_at_height" --type=rust -A15

Repository: txpipe/dolos

Length of output: 1589


🏁 Script executed:

#!/bin/bash
# Let's see what the actual hash looks like in the JSON file
cat -n src/minibf_bench/vectors/preprod.json | grep -A5 "reference_blocks"

Repository: txpipe/dolos

Length of output: 287


Remove or normalize block hashes before storing in vectors JSON — PostgreSQL artifacts shouldn't leak into test data.

The \\x prefix on reference block hashes (e.g., \\xdd811d...) is a PostgreSQL bytea hex escape that appears because get_block_at_height queries the database directly without stripping the format. While the sampler currently uses only block heights for API calls (not hashes), storing PostgreSQL-formatted data in JSON is a data quality issue. Consider calling .strip_prefix("\\x") when populating the hash field, or exclude hashes entirely if they're not needed.
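
A minimal sketch of the normalization, assuming the hash reaches this code as a \x-prefixed string from the bytea column (helper name illustrative):

// Strip the PostgreSQL bytea hex escape ("\x") if present; pass other
// strings through unchanged.
fn normalize_pg_hash(raw: &str) -> String {
    raw.strip_prefix("\\x").unwrap_or(raw).to_string()
}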


use reqwest::Method;
use xshell::Shell;

use crate::config::{load_xtask_config, Network};

⚠️ Potential issue | 🟡 Minor

Unused import flagged by static analysis.

Network on line 20 is unused. Remove it to satisfy cargo clippy.

Fix
-use crate::config::{load_xtask_config, Network};
+use crate::config::load_xtask_config;

As per coding guidelines, **/*.rs: "Run cargo clippy --workspace --all-targets --all-features and resolve all clippy warnings before committing changes".

🧰 Tools
🪛 GitHub Check: Check Build

[warning] 20-20:
unused import: Network


Comment on lines +26 to +27
let start_time = Instant::now();
let repo_root = std::env::current_dir().context("detecting repo root")?;

⚠️ Potential issue | 🟡 Minor

duration includes vector loading, sampling, and request generation — inflating the report.

start_time is captured at line 26 before config loading, vector generation, and chain sampling. The elapsed time at line 79 therefore overstates the actual benchmark duration, deflating the reported RPS.

Consider capturing the start time just before run_warmup (or after warmup, depending on intent) for a more accurate benchmark duration.

Also applies to: 78-80
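
A hedged sketch of the suggested placement; the runner and method names are taken from this review, not verified against the file:

// Setup (config loading, vector generation, sampling) happens earlier and is
// deliberately excluded from the timer.
runner.run_warmup(requests.clone()).await?;

// Start the clock only for the measured phase so reported RPS reflects it alone.
let start_time = std::time::Instant::now();
let results = runner.run_benchmark(requests).await?;
let duration = start_time.elapsed(); // feeds the total_requests / duration calculation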


Comment on lines +47 to +52
let rt = tokio::runtime::Runtime::new()?;
let samples = rt.block_on(async {
    let client = reqwest::Client::new();
    sampler::HistoricalSamples::from_vectors(&client, &args.url, &vectors.reference_blocks)
        .await
})?;

⚠️ Potential issue | 🟡 Minor

Sampling client has no timeout — may hang indefinitely.

The reqwest::Client::new() used for sampling (line 49) has no timeout, unlike the BenchmarkRunner client which uses 30s. If the target server is unreachable, this will block forever.

Proposed fix
-        let client = reqwest::Client::new();
+        let client = reqwest::Client::builder()
+            .timeout(std::time::Duration::from_secs(30))
+            .build()?;

Comment on lines +46 to +48
/// Page sizes to test (comma-separated)
#[arg(long, default_value = "20,50,100")]
pub page_sizes: String,

⚠️ Potential issue | 🟡 Minor

page_sizes is declared but never consumed in request generation.

generate_requests in minibf_bench.rs hardcodes count=20 and count=100 in query strings rather than reading args.page_sizes. Either wire page_sizes into the request generation or remove the argument to avoid misleading users.

Also, consider using Vec<usize> with #[arg(value_delimiter = ',')] instead of a raw String for type-safe parsing.
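
A sketch of the typed variant under that suggestion (struct name illustrative; only the relevant field is shown):

use clap::Parser;

#[derive(Parser)]
struct BenchArgs {
    /// Page sizes to test (comma-separated)
    #[arg(long, value_delimiter = ',', default_value = "20,50,100")]
    page_sizes: Vec<usize>,
}

generate_requests would then iterate over args.page_sizes instead of hardcoding count=20 / count=100 in the query strings.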


use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::sync::{mpsc, Semaphore};
use tokio::task::JoinHandle;

⚠️ Potential issue | 🟡 Minor

Unused import: JoinHandle.

Flagged by static analysis. Remove to satisfy clippy.

Fix
-use tokio::task::JoinHandle;

As per coding guidelines, **/*.rs: "Run cargo clippy --workspace --all-targets --all-features and resolve all clippy warnings before committing changes".

🧰 Tools
🪛 GitHub Check: Check Build

[warning] 9-9:
unused import: tokio::task::JoinHandle


Comment on lines +45 to +68
pub async fn run_warmup(&self, requests: Vec<TestRequest>) -> Result<()> {
    println!("Running {} warmup requests...", self.args.warmup);

    let semaphore = Arc::new(Semaphore::new(self.args.concurrency));

    for (i, req) in requests.iter().cycle().take(self.args.warmup).enumerate() {
        let permit = semaphore.clone().acquire_owned().await?;
        let client = self.client.clone();
        let request = req.clone();
        let base_url = self.args.url.clone();

        tokio::spawn(async move {
            let url = format!("{}{}", base_url, request.path);
            let _ = client.request(request.method, url).send().await;
            drop(permit);
        });

        if (i + 1) % 100 == 0 {
            println!("  Warmup {}/{}", i + 1, self.args.warmup);
        }
    }

    Ok(())
}

⚠️ Potential issue | 🟠 Major

Warmup tasks are fire-and-forget — benchmark may start before warmup completes.

run_warmup spawns tasks but never joins them. After the for loop exits, the function returns Ok(()) while warmup HTTP requests may still be in-flight. This means the benchmark phase can start immediately, with warmup traffic still hitting the server — skewing latency measurements.

You can drain the semaphore to ensure all permits are returned (i.e., all tasks completed):

Proposed fix: wait for all tasks to finish
     pub async fn run_warmup(&self, requests: Vec<TestRequest>) -> Result<()> {
         println!("Running {} warmup requests...", self.args.warmup);

         let semaphore = Arc::new(Semaphore::new(self.args.concurrency));
+        let mut handles = Vec::new();

         for (i, req) in requests.iter().cycle().take(self.args.warmup).enumerate() {
             let permit = semaphore.clone().acquire_owned().await?;
             let client = self.client.clone();
             let request = req.clone();
             let base_url = self.args.url.clone();

-            tokio::spawn(async move {
+            handles.push(tokio::spawn(async move {
                 let url = format!("{}{}", base_url, request.path);
                 let _ = client.request(request.method, url).send().await;
                 drop(permit);
-            });
+            }));

             if (i + 1) % 100 == 0 {
                 println!("  Warmup {}/{}", i + 1, self.args.warmup);
             }
         }

+        for handle in handles {
+            let _ = handle.await;
+        }
+
         Ok(())
     }
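
Alternatively, draining the semaphore avoids collecting handles: acquiring all permits only succeeds once every spawned task has dropped its own. A sketch, assuming self.args.concurrency fits in u32:

// After the spawn loop: block until all permits are returned, i.e., every
// warmup task has finished.
let _all = semaphore.acquire_many(self.args.concurrency as u32).await?;
Ok(())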

Comment on lines +39 to +59
async fn fetch_sample(client: &Client, base_url: &str, height: i64) -> Result<ChainSample> {
    let url = format!("{}/blocks/{}", base_url, height);
    let resp: serde_json::Value = client.get(&url).send().await?.json().await?;

    let block_hash = resp["hash"].as_str().unwrap_or_default().to_string();
    let block_number = resp["height"].as_u64().unwrap_or(height as u64);
    let slot = resp["slot"].as_u64().unwrap_or_default();
    let epoch = resp["epoch"].as_u64().unwrap_or_default();

    // Get transactions for this block
    let txs_url = format!("{}/blocks/{}/txs?count=100", base_url, block_hash);
    let txs_resp: serde_json::Value = client.get(&txs_url).send().await?.json().await?;

    let txs = match txs_resp.as_array() {
        Some(arr) => arr
            .iter()
            .filter_map(|v| v["tx_hash"].as_str().or_else(|| v.as_str()).map(String::from))
            .take(20)
            .collect(),
        None => Vec::new(),
    };

⚠️ Potential issue | 🟠 Major

No HTTP status validation — silent data corruption on API errors.

fetch_sample doesn't check the response status before parsing JSON. If the block height doesn't exist (404) or the server errors (500), the response may not be valid JSON or will have an unexpected shape, causing block_hash to silently default to "". The subsequent request to /blocks//txs?count=100 would then also fail silently, producing an empty ChainSample.

Consider using resp.error_for_status()?.json() or checking .status() explicitly so sampling failures surface immediately.

Proposed fix
 async fn fetch_sample(client: &Client, base_url: &str, height: i64) -> Result<ChainSample> {
     let url = format!("{}/blocks/{}", base_url, height);
-    let resp: serde_json::Value = client.get(&url).send().await?.json().await?;
+    let resp: serde_json::Value = client
+        .get(&url)
+        .send()
+        .await?
+        .error_for_status()?
+        .json()
+        .await?;
-    let txs_resp: serde_json::Value = client.get(&txs_url).send().await?.json().await?;
+    let txs_resp: serde_json::Value = client
+        .get(&txs_url)
+        .send()
+        .await?
+        .error_for_status()?
+        .json()
+        .await?;

Comment on lines +48 to +53
let mut histogram = Histogram::<u64>::new_with_bounds(1, 60_000_000, 3)?;
let mut success_count = 0u64;
let mut failure_count = 0u64;

for result in &results {
    histogram.record(result.latency_micros)?;

⚠️ Potential issue | 🟡 Minor

histogram.record() will error on a 0-microsecond latency.

The histogram's lower bound is 1, so record(0) returns Err(RecordError) which propagates via ? and aborts the entire stats calculation. While rare, a sub-microsecond response (e.g., connection reuse hitting cache) could produce 0 from as_micros() as u64.

Proposed fix: clamp to lower bound
         for result in &results {
-            histogram.record(result.latency_micros)?;
+            histogram.record(result.latency_micros.max(1))?;

Comment on lines +199 to +211
let vectors_path = PathBuf::from(VECTORS_DIR).join(format!("{}.json", network.as_str()));

// Check if we can use cached vectors
if !force_refresh && vectors_path.exists() {
    let metadata = fs::metadata(&vectors_path)?;
    let modified = metadata.modified()?;
    let age = SystemTime::now().duration_since(modified)?;

    if age < Duration::from_secs(CACHE_TTL_HOURS * 3600) {
        let content = fs::read_to_string(&vectors_path)?;
        return Ok(serde_json::from_str(&content)?);
    }
}

⚠️ Potential issue | 🟡 Minor

duration_since will error on clock skew, causing load_vectors to fail instead of regenerating.

On line 205, if the file's modification time is somehow in the future (clock adjustment, NFS, etc.), SystemTime::now().duration_since(modified) returns Err. The ? propagates this as an error, which aborts the entire function rather than falling through to regenerate vectors.

Consider treating a future mtime as "cache expired":

🛡️ Suggested defensive handling
-        let age = SystemTime::now().duration_since(modified)?;
-
-        if age < Duration::from_secs(CACHE_TTL_HOURS * 3600) {
+        let age = SystemTime::now()
+            .duration_since(modified)
+            .unwrap_or(Duration::from_secs(u64::MAX));
+
+        if age < Duration::from_secs(CACHE_TTL_HOURS * 3600) {
