
test(minibf): introduce endpoint benchmark via xtask #889

Open
scarmuega wants to merge 1 commit into main from dolos-minibf-bench

Conversation

@scarmuega scarmuega (Member) commented Feb 9, 2026

Summary by CodeRabbit

  • New Features

    • Added end-to-end benchmarking tool for minibf with support for concurrent request execution, latency analysis, pagination testing, and detailed report generation (JSON/Table formats).
    • Added preprod benchmark dataset for testing scenarios.
  • Chores

    • Added cargo alias for streamlined command execution.
    • Internal code formatting refinements.

@coderabbitai coderabbitai bot commented Feb 9, 2026

📝 Walkthrough

This PR introduces minibf_bench, a comprehensive HTTP benchmarking tool integrated into the xtask CLI. The tool loads test vectors from DBSync, generates HTTP requests against minibf endpoints, executes concurrent benchmarks with warmup phases, collects latency statistics, and generates detailed JSON reports with pagination analysis.
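
For orientation, below is a minimal, self-contained sketch of the HDR-histogram latency flow described above, using the hdrhistogram crate with the bounds quoted later in this review (1 µs to 60 s, 3 significant figures). The latency values are stand-ins, not output from the tool:

use hdrhistogram::Histogram;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Bounds and precision mirror the values used by the stats module.
    let mut h = Histogram::<u64>::new_with_bounds(1, 60_000_000, 3)?;

    // Stand-in latencies; the real tool records one value per request.
    for latency_micros in [850u64, 1_200, 950, 30_000] {
        h.record(latency_micros.max(1))?; // clamp to the histogram's lower bound
    }

    println!(
        "p50={}us p95={}us p99={}us",
        h.value_at_quantile(0.50),
        h.value_at_quantile(0.95),
        h.value_at_quantile(0.99)
    );
    Ok(())
}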

Changes

Cargo Configuration — .cargo/config.toml, xtask/Cargo.toml
    Added an xtask cargo alias and new dependencies (hdrhistogram, chrono, reqwest, tokio) for the benchmarking infrastructure.

Test Vector Data — src/minibf_bench/vectors/preprod.json
    Added a preprod network benchmark dataset with address vectors, stake address vectors, and reference block metadata (429 lines).

Minibf Benchmark Framework — xtask/src/minibf_bench.rs, xtask/src/minibf_bench/args.rs, xtask/src/minibf_bench/runner.rs, xtask/src/minibf_bench/sampler.rs, xtask/src/minibf_bench/stats.rs, xtask/src/minibf_bench/vectors.rs, xtask/src/minibf_bench/report.rs
    Comprehensive benchmarking suite: an orchestration module with request generation, CLI argument parsing with Clap, HTTP request execution with async concurrency control, historical chain sampling via the API, statistical analysis with HDR histograms, test vector management with DBSync integration and file caching, and JSON report generation with pagination analysis.

CLI Integration — xtask/src/main.rs
    Added a MinibfBench subcommand to the Commands enum and wired its execution path to the minibf_bench module.

Ground Truth Module Formatting — xtask/src/ground_truth/delegation.rs, xtask/src/ground_truth/epochs.rs, xtask/src/ground_truth/eras.rs, xtask/src/ground_truth/query.rs, xtask/src/ground_truth/rewards.rs, xtask/src/ground_truth/stake.rs
    Minor formatting refinements: line wrapping and macro invocation restructuring with no semantic changes.

Sequence Diagram

sequenceDiagram
    participant User
    participant xtask as xtask CLI
    participant VecMgr as Vector Manager
    participant DBSync
    participant Sampler
    participant ReqGen as Request Generator
    participant Runner as Benchmark Runner
    participant Endpoint as minibf Endpoints
    participant Stats as Stats Calculator
    participant Reporter as Report Generator

    User->>xtask: minibf-bench command
    xtask->>VecMgr: load_vectors()
    alt Cache hit & valid
        VecMgr-->>xtask: cached TestVectors
    else Cache miss or expired
        VecMgr->>DBSync: query addresses & stakes
        DBSync-->>VecMgr: vectors data
        VecMgr->>VecMgr: cache to preprod.json
        VecMgr-->>xtask: TestVectors
    end
    
    xtask->>Sampler: fetch historical samples
    Sampler->>Endpoint: GET /blocks/{height}
    Endpoint-->>Sampler: block data
    Sampler-->>xtask: HistoricalSamples
    
    xtask->>ReqGen: generate_requests(vectors, samples)
    ReqGen-->>xtask: Vec<TestRequest>
    
    xtask->>Runner: run_warmup(requests)
    Runner->>Endpoint: concurrent HTTP requests
    Endpoint-->>Runner: responses
    Runner-->>xtask: warmup complete
    
    xtask->>Runner: run_benchmark(requests)
    par Concurrent Execution
        Runner->>Endpoint: request 1
        Endpoint-->>Runner: response + latency
        Runner->>Endpoint: request 2
        Endpoint-->>Runner: response + latency
        Runner->>Endpoint: request N
        Endpoint-->>Runner: response + latency
    end
    Runner-->>xtask: Vec<RequestResult>, Histogram
    
    xtask->>Stats: calculate_endpoint_stats(results)
    Stats->>Stats: aggregate by endpoint, compute percentiles
    Stats-->>xtask: Vec<EndpointStats>
    
    xtask->>Reporter: generate_report(stats, vectors)
    Reporter->>Reporter: assemble metadata, summary, pagination analysis
    Reporter-->>xtask: BenchmarkReport
    
    xtask->>Reporter: write_report(output_path)
    Reporter->>Reporter: serialize to JSON
    Reporter-->>User: report written

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐰 Benchmarks bloom where data flows so free,
With vectors cached and endpoints spree,
Histograms dance in concurrent delight,
Reports emerge from tests run through the night! 📊✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 13.79%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title accurately describes the main change: introducing an endpoint benchmarking tool for minibf via xtask, reflected across multiple new modules and configurations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 12

🤖 Fix all issues with AI agents
In `@src/minibf_bench/vectors/preprod.json`:
- Around line 409-428: The reference block hashes in the generated vectors JSON
contain PostgreSQL bytea hex escapes (e.g., "\\x..."); update the code that
populates these hashes—specifically the get_block_at_height function (or the
code path that writes reference_blocks into the vectors JSON/sampler output)—to
normalize hashes by stripping the "\\x" prefix (e.g., call .strip_prefix("\\x")
or equivalent) before storing, or omit the hash field entirely if unused; ensure
the change is applied where reference_blocks are serialized so generated
preprod.json no longer contains Postgres-specific artifacts.

In `@xtask/src/minibf_bench.rs`:
- Around line 26-27: The benchmark start time is being captured too early
(start_time = Instant::now()) so duration includes config loading, vector
generation and sampling; move the Instant::now() call so timing begins
immediately before the actual benchmark run (e.g., just before calling
run_warmup or immediately after warmup finishes depending on whether warmup
should be excluded) and ensure the elapsed/duration calculation uses that moved
start_time variable; update any references to start_time and the duration
computation to use the new placement so reported RPS reflects only the measured
benchmark phase.
- Line 20: The import list includes an unused symbol "Network" causing a clippy
warning; remove "Network" from the use statement so it only imports
load_xtask_config (i.e., change the use of crate::config::{load_xtask_config,
Network} to only import load_xtask_config) and re-run cargo clippy --workspace
--all-targets --all-features to confirm the warning is resolved.
- Around line 47-52: The sampling client created for
HistoricalSamples::from_vectors currently uses reqwest::Client::new() without a
timeout and can hang; change the client construction in the rt.block_on block to
build a reqwest::Client with a request timeout (e.g., 30s) or reuse the same
configuration as BenchmarkRunner (use
reqwest::Client::builder().timeout(...).build()). Pass that timed client to
sampler::HistoricalSamples::from_vectors so sampling will error fast instead of
blocking indefinitely.

In `@xtask/src/minibf_bench/args.rs`:
- Around line 46-48: The CLI arg page_sizes is declared in args.rs but never
used by generate_requests in minibf_bench.rs which still hardcodes counts;
either remove page_sizes or wire it into request generation: change the args
struct to use pub page_sizes: Vec<usize> with #[arg(long, value_delimiter =
',')] (or parse the existing String into usize vec), then update
generate_requests (and any call sites) to iterate over args.page_sizes instead
of hardcoded 20/100 so the generated query strings use the configured page
sizes; ensure parsing errors are handled or validated before request creation.

In `@xtask/src/minibf_bench/report.rs`:
- Line 84: The computed rps = total_requests as f64 / duration.as_secs_f64() can
produce Infinity/NaN when duration is zero, causing serde_json to fail; update
the logic around rps (the variable computed in the report generation and
consumed by write_report) to guard against zero or non-finite results by
checking duration.as_secs_f64() and/or using f64::is_finite on the computed rps
and replacing any non-finite value with 0.0 (or another defined sentinel) before
serializing; ensure the change is applied where rps is computed so write_report
only ever sees finite floats.
- Around line 87-102: The code incorrectly averages per-endpoint percentiles
(endpoint_stats -> latency_p50_micros / p95 / p99) which is statistically
invalid; fix by computing true overall percentiles either by (preferred)
collecting all raw latency samples into one vector and computing percentiles
from that, or (if raw samples/histograms are unavailable) merging per-endpoint
histograms, or at minimum computing a request-count-weighted average using each
endpoint's request count (e.g., use endpoint_stats[*].request_count as weights)
when computing overall_p50/overall_p95/overall_p99; update the computation that
assigns overall_p50/overall_p95/overall_p99 to use one of these approaches and
reference endpoint_stats and the per-endpoint fields (latency_p50_micros,
latency_p95_micros, latency_p99_micros, and the endpoint request count or
histogram field) when making the change.

In `@xtask/src/minibf_bench/runner.rs`:
- Line 9: Remove the unused import `JoinHandle` from the `use
tokio::task::JoinHandle;` line in runner.rs to satisfy clippy; locate the `use`
statement that references `JoinHandle` (symbol: JoinHandle) and delete it (or
remove `JoinHandle` from the import list) so the file no longer imports an
unused symbol.
- Around line 45-68: run_warmup currently fire-and-forgets spawned tasks so the
function returns before warmup requests complete; fix it by waiting for all
tasks to finish — after spawning the tasks in run_warmup (where you create
semaphore, call acquire_owned, spawn with tokio::spawn and drop(permit)), drain
the semaphore or otherwise await completion before returning (e.g., acquire all
permits from the same Arc<Semaphore> or collect and await JoinHandles) so that
every permit is returned (ensuring all spawned requests have finished) prior to
returning Ok(()).

In `@xtask/src/minibf_bench/sampler.rs`:
- Around line 39-59: fetch_sample currently parses JSON without validating HTTP
status which lets 4xx/5xx responses silently produce default values; update
fetch_sample to call error_for_status() (or check resp.status().is_success()) on
both the initial block request and the txs request before calling .json(), so
any non-success HTTP response returns an Err and surfaces sampling failures
immediately (apply to the client.get(...).send().await? flow that binds resp and
txs_resp, and propagate or map the error instead of letting block_hash/txs
default).

In `@xtask/src/minibf_bench/stats.rs`:
- Around line 48-53: histogram.record() can return Err when given 0 because
Histogram::new_with_bounds(1, 60_000_000, 3) sets a lower bound of 1; when
iterating results and calling histogram.record(result.latency_micros)? you
should clamp or map zero latencies to the histogram's minimum (e.g., let v =
result.latency_micros.max(1); histogram.record(v)?), or skip recording zeros
explicitly, so update the loop that references result.latency_micros and the
histogram.record call to ensure no 0 is passed.

In `@xtask/src/minibf_bench/vectors.rs`:
- Around line 199-211: The cache-check currently calls
SystemTime::now().duration_since(modified)? which will return Err for future
mtimes and abort load_vectors; change the logic around vectors_path so the
duration_since result is handled non-panically (e.g., match or if let Ok(age) =
SystemTime::now().duration_since(modified) { if age <
Duration::from_secs(CACHE_TTL_HOURS * 3600) { read and return cached JSON } } ),
treating Err (future modification time) as cache-expired and allowing the
function to continue to regenerate vectors; update the code that references
vectors_path, modified, and CACHE_TTL_HOURS accordingly.
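
The two report.rs prompts above can be addressed as sketched below; the EndpointStats shape is inferred from the field names quoted in the prompts and is illustrative, not the actual struct:

struct EndpointStats {
    request_count: u64,
    latency_p50_micros: u64,
}

// Replace a non-finite RPS (e.g., from a zero duration) with 0.0 before serializing.
fn finite_rps(total_requests: u64, duration_secs: f64) -> f64 {
    let rps = total_requests as f64 / duration_secs;
    if rps.is_finite() { rps } else { 0.0 }
}

// Request-count-weighted average of per-endpoint p50s: the minimum acceptable
// fallback when raw samples or mergeable histograms are unavailable.
fn weighted_p50(stats: &[EndpointStats]) -> f64 {
    let total: u64 = stats.iter().map(|s| s.request_count).sum();
    if total == 0 {
        return 0.0;
    }
    stats
        .iter()
        .map(|s| s.latency_p50_micros as f64 * s.request_count as f64)
        .sum::<f64>()
        / total as f64
}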
🧹 Nitpick comments (5)
xtask/src/minibf_bench/vectors.rs (4)

82-120: Potentially expensive query on tx_out — consider whether an index exists on tx_out.address.

The LIKE 'addr1%' / LIKE 'addr_test1%' pattern with a leading prefix is index-friendly, so it should be fine if a B-tree index exists on tx_out.address. However, tx_out is one of the largest tables in DBSync, and joining with tx plus the GROUP BY can be very slow on a production-sized database. Since this is for benchmarking/xtask and cached with a 24h TTL, the impact is limited.

Also, the address_type categorization (lines 107-111) only distinguishes "shelley_payment_stake" vs "shelley_payment_only" — Byron addresses are excluded by the LIKE filter, which seems intentional.


147-170: Type inconsistency between tip_height: i32 and BlockRef.height: i64.

tip_height is retrieved as i32 (line 158), and the arithmetic on lines 161-163 is done in i32, but BlockRef.height is i64. This is fine for the as i64 cast in get_block_at_height (line 185), but the intermediate calculations use i32 arithmetic. While tip_height * 9 won't overflow for realistic Cardano block heights (~11M × 9 = ~99M, well within i32::MAX of ~2.1B), consider using i64 consistently throughout for clarity and future-proofing, since BlockRef already uses i64.

♻️ Optional: use i64 consistently
-        let tip_height: i32 = tip_row.get(0);
+        let tip_height: i64 = tip_row.get::<_, i32>(0) as i64;
 
         // Calculate positions for 10%, 50%, 90%
-        let early_height = tip_height / 10;
-        let mid_height = tip_height / 2;
-        let recent_height = (tip_height * 9) / 10;
+        let early_height = (tip_height / 10) as i32;
+        let mid_height = (tip_height / 2) as i32;
+        let recent_height = ((tip_height * 9) / 10) as i32;

228-234: unwrap() on vectors_path.parent() — prefer explicit error handling.

While parent() won't realistically return None here (the path always has a parent directory), using unwrap() is a clippy-unfriendly pattern for production code. As per coding guidelines, clippy warnings should be resolved.

♻️ Suggested fix
-    fs::create_dir_all(&vectors_path.parent().unwrap())?;
+    let parent = vectors_path
+        .parent()
+        .context("vectors path has no parent directory")?;
+    fs::create_dir_all(parent)?;

220-225: Rough URL parsing to extract dbsync_host — consider using the url crate.

The split-based approach works for typical PostgreSQL connection strings, but it can break for URLs with unusual formats (e.g., no @, query parameters with @, etc.). For a benchmarking tool this is acceptable, and the unwrap_or("unknown") fallback is reasonable. Just flagging for awareness.
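
A possible shape for that, assuming the url crate were added as a dependency (helper name illustrative):

use url::Url;

// Extract the host from a PostgreSQL connection string, falling back to
// "unknown" when the URL does not parse or has no host component.
fn dbsync_host(conn_str: &str) -> String {
    Url::parse(conn_str)
        .ok()
        .and_then(|u| u.host_str().map(str::to_string))
        .unwrap_or_else(|| "unknown".to_string())
}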

xtask/src/minibf_bench/report.rs (1)

151-181: Pagination error attribution is overly broad.

Line 168 assumes any success_rate < 1.0 on a paginated endpoint means "scan_limit_exceeded". In practice, failures could be network timeouts, 5xx errors, or other transient issues. This could mislead CI consumers into thinking there's a scan-limit problem when there isn't one.

Consider either inspecting actual error messages/status codes from the stats, or using a more neutral label like "partial_failures".

Comment on lines +409 to +428
"reference_blocks": {
"early": {
"height": 440647,
"hash": "\\xdd811d450847ac086901e6736a85509c2026f2f66ae0c4335ff85c7f2df7d66e",
"epoch": 41,
"slot": 16281734
},
"mid": {
"height": 2203237,
"hash": "\\x2944dd290435227f96340addb4b616a75035a55fc8119e7752fe4c05642d7889",
"epoch": 140,
"slot": 59122449
},
"recent": {
"height": 3965826,
"hash": "\\xdfc06de3d85ad3c8c7f223b4304a67a50377d1de7eb29b136f3e128f07c89778",
"epoch": 243,
"slot": 103708196
}
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how reference_blocks hashes are used across the codebase
rg -n "reference_blocks" --type=rs -C3
echo "---"
# Check what VECTORS_DIR points to, to verify this file's location is correct
rg -n "VECTORS_DIR" --type=rs -C2

Repository: txpipe/dolos

Length of output: 130


🏁 Script executed:

#!/bin/bash
# Check how reference_blocks hashes are used across the codebase
rg -n "reference_blocks" --type=rust -C3
echo "---"
# Check what VECTORS_DIR points to, to verify this file's location is correct
rg -n "VECTORS_DIR" --type=rust -C2

Repository: txpipe/dolos

Length of output: 2690


🏁 Script executed:

#!/bin/bash
# Find the sampler module and check how reference_blocks is used
rg -n "from_vectors" --type=rust -A10 -B2
echo "---"
# Also look for ReferenceBlocks struct definition
rg -n "struct ReferenceBlocks" --type=rust -A10

Repository: txpipe/dolos

Length of output: 2573


🏁 Script executed:

#!/bin/bash
# Search for any other usage of the hash field from reference_blocks or BlockRef
rg -n "\.hash" xtask/src/minibf_bench/ --type=rust -B2 -A2

Repository: txpipe/dolos

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Look at the query_reference_blocks method to see how hashes are fetched from the database
rg -n "query_reference_blocks" --type=rust -A30

Repository: txpipe/dolos

Length of output: 4570


🏁 Script executed:

#!/bin/bash
# Get the rest of get_block_at_height method
rg -n "get_block_at_height" --type=rust -A15

Repository: txpipe/dolos

Length of output: 1589


🏁 Script executed:

#!/bin/bash
# Let's see what the actual hash looks like in the JSON file
cat -n src/minibf_bench/vectors/preprod.json | grep -A5 "reference_blocks"

Repository: txpipe/dolos

Length of output: 287


Remove or normalize block hashes before storing in vectors JSON — PostgreSQL artifacts shouldn't leak into test data.

The \\x prefix on reference block hashes (e.g., \\xdd811d...) is a PostgreSQL bytea hex escape that appears because get_block_at_height queries the database directly without stripping the format. While the sampler currently uses only block heights for API calls (not hashes), storing PostgreSQL-formatted data in JSON is a data quality issue. Consider calling .strip_prefix("\\x") when populating the hash field, or exclude hashes entirely if they're not needed.
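
A minimal sketch of the normalization, assuming the hash reaches this code as a \x-prefixed string from the bytea column (helper name illustrative):

// Strip the PostgreSQL bytea hex escape ("\x") if present; pass other
// strings through unchanged.
fn normalize_pg_hash(raw: &str) -> String {
    raw.strip_prefix("\\x").unwrap_or(raw).to_string()
}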


use reqwest::Method;
use xshell::Shell;

use crate::config::{load_xtask_config, Network};

⚠️ Potential issue | 🟡 Minor

Unused import flagged by static analysis.

Network on line 20 is unused. Remove it to satisfy cargo clippy.

Fix
-use crate::config::{load_xtask_config, Network};
+use crate::config::load_xtask_config;

As per coding guidelines, **/*.rs: "Run cargo clippy --workspace --all-targets --all-features and resolve all clippy warnings before committing changes".

🧰 Tools
🪛 GitHub Check: Check Build

[warning] 20-20:
unused import: Network


Comment on lines +26 to +27
let start_time = Instant::now();
let repo_root = std::env::current_dir().context("detecting repo root")?;

⚠️ Potential issue | 🟡 Minor

duration includes vector loading, sampling, and request generation — inflating the report.

start_time is captured at line 26 before config loading, vector generation, and chain sampling. The elapsed time at line 79 therefore overstates the actual benchmark duration, deflating the reported RPS.

Consider capturing the start time just before run_warmup (or after warmup, depending on intent) for a more accurate benchmark duration.

Also applies to: 78-80
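
A hedged sketch of the suggested placement; the runner and method names are taken from this review, not verified against the file:

// Setup (config loading, vector generation, sampling) happens earlier and is
// deliberately excluded from the timer.
runner.run_warmup(requests.clone()).await?;

// Start the clock only for the measured phase so reported RPS reflects it alone.
let start_time = std::time::Instant::now();
let results = runner.run_benchmark(requests).await?;
let duration = start_time.elapsed(); // feeds the total_requests / duration calculation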


Comment on lines +47 to +52
let rt = tokio::runtime::Runtime::new()?;
let samples = rt.block_on(async {
    let client = reqwest::Client::new();
    sampler::HistoricalSamples::from_vectors(&client, &args.url, &vectors.reference_blocks)
        .await
})?;

⚠️ Potential issue | 🟡 Minor

Sampling client has no timeout — may hang indefinitely.

The reqwest::Client::new() used for sampling (line 49) has no timeout, unlike the BenchmarkRunner client which uses 30s. If the target server is unreachable, this will block forever.

Proposed fix
-        let client = reqwest::Client::new();
+        let client = reqwest::Client::builder()
+            .timeout(std::time::Duration::from_secs(30))
+            .build()?;

Comment on lines +46 to +48
/// Page sizes to test (comma-separated)
#[arg(long, default_value = "20,50,100")]
pub page_sizes: String,

⚠️ Potential issue | 🟡 Minor

page_sizes is declared but never consumed in request generation.

generate_requests in minibf_bench.rs hardcodes count=20 and count=100 in query strings rather than reading args.page_sizes. Either wire page_sizes into the request generation or remove the argument to avoid misleading users.

Also, consider using Vec<usize> with #[arg(value_delimiter = ',')] instead of a raw String for type-safe parsing.
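
A sketch of the typed variant under that suggestion (struct name illustrative; only the relevant field is shown):

use clap::Parser;

#[derive(Parser)]
struct BenchArgs {
    /// Page sizes to test (comma-separated)
    #[arg(long, value_delimiter = ',', default_value = "20,50,100")]
    page_sizes: Vec<usize>,
}

generate_requests would then iterate over args.page_sizes instead of hardcoding count=20 / count=100 in the query strings.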


use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::sync::{mpsc, Semaphore};
use tokio::task::JoinHandle;

⚠️ Potential issue | 🟡 Minor

Unused import: JoinHandle.

Flagged by static analysis. Remove to satisfy clippy.

Fix
-use tokio::task::JoinHandle;

As per coding guidelines, **/*.rs: "Run cargo clippy --workspace --all-targets --all-features and resolve all clippy warnings before committing changes".

🧰 Tools
🪛 GitHub Check: Check Build

[warning] 9-9:
unused import: tokio::task::JoinHandle


Comment on lines +45 to +68
pub async fn run_warmup(&self, requests: Vec<TestRequest>) -> Result<()> {
    println!("Running {} warmup requests...", self.args.warmup);

    let semaphore = Arc::new(Semaphore::new(self.args.concurrency));

    for (i, req) in requests.iter().cycle().take(self.args.warmup).enumerate() {
        let permit = semaphore.clone().acquire_owned().await?;
        let client = self.client.clone();
        let request = req.clone();
        let base_url = self.args.url.clone();

        tokio::spawn(async move {
            let url = format!("{}{}", base_url, request.path);
            let _ = client.request(request.method, url).send().await;
            drop(permit);
        });

        if (i + 1) % 100 == 0 {
            println!("  Warmup {}/{}", i + 1, self.args.warmup);
        }
    }

    Ok(())
}

⚠️ Potential issue | 🟠 Major

Warmup tasks are fire-and-forget — benchmark may start before warmup completes.

run_warmup spawns tasks but never joins them. After the for loop exits, the function returns Ok(()) while warmup HTTP requests may still be in-flight. This means the benchmark phase can start immediately, with warmup traffic still hitting the server — skewing latency measurements.

You can drain the semaphore to ensure all permits are returned (i.e., all tasks completed):

Proposed fix: wait for all tasks to finish
     pub async fn run_warmup(&self, requests: Vec<TestRequest>) -> Result<()> {
         println!("Running {} warmup requests...", self.args.warmup);

         let semaphore = Arc::new(Semaphore::new(self.args.concurrency));
+        let mut handles = Vec::new();

         for (i, req) in requests.iter().cycle().take(self.args.warmup).enumerate() {
             let permit = semaphore.clone().acquire_owned().await?;
             let client = self.client.clone();
             let request = req.clone();
             let base_url = self.args.url.clone();

-            tokio::spawn(async move {
+            handles.push(tokio::spawn(async move {
                 let url = format!("{}{}", base_url, request.path);
                 let _ = client.request(request.method, url).send().await;
                 drop(permit);
-            });
+            }));

             if (i + 1) % 100 == 0 {
                 println!("  Warmup {}/{}", i + 1, self.args.warmup);
             }
         }

+        for handle in handles {
+            let _ = handle.await;
+        }
+
         Ok(())
     }
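
Alternatively, draining the semaphore avoids collecting handles: acquiring all permits only succeeds once every spawned task has dropped its own. A sketch, assuming self.args.concurrency fits in u32:

// After the spawn loop: block until all permits are returned, i.e., every
// warmup task has finished.
let _all = semaphore.acquire_many(self.args.concurrency as u32).await?;
Ok(())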

Comment on lines +39 to +59
async fn fetch_sample(client: &Client, base_url: &str, height: i64) -> Result<ChainSample> {
    let url = format!("{}/blocks/{}", base_url, height);
    let resp: serde_json::Value = client.get(&url).send().await?.json().await?;

    let block_hash = resp["hash"].as_str().unwrap_or_default().to_string();
    let block_number = resp["height"].as_u64().unwrap_or(height as u64);
    let slot = resp["slot"].as_u64().unwrap_or_default();
    let epoch = resp["epoch"].as_u64().unwrap_or_default();

    // Get transactions for this block
    let txs_url = format!("{}/blocks/{}/txs?count=100", base_url, block_hash);
    let txs_resp: serde_json::Value = client.get(&txs_url).send().await?.json().await?;

    let txs = match txs_resp.as_array() {
        Some(arr) => arr
            .iter()
            .filter_map(|v| v["tx_hash"].as_str().or_else(|| v.as_str()).map(String::from))
            .take(20)
            .collect(),
        None => Vec::new(),
    };

⚠️ Potential issue | 🟠 Major

No HTTP status validation — silent data corruption on API errors.

fetch_sample doesn't check the response status before parsing JSON. If the block height doesn't exist (404) or the server errors (500), the response may not be valid JSON or will have an unexpected shape, causing block_hash to silently default to "". The subsequent request to /blocks//txs?count=100 would then also fail silently, producing an empty ChainSample.

Consider using resp.error_for_status()?.json() or checking .status() explicitly so sampling failures surface immediately.

Proposed fix
 async fn fetch_sample(client: &Client, base_url: &str, height: i64) -> Result<ChainSample> {
     let url = format!("{}/blocks/{}", base_url, height);
-    let resp: serde_json::Value = client.get(&url).send().await?.json().await?;
+    let resp: serde_json::Value = client
+        .get(&url)
+        .send()
+        .await?
+        .error_for_status()?
+        .json()
+        .await?;
-    let txs_resp: serde_json::Value = client.get(&txs_url).send().await?.json().await?;
+    let txs_resp: serde_json::Value = client
+        .get(&txs_url)
+        .send()
+        .await?
+        .error_for_status()?
+        .json()
+        .await?;

Comment on lines +48 to +53
let mut histogram = Histogram::<u64>::new_with_bounds(1, 60_000_000, 3)?;
let mut success_count = 0u64;
let mut failure_count = 0u64;

for result in &results {
    histogram.record(result.latency_micros)?;

⚠️ Potential issue | 🟡 Minor

histogram.record() will error on a 0-microsecond latency.

The histogram's lower bound is 1, so record(0) returns Err(RecordError) which propagates via ? and aborts the entire stats calculation. While rare, a sub-microsecond response (e.g., connection reuse hitting cache) could produce 0 from as_micros() as u64.

Proposed fix: clamp to lower bound
         for result in &results {
-            histogram.record(result.latency_micros)?;
+            histogram.record(result.latency_micros.max(1))?;

Comment on lines +199 to +211
let vectors_path = PathBuf::from(VECTORS_DIR).join(format!("{}.json", network.as_str()));

// Check if we can use cached vectors
if !force_refresh && vectors_path.exists() {
    let metadata = fs::metadata(&vectors_path)?;
    let modified = metadata.modified()?;
    let age = SystemTime::now().duration_since(modified)?;

    if age < Duration::from_secs(CACHE_TTL_HOURS * 3600) {
        let content = fs::read_to_string(&vectors_path)?;
        return Ok(serde_json::from_str(&content)?);
    }
}

⚠️ Potential issue | 🟡 Minor

duration_since will error on clock skew, causing load_vectors to fail instead of regenerating.

On line 205, if the file's modification time is somehow in the future (clock adjustment, NFS, etc.), SystemTime::now().duration_since(modified) returns Err. The ? propagates this as an error, which aborts the entire function rather than falling through to regenerate vectors.

Consider treating a future mtime as "cache expired":

🛡️ Suggested defensive handling
-        let age = SystemTime::now().duration_since(modified)?;
-
-        if age < Duration::from_secs(CACHE_TTL_HOURS * 3600) {
+        let age = SystemTime::now()
+            .duration_since(modified)
+            .unwrap_or(Duration::from_secs(u64::MAX));
+
+        if age < Duration::from_secs(CACHE_TTL_HOURS * 3600) {
