diff --git a/BENCH.md b/BENCH.md
new file mode 100644
index 0000000..bc88217
--- /dev/null
+++ b/BENCH.md
@@ -0,0 +1,209 @@
+# Tail-latency benchmarks
+
+This document covers the **HDR-histogram** bench suite added in 0.7.0
+under `benches/order_book/*_hdr.rs`. The default Criterion benches in
+the same directory remain — they publish HTML reports to
+`target/criterion/` and report the mean-centric statistical comparison
+that Criterion does well. The HDR benches are the source of truth for
+the **tail** numbers (`p50` / `p99` / `p99.9` / `p99.99`) that tier-one
+electronic exchanges quote in SLOs.
+
+## How to run
+
+```bash
+make bench-hdr                          # all six scenarios
+cargo bench --bench mixed_70_20_10_hdr  # single scenario
+```
+
+Each bench writes its raw HDR histogram to
+`target/bench-hdr/<scenario>.hgrm` (V2 format) for downstream HDR
+plotters; the directory lives under `target/` and is gitignored.
+
+## Methodology
+
+- **Histogram resolution.** `Histogram::<u64>` sized for `1 ns` to `1 s`
+  with three significant figures. Three sig-figs is enough to
+  distinguish p99 from p99.9 when they sit an order of magnitude apart
+  while staying memory-cheap (~80 KB per histogram).
+- **Sample collection.** Each measured operation is wrapped in a closure
+  passed to `record(...)`, which times the closure with
+  `std::time::Instant::now()` (one call before, one after) and writes
+  the elapsed-nanosecond value into the histogram. The closure result
+  is consumed via `std::hint::black_box` to prevent dead-code
+  elimination.
+- **Warmup.** Long-running scenarios (`add_only`, `mixed_70_20_10`)
+  discard 200 000 ops before the measurement window starts.
+  Pre-loading scenarios (`cancel_only`, `aggressive_walk`,
+  `mass_cancel_burst`) seed the book in a non-measured loop instead.
+- **Workload determinism.** All scenarios drive a self-contained
+  xorshift PRNG seeded with `0xA5A5_A5A5_A5A5_A5A5`.
Reproducing a run
+  with the same code produces the same op stream, modulo concurrent
+  scheduling jitter on the host.
+- **Coordinated omission.** The bench loop is **closed-loop**: the
+  driver waits for each engine call to return before issuing the next.
+  Closed-loop measurements **systematically under-report** tail
+  latencies that a real load generator would observe under saturation,
+  because queueing delays that would build up under a fixed arrival
+  rate never materialize. **The numbers below are pure service time —
+  use them as a regression signal and a lower bound on the production
+  tail, not as a production SLO.** Open-loop measurement (record
+  `now - scheduled_arrival`, not `now - call_start`) is the right
+  follow-up; tracked but not in the initial drop.
+- **CPU pinning.** Optional. On Linux, `taskset -c <cpu> cargo bench
+  --bench mixed_70_20_10_hdr` reduces variance from cross-core
+  scheduling. On macOS the benches were run without pinning — see the
+  run conditions block below.
+
+## Run conditions for the numbers below
+
+| Item | Value |
+|---|---|
+| Host | Apple M4 Max, macOS 26.4 (Darwin 25.4.0, `arm64`) |
+| Pinning | None |
+| Toolchain | `rustc 1.95.0` (stable) |
+| Profile | `--release` (Cargo `bench` profile = `release` clone) |
+| `RUSTFLAGS` | unset |
+| Allocator | system allocator |
+| Date | 2026-04-25 |
+| Crate version | `0.7.0-unreleased` (commit on `issue-56-hdr-bench`) |
+
+## Headline numbers
+
+All values in nanoseconds. **Closed-loop service time** — see
+"Coordinated omission" above.
+
+### `add_only` — pure passive limit submission, no crossings
+
+200 000 warmup + 1 000 000 measured.
+
+| Quantile | Latency (ns) |
+|---|---|
+| p50 | 791 |
+| p99 | 78 847 |
+| p99.9 | 146 303 |
+| p99.99 | 401 663 |
+| max | 528 895 |
+
+**Where the tail comes from.** The book grows monotonically across the
+measurement window, so each insert must walk the `SkipMap` to the
+right level.
The dominant contributor at p99.99 is allocator jitter
+when `Arc` allocations churn under the system allocator;
+secondary is L2 cache misses on the price-side `SkipMap` when the
+working set outgrows L1.
+
+### `cancel_only` — pre-loaded book, sequential cancels
+
+1 000 000 pre-loaded resting orders, all cancelled in order.
+
+| Quantile | Latency (ns) |
+|---|---|
+| p50 | 42 |
+| p99 | 25 167 |
+| p99.9 | 34 047 |
+| p99.99 | 172 031 |
+| max | 1 271 807 |
+
+**Where the tail comes from.** `DashMap::remove` on the order index is
+a shard-local lock acquisition; the median is dominated by that
+uncontended fast path. The very long p99.99 / max tails reflect
+shard-contention windows when multiple removals land on the same
+shard back to back, plus rare allocator returns of large
+`PriceLevel` linked-list nodes.
+
+### `aggressive_walk` — taker market orders sweep multi-level book
+
+50 levels × 100 resting orders pre-loaded, then 100 000 aggressive
+buys with qty `5..=20`.
+
+| Quantile | Latency (ns) |
+|---|---|
+| p50 | 41 |
+| p99 | 7 083 |
+| p99.9 | 16 959 |
+| p99.99 | 33 823 |
+| max | 203 263 |
+
+**Where the tail comes from.** The fill loop iterates per-order at
+each level until the requested quantity is consumed. The median is
+fast because most sweeps fill within a single level. The tail is
+driven by sweeps that span multiple levels and drop several `Arc`s
+at once.
+
+### `mixed_70_20_10` — 70 % submit, 20 % cancel, 10 % aggressive
+
+200 000 warmup + 1 000 000 measured. The "realistic" headline number.
+
+| Quantile | Latency (ns) |
+|---|---|
+| p50 | 667 |
+| p99 | 39 487 |
+| p99.9 | 71 999 |
+| p99.99 | 298 239 |
+| max | 644 607 |
+
+**Where the tail comes from.** A mix of all three previous tails. The
+median tracks `add_only` (because submits are 70 % of the workload).
+The p99.99 comes from rare aggressive sweeps that coincide with
+allocator activity on memory freed by recent cancels.
### `thin_book_sweep` — book near-empty, IOC probing
+
+Refills 3 resting asks every 5 ops; 200 000 IOC buy probes with qty
+`1..=20`.
+
+| Quantile | Latency (ns) |
+|---|---|
+| p50 | 42 |
+| p99 | 5 711 |
+| p99.9 | 15 127 |
+| p99.99 | 50 431 |
+| max | 418 303 |
+
+**Where the tail comes from.** Most probes either fully fill the
+small resting depth or partial-fill and short-circuit. The p99 is
+shaped by the partial-fill-then-cancel-remainder bookkeeping; the max
+is allocator jitter when the book transitions empty → non-empty.
+
+### `mass_cancel_burst` — dense book, then `cancel_all_orders`
+
+10 000 orders pre-loaded × 500 bursts. Each measured sample is
+**one full burst**, not one cancel — useful as an operator-side
+wall-clock guard rather than a per-op tail.
+
+| Quantile | Latency (ns) |
+|---|---|
+| p50 | 25 711 |
+| p99 | 48 447 |
+| p99.9 | 312 575 |
+| p99.99 | 312 575 |
+| max | 312 575 |
+
+**Where the tail comes from.** Burst latency scales linearly with the
+book depth; on a tight host the median is ~26 µs to drain 10 000
+orders, ~2.6 ns per order amortised. The p99.9 / p99.99 / max all
+collapse to the same value because only 500 samples were taken — the
+single worst-case observation dominates.
+
+## Limitations
+
+- **macOS, no pinning.** The host above is a workstation, not a
+  performance-tuned bench rig. Tail numbers will be tighter on a
+  Linux host with `isolcpus=` + `nohz_full=` + a pinned thread, with
+  the system allocator swapped for `jemalloc` or `mimalloc`.
+- **Closed-loop only.** As called out under Methodology — these
+  numbers are pure service time, not load-induced tail. Open-loop
+  measurement is the next iteration of this suite.
+- **Single-threaded driver.** The benches issue one op at a time. A
+  multi-writer driver would surface `DashMap` shard contention more
+  visibly; deferred to a follow-up.
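The open-loop follow-up flagged above can be sketched without touching the engine. Everything here is an illustrative stand-in, not the suite's code: `busy_work` fakes the engine call, the 1 µs arrival interval is arbitrary, and sorted-`Vec` index quantiles replace the HDR histogram just to keep the sketch dependency-free. The point is the measurement rule — latency is `now - scheduled_arrival` on a fixed-rate clock, so lateness behind one slow call is charged to every op it delays:

```rust
use std::time::{Duration, Instant};

// Stand-in for an engine call (e.g. a limit-order submit). Hypothetical.
fn busy_work(spin: u64) -> u64 {
    std::hint::black_box((0..spin).fold(0u64, |acc, x| acc.wrapping_add(x * x)))
}

/// Issue `n` ops on a fixed-rate schedule and return sorted per-op
/// latencies in ns, measured open-loop against the *scheduled* arrival.
fn open_loop_latencies(n: u32, interval: Duration) -> Vec<u64> {
    let start = Instant::now();
    let mut lat_ns = Vec::with_capacity(n as usize);
    for i in 0..n {
        let scheduled = start + interval * i;
        busy_work(32);
        // Closed-loop would time only the call; open-loop also counts
        // how late we were issuing it. Early arrivals saturate to zero,
        // then clamp to 1 ns like `record` does.
        let lat = Instant::now().saturating_duration_since(scheduled);
        lat_ns.push((lat.as_nanos() as u64).max(1));
    }
    lat_ns.sort_unstable();
    lat_ns
}

fn main() {
    let lat = open_loop_latencies(100_000, Duration::from_micros(1));
    println!("open-loop p50 (ns): {}", lat[lat.len() / 2]);
    println!("open-loop p99 (ns): {}", lat[lat.len() * 99 / 100]);
}
```

Under sustained overload this p99 keeps growing with queue depth, which is exactly the signal the closed-loop suite cannot see.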
## Reproducing
+
+```bash
+git checkout issue-56-hdr-bench   # or main once merged
+make bench-hdr
+cat target/bench-hdr/*.hgrm       # raw histograms
+```
+
+`hgrm` files are V2 format — readable by `HdrHistogram` plot tooling
+or convertible via `hdrhistogram`'s serialization `Deserializer`.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7cc49cd..0313335 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,6 +11,39 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 > below group changes by feature; everything ships in the same
 > 0.7.0 publish.
+### Added — HDR-histogram tail-latency bench suite (#56)
+
+- **Six new bench binaries** under `benches/order_book/*_hdr.rs` that
+  record per-sample latency into an `hdrhistogram::Histogram` and
+  emit `p50` / `p99` / `p99.9` / `p99.99` + min / max + sample count
+  to stdout. Scenarios: `add_only`, `cancel_only`,
+  `aggressive_walk`, `mixed_70_20_10`, `thin_book_sweep`,
+  `mass_cancel_burst`. Each is a `harness = false` binary that
+  coexists with the existing Criterion benches.
+- **Shared helpers** in `benches/order_book/hdr_common.rs`
+  (`new_histogram`, `record`, `report`, `persist`) and a
+  self-contained xorshift PRNG so the bench tree pulls no extra
+  runtime dependency beyond `hdrhistogram`.
+- **`hdrhistogram` ^7** as a dev-dependency.
+- **`make bench-hdr`** target — runs all six scenarios in series.
+- **`BENCH.md`** at repo root with methodology (warmup, closed-loop
+  vs open-loop disclosure), reproducibility steps, a run-conditions
+  block, and an honest table of the headline numbers from a single
+  M4 Max run, plus a one-paragraph "where the tail comes from"
+  analysis per scenario. Format-version stays at `2`.
+- Raw histograms persist to `target/bench-hdr/<scenario>.hgrm` (V2
+  HDR format, gitignored under `target/`).
+ +### Notes — HDR bench + +- **Closed-loop service time only.** The driver waits for each call + before issuing the next — tail latencies under saturation will be + worse than what these numbers report. Used as a regression signal + and a lower-bound on production tail, not as a published SLO. + Open-loop measurement is a follow-up. +- The Criterion benches under `benches/order_book/` (`add_orders.rs`, + `match_orders.rs`, etc.) are unchanged. + ### Added — closed `RejectReason` enum (#55) - **New `RejectReason`** closed `#[non_exhaustive] #[repr(u16)]` enum diff --git a/Cargo.toml b/Cargo.toml index 88af40c..0ad2112 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -61,12 +61,43 @@ criterion = { version = "0.8", features = ["html_reports"] } tokio = { version = "1.52", features = ["macros", "rt-multi-thread", "time"] } tempfile = "3" proptest = "1.7" +hdrhistogram = "^7" [[bench]] name = "benches" path = "benches/mod.rs" harness = false +[[bench]] +name = "add_only_hdr" +path = "benches/order_book/add_only_hdr.rs" +harness = false + +[[bench]] +name = "cancel_only_hdr" +path = "benches/order_book/cancel_only_hdr.rs" +harness = false + +[[bench]] +name = "aggressive_walk_hdr" +path = "benches/order_book/aggressive_walk_hdr.rs" +harness = false + +[[bench]] +name = "mixed_70_20_10_hdr" +path = "benches/order_book/mixed_70_20_10_hdr.rs" +harness = false + +[[bench]] +name = "thin_book_sweep_hdr" +path = "benches/order_book/thin_book_sweep_hdr.rs" +harness = false + +[[bench]] +name = "mass_cancel_burst_hdr" +path = "benches/order_book/mass_cancel_burst_hdr.rs" +harness = false + [[test]] name = "tests" path = "tests/unit/mod.rs" diff --git a/Makefile b/Makefile index ae4f87d..203e4ac 100644 --- a/Makefile +++ b/Makefile @@ -167,6 +167,15 @@ bench-json: check-cargo-criterion bench-clean: rm -rf target/criterion +.PHONY: bench-hdr +bench-hdr: + cargo bench --bench add_only_hdr + cargo bench --bench cancel_only_hdr + cargo bench --bench aggressive_walk_hdr + cargo bench 
--bench mixed_70_20_10_hdr + cargo bench --bench thin_book_sweep_hdr + cargo bench --bench mass_cancel_burst_hdr + .PHONY: workflow-coverage workflow-coverage: diff --git a/README.md b/README.md index e93e860..00582b5 100644 --- a/README.md +++ b/README.md @@ -48,6 +48,20 @@ This order book engine is built with the following design principles: ### What's New in Version 0.7.0 +#### v0.7.0 — HDR-histogram tail-latency bench suite + +- **Six new `*_hdr` bench binaries** under + `benches/order_book/`: `add_only`, `cancel_only`, + `aggressive_walk`, `mixed_70_20_10`, `thin_book_sweep`, + `mass_cancel_burst`. Each records per-sample nanosecond + latencies into an `hdrhistogram::Histogram` and emits + `p50` / `p99` / `p99.9` / `p99.99` + `min` / `max`. Coexists + with the existing Criterion benches. +- **`make bench-hdr`** convenience target. +- **Headline numbers + methodology** in `BENCH.md` at the repo + root, with a closed-loop disclosure block (the suite measures + service time, not load-induced tail). + #### v0.7.0 — Closed `RejectReason` enum - **New [`RejectReason`]** — closed diff --git a/benches/order_book/add_only_hdr.rs b/benches/order_book/add_only_hdr.rs new file mode 100644 index 0000000..bca79d3 --- /dev/null +++ b/benches/order_book/add_only_hdr.rs @@ -0,0 +1,33 @@ +// add_only_hdr — pure passive limit-order entry, no crossings. +// Measures `add_order` insert cost in isolation. + +#[path = "hdr_common.rs"] +mod common; + +use common::{Rng, new_histogram, persist, record, report, submit_gtc}; + +const SCENARIO: &str = "add_only"; +const WARMUP_OPS: u64 = 200_000; +const MEASURED_OPS: u64 = 1_000_000; +const SEED: u64 = 0xA5A5_A5A5_A5A5_A5A5; + +fn main() { + let book = common::fresh_book(); + let mut rng = Rng::new(SEED); + let mut hist = new_histogram(); + + // Warmup — discarded. + for i in 0..WARMUP_OPS { + submit_gtc(&book, &mut rng, i); + } + + // Measurement — id space picks up where warmup stopped to avoid + // collisions inside `order_locations`. 
+ for i in 0..MEASURED_OPS { + let id = WARMUP_OPS + i; + record(&mut hist, || submit_gtc(&book, &mut rng, id)); + } + + report(SCENARIO, &hist); + persist(SCENARIO, &hist).expect("persist hgrm"); +} diff --git a/benches/order_book/aggressive_walk_hdr.rs b/benches/order_book/aggressive_walk_hdr.rs new file mode 100644 index 0000000..2e7c8cc --- /dev/null +++ b/benches/order_book/aggressive_walk_hdr.rs @@ -0,0 +1,54 @@ +// aggressive_walk_hdr — taker market orders sweep multi-level book. +// Measures the fill-loop tail under saturating liquidity. + +#[path = "hdr_common.rs"] +mod common; + +use common::{Rng, new_histogram, owner, persist, record, report}; +use pricelevel::{Id, Side, TimeInForce}; + +const SCENARIO: &str = "aggressive_walk"; +// Pre-load enough resting depth for every aggressive sweep to fill. +const RESTING_PER_LEVEL: u64 = 100; +const NUM_LEVELS: u64 = 50; +const MEASURED_OPS: u64 = 100_000; +const SEED: u64 = 0xA5A5_A5A5_A5A5_A5A5; + +fn main() { + let book = common::fresh_book(); + let mut rng = Rng::new(SEED); + let mut hist = new_histogram(); + let maker = owner(0xAA); + let taker = owner(0xBB); + + // Seed RESTING_PER_LEVEL asks at each of NUM_LEVELS prices. + let mut next_id = 1u64; + for level in 0..NUM_LEVELS { + let price = (100 + level) as u128; + for _ in 0..RESTING_PER_LEVEL { + let _ = book.add_limit_order_with_user( + Id::from_u64(next_id), + price, + rng.range(1, 10), + Side::Sell, + TimeInForce::Gtc, + maker, + None, + ); + next_id += 1; + } + } + + // Aggressive Buy sweeps. Each sweeps 5..=20 lots — usually clears + // a few orders within the same price level. 
+ for i in 0..MEASURED_OPS { + let qty = rng.range(5, 20); + let id = Id::from_u64(next_id + i); + record(&mut hist, || { + let _ = book.submit_market_order_with_user(id, qty, Side::Buy, taker); + }); + } + + report(SCENARIO, &hist); + persist(SCENARIO, &hist).expect("persist hgrm"); +} diff --git a/benches/order_book/cancel_only_hdr.rs b/benches/order_book/cancel_only_hdr.rs new file mode 100644 index 0000000..09f5d1d --- /dev/null +++ b/benches/order_book/cancel_only_hdr.rs @@ -0,0 +1,36 @@ +// cancel_only_hdr — pre-loaded book + cancel workload. +// Measures `cancel_order` lookup + unlink cost. + +#[path = "hdr_common.rs"] +mod common; + +use common::{Rng, new_histogram, persist, record, report, submit_gtc}; +use pricelevel::Id; + +const SCENARIO: &str = "cancel_only"; +const PRELOAD_OPS: u64 = 1_000_000; +const SEED: u64 = 0xA5A5_A5A5_A5A5_A5A5; + +fn main() { + let book = common::fresh_book(); + let mut rng = Rng::new(SEED); + let mut hist = new_histogram(); + + // Pre-load the book with PRELOAD_OPS resting orders. The id space is + // 1..=PRELOAD_OPS so cancel ids are deterministic and present. + for i in 0..PRELOAD_OPS { + submit_gtc(&book, &mut rng, i + 1); + } + + // Cancel each one, in order. No warmup phase needed — cancel cost is + // dominated by `DashMap::remove` which has a stable distribution. + for i in 0..PRELOAD_OPS { + let id = Id::from_u64(i + 1); + record(&mut hist, || { + let _ = book.cancel_order(id); + }); + } + + report(SCENARIO, &hist); + persist(SCENARIO, &hist).expect("persist hgrm"); +} diff --git a/benches/order_book/hdr_common.rs b/benches/order_book/hdr_common.rs new file mode 100644 index 0000000..a6d190a --- /dev/null +++ b/benches/order_book/hdr_common.rs @@ -0,0 +1,151 @@ +// benches/order_book/hdr_common.rs +// +// Shared helpers for the `_hdr` bench binaries (issue #56). +// +// The Criterion benches coexist unchanged under the same directory. 
+// These helpers exist so each `_hdr` bench binary can record
+// per-sample nanosecond latencies into an `hdrhistogram::Histogram`
+// and emit a stable p50 / p99 / p99.9 / p99.99 + max table to stdout.
+
+#![allow(dead_code)]
+
+use hdrhistogram::Histogram;
+use std::time::Instant;
+
+use orderbook_rs::OrderBook;
+use pricelevel::{Hash32, Id, Side, TimeInForce};
+
+/// Histogram sized for `1 ns .. 1 s` with three significant figures of
+/// resolution. Three sig-figs is enough to distinguish p99 from p99.9
+/// when they're an order of magnitude apart while staying memory-cheap.
+pub fn new_histogram() -> Histogram<u64> {
+    Histogram::<u64>::new_with_bounds(1, 1_000_000_000, 3).expect("hist bounds")
+}
+
+/// Record one closure invocation's wall-clock duration into `h`.
+///
+/// Uses `std::hint::black_box` on the closure result to prevent
+/// dead-code elimination of the observed work.
+#[inline(always)]
+pub fn record<F, R>(h: &mut Histogram<u64>, f: F) -> R
+where
+    F: FnOnce() -> R,
+{
+    let t0 = Instant::now();
+    let r = std::hint::black_box(f());
+    let elapsed = t0.elapsed().as_nanos() as u64;
+    // hdrhistogram refuses zero — clamp at 1 ns. Non-issue for matching
+    // operations that always exceed a few hundred ns.
+    h.record(elapsed.max(1)).expect("record");
+    r
+}
+
+/// Print a fixed-format summary block to stdout. Matches what
+/// `BENCH.md` quotes: scenario, sample count, p50/p99/p99.9/p99.99,
+/// min, max — all in nanoseconds.
+pub fn report(name: &str, h: &Histogram<u64>) {
+    println!("scenario    : {name}");
+    println!("samples     : {}", h.len());
+    println!("p50 (ns)    : {}", h.value_at_quantile(0.50));
+    println!("p99 (ns)    : {}", h.value_at_quantile(0.99));
+    println!("p99.9 (ns)  : {}", h.value_at_quantile(0.999));
+    println!("p99.99 (ns) : {}", h.value_at_quantile(0.9999));
+    println!("min (ns)    : {}", h.min());
+    println!("max (ns)    : {}", h.max());
+}
+
+/// Persist the raw histogram to `target/bench-hdr/<scenario>.hgrm` (V2
+/// format) for downstream HDR plotters.
+/// `target/` is gitignored.
+pub fn persist(name: &str, h: &Histogram<u64>) -> std::io::Result<()> {
+    use hdrhistogram::serialization::{Serializer, V2Serializer};
+    std::fs::create_dir_all("target/bench-hdr")?;
+    let path = format!("target/bench-hdr/{name}.hgrm");
+    let mut file = std::fs::File::create(&path)?;
+    V2Serializer::new()
+        .serialize(h, &mut file)
+        .map_err(|e| std::io::Error::other(e.to_string()))?;
+    eprintln!("wrote {path}");
+    Ok(())
+}
+
+/// Tiny deterministic xorshift PRNG. Self-contained so no `rand`
+/// dependency creeps into the dev-dep tree just for benches.
+pub struct Rng(u64);
+
+impl Rng {
+    pub fn new(seed: u64) -> Self {
+        Self(seed.max(1))
+    }
+
+    #[inline]
+    pub fn next(&mut self) -> u64 {
+        let mut x = self.0;
+        x ^= x << 13;
+        x ^= x >> 7;
+        x ^= x << 17;
+        self.0 = x;
+        x
+    }
+
+    #[inline]
+    pub fn range(&mut self, lo: u64, hi: u64) -> u64 {
+        debug_assert!(lo <= hi);
+        let span = hi - lo + 1;
+        lo + (self.next() % span)
+    }
+}
+
+/// Common cross-bench constants. Tight price band forces frequent
+/// crossings on the aggressive bench; the small owner pool keeps
+/// per-account bookkeeping non-trivial without ballooning state.
+pub const PRICE_LO: u64 = 99;
+pub const PRICE_HI: u64 = 101;
+pub const QTY_LO: u64 = 1;
+pub const QTY_HI: u64 = 100;
+pub const OWNERS: u8 = 4;
+
+pub fn owner(byte: u8) -> Hash32 {
+    let mut bytes = [0u8; 32];
+    bytes[0] = byte;
+    Hash32::new(bytes)
+}
+
+/// Produce a fresh `OrderBook` with no listeners and no risk gating —
+/// the bench measures the engine itself, not the publisher pipeline.
+pub fn fresh_book() -> OrderBook<()> {
+    OrderBook::<()>::new("BENCH")
+}
+
+/// Side picker that yields `Buy` / `Sell` 50/50 from the rng.
+#[inline]
+pub fn pick_side(rng: &mut Rng) -> Side {
+    if rng.next().is_multiple_of(2) {
+        Side::Buy
+    } else {
+        Side::Sell
+    }
+}
+
+/// Picker for owner ids: yields one of `[1, 2, 3, 4]` byte-tagged
+/// `Hash32` accounts.
+#[inline] +pub fn pick_owner(rng: &mut Rng) -> Hash32 { + owner(((rng.next() % OWNERS as u64) as u8) + 1) +} + +/// Common GTC submit shape used by `add_only`, `mixed`, and the seed +/// phases of the other scenarios. +#[inline] +pub fn submit_gtc(book: &OrderBook<()>, rng: &mut Rng, id: u64) { + let price = rng.range(PRICE_LO, PRICE_HI) as u128; + let qty = rng.range(QTY_LO, QTY_HI); + let _ = book.add_limit_order_with_user( + Id::from_u64(id), + price, + qty, + pick_side(rng), + TimeInForce::Gtc, + pick_owner(rng), + None, + ); +} diff --git a/benches/order_book/mass_cancel_burst_hdr.rs b/benches/order_book/mass_cancel_burst_hdr.rs new file mode 100644 index 0000000..fdc8b8a --- /dev/null +++ b/benches/order_book/mass_cancel_burst_hdr.rs @@ -0,0 +1,40 @@ +// mass_cancel_burst_hdr — dense book, then `cancel_all_orders` burst. +// Measures the bulk-cancel worst case as a single observation per cycle +// rather than per-order. + +#[path = "hdr_common.rs"] +mod common; + +use common::{Rng, new_histogram, persist, record, report, submit_gtc}; + +const SCENARIO: &str = "mass_cancel_burst"; +const ORDERS_PER_BURST: u64 = 10_000; +const MEASURED_BURSTS: u64 = 500; +const SEED: u64 = 0xA5A5_A5A5_A5A5_A5A5; + +fn main() { + let book = common::fresh_book(); + let mut rng = Rng::new(SEED); + let mut hist = new_histogram(); + let mut next_id: u64 = 1; + + for _ in 0..MEASURED_BURSTS { + // Re-load the book up to ORDERS_PER_BURST resting orders. Not + // measured. + for _ in 0..ORDERS_PER_BURST { + submit_gtc(&book, &mut rng, next_id); + next_id += 1; + } + + // The single-burst measurement: time `cancel_all_orders` end to + // end. The histogram entry is "ns to drain N orders", not per + // order — useful as an operator-side wall-clock guard rather + // than a per-op tail. 
+ record(&mut hist, || { + let _ = book.cancel_all_orders(); + }); + } + + report(SCENARIO, &hist); + persist(SCENARIO, &hist).expect("persist hgrm"); +} diff --git a/benches/order_book/mixed_70_20_10_hdr.rs b/benches/order_book/mixed_70_20_10_hdr.rs new file mode 100644 index 0000000..f783baf --- /dev/null +++ b/benches/order_book/mixed_70_20_10_hdr.rs @@ -0,0 +1,83 @@ +// mixed_70_20_10_hdr — 70 % submits, 20 % cancels, 10 % aggressive. +// The "realistic" scenario the BENCH.md headline numbers come from. + +#[path = "hdr_common.rs"] +mod common; + +use common::{Rng, new_histogram, persist, pick_owner, pick_side, record, report}; +use pricelevel::{Id, Side, TimeInForce}; + +const SCENARIO: &str = "mixed_70_20_10"; +const WARMUP_OPS: u64 = 200_000; +const MEASURED_OPS: u64 = 1_000_000; +const SEED: u64 = 0xA5A5_A5A5_A5A5_A5A5; + +#[derive(Clone, Copy)] +enum Op { + Submit, + Cancel, + Aggressive, +} + +fn pick_op(rng: &mut Rng) -> Op { + match rng.next() % 100 { + 0..70 => Op::Submit, + 70..90 => Op::Cancel, + _ => Op::Aggressive, + } +} + +fn apply(book: &orderbook_rs::OrderBook<()>, rng: &mut Rng, next_id: &mut u64, op: Op) { + match op { + Op::Submit => { + let id = Id::from_u64(*next_id); + *next_id += 1; + let price = rng.range(common::PRICE_LO, common::PRICE_HI) as u128; + let qty = rng.range(common::QTY_LO, common::QTY_HI); + let _ = book.add_limit_order_with_user( + id, + price, + qty, + pick_side(rng), + TimeInForce::Gtc, + pick_owner(rng), + None, + ); + } + Op::Cancel => { + // Cancel a random previously-issued id. Some hit, some miss + // (already cancelled or filled) — both shapes are realistic. 
+ if *next_id > 1 { + let target = rng.range(1, *next_id - 1); + let _ = book.cancel_order(Id::from_u64(target)); + } + } + Op::Aggressive => { + let id = Id::from_u64(*next_id); + *next_id += 1; + let qty = rng.range(1, 10); + let _ = book.submit_market_order_with_user(id, qty, pick_side(rng), pick_owner(rng)); + } + } +} + +fn main() { + let book = common::fresh_book(); + let mut rng = Rng::new(SEED); + let mut hist = new_histogram(); + let mut next_id: u64 = 1; + + for _ in 0..WARMUP_OPS { + let op = pick_op(&mut rng); + apply(&book, &mut rng, &mut next_id, op); + } + + for _ in 0..MEASURED_OPS { + let op = pick_op(&mut rng); + record(&mut hist, || apply(&book, &mut rng, &mut next_id, op)); + } + + report(SCENARIO, &hist); + persist(SCENARIO, &hist).expect("persist hgrm"); + let _ = Side::Buy; // keep `Side` import live across feature combos +} diff --git a/benches/order_book/thin_book_sweep_hdr.rs b/benches/order_book/thin_book_sweep_hdr.rs new file mode 100644 index 0000000..9652896 --- /dev/null +++ b/benches/order_book/thin_book_sweep_hdr.rs @@ -0,0 +1,58 @@ +// thin_book_sweep_hdr — book near-empty, IOC probing. +// Exercises the partial-fill / cancel-the-remainder path. + +#[path = "hdr_common.rs"] +mod common; + +use common::{Rng, new_histogram, owner, persist, record, report}; +use pricelevel::{Id, Side, TimeInForce}; + +const SCENARIO: &str = "thin_book_sweep"; +// Re-seed a thin slice (RESTING orders at one or two prices) every +// REFILL_EVERY ops so the book never goes fully empty across the +// measurement window. +const RESTING_PER_REFILL: u64 = 3; +const REFILL_EVERY: u64 = 5; +const MEASURED_OPS: u64 = 200_000; +const SEED: u64 = 0xA5A5_A5A5_A5A5_A5A5; + +fn main() { + let book = common::fresh_book(); + let mut rng = Rng::new(SEED); + let mut hist = new_histogram(); + let maker = owner(0xAA); + let taker = owner(0xBB); + let mut next_id: u64 = 1; + + for i in 0..MEASURED_OPS { + if i % REFILL_EVERY == 0 { + // Drop a few resting asks. 
No measurement around the + // refill — only the IOC probe is timed. + for _ in 0..RESTING_PER_REFILL { + let _ = book.add_limit_order_with_user( + Id::from_u64(next_id), + rng.range(99, 101) as u128, + rng.range(1, 5), + Side::Sell, + TimeInForce::Gtc, + maker, + None, + ); + next_id += 1; + } + } + + // IOC buy probe — frequently larger than the resting depth so + // the engine ends up partial-filling and cancelling the + // remainder. + let id = Id::from_u64(next_id); + next_id += 1; + let qty = rng.range(1, 20); + record(&mut hist, || { + let _ = book.submit_market_order_with_user(id, qty, Side::Buy, taker); + }); + } + + report(SCENARIO, &hist); + persist(SCENARIO, &hist).expect("persist hgrm"); +} diff --git a/src/lib.rs b/src/lib.rs index 53d3cdb..c61d0bc 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -34,6 +34,20 @@ //! //! ## What's New in Version 0.7.0 //! +//! ### v0.7.0 — HDR-histogram tail-latency bench suite +//! +//! - **Six new `*_hdr` bench binaries** under +//! `benches/order_book/`: `add_only`, `cancel_only`, +//! `aggressive_walk`, `mixed_70_20_10`, `thin_book_sweep`, +//! `mass_cancel_burst`. Each records per-sample nanosecond +//! latencies into an `hdrhistogram::Histogram` and emits +//! `p50` / `p99` / `p99.9` / `p99.99` + `min` / `max`. Coexists +//! with the existing Criterion benches. +//! - **`make bench-hdr`** convenience target. +//! - **Headline numbers + methodology** in `BENCH.md` at the repo +//! root, with a closed-loop disclosure block (the suite measures +//! service time, not load-induced tail). +//! //! ### v0.7.0 — Closed `RejectReason` enum //! //! - **New [`RejectReason`]** — closed