diff --git a/BENCH.md b/BENCH.md
new file mode 100644
index 0000000..bc88217
--- /dev/null
+++ b/BENCH.md
@@ -0,0 +1,209 @@
+# Tail-latency benchmarks
+
+This document covers the **HDR-histogram** bench suite added in 0.7.0
+under `benches/order_book/*_hdr.rs`. The default Criterion benches in
+the same directory remain — they publish HTML reports to
+`target/criterion/` and report the mean-centric statistical comparison
+that Criterion does well. The HDR benches are the source of truth for
+the **tail** numbers (`p50` / `p99` / `p99.9` / `p99.99`) that tier-one
+electronic exchanges quote in SLOs.
+
+## How to run
+
+```bash
+make bench-hdr                          # all six scenarios
+cargo bench --bench mixed_70_20_10_hdr  # single scenario
+```
+
+Each bench writes its raw HDR histogram to
+`target/bench-hdr/<scenario>.hgrm` (V2 format) for downstream HDR
+plotters; the directory lives under `target/` and is gitignored.
+
+## Methodology
+
+- **Histogram resolution.** `Histogram::<u64>` sized for `1 ns` to `1 s`
+  with three significant figures. Three sig-figs is enough to
+  distinguish p99 from p99.9 when they sit an order of magnitude apart
+  while staying memory-cheap (~80 KB per histogram).
+- **Sample collection.** Each measured operation is wrapped in a closure
+  passed to `record(...)`, which times the closure with
+  `std::time::Instant::now()` (one call before, one after) and writes
+  the elapsed-nanosecond value into the histogram. The closure result
+  is consumed via `std::hint::black_box` to prevent dead-code
+  elimination.
+- **Warmup.** Long-running scenarios (`add_only`, `mixed_70_20_10`)
+  discard 200 000 ops before the measurement window starts.
+  Pre-loading scenarios (`cancel_only`, `aggressive_walk`,
+  `mass_cancel_burst`) seed the book in a non-measured loop instead.
+- **Workload determinism.** All scenarios drive a self-contained
+  xorshift PRNG seeded with `0xA5A5_A5A5_A5A5_A5A5`.
Reproducing a run
+  with the same code produces the same op stream, modulo concurrent
+  scheduling jitter on the host.
+- **Coordinated omission.** The bench loop is **closed-loop**: the
+  driver waits for each engine call to return before issuing the next.
+  Closed-loop measurements **systematically under-report** tail
+  latencies that a real load generator would observe under saturation,
+  because queueing delays that would build up under a fixed arrival
+  rate never materialize. **The numbers below are pure service time —
+  use them as a regression signal and a lower bound on the production
+  tail, not as a production SLO.** Open-loop measurement (record
+  `now - scheduled_arrival`, not `now - call_start`) is the right
+  follow-up; tracked but not in the initial drop.
+- **CPU pinning.** Optional. On Linux, `taskset -c <cpu> cargo bench
+  --bench mixed_70_20_10_hdr` reduces variance from cross-core
+  scheduling. On macOS the benches were run without pinning — see the
+  run conditions block below.
+
+## Run conditions for the numbers below
+
+| Item | Value |
+|---|---|
+| Host | Apple M4 Max, macOS 26.4 (Darwin 25.4.0, `arm64`) |
+| Pinning | None |
+| Toolchain | `rustc 1.95.0` (stable) |
+| Profile | `--release` (Cargo `bench` profile = `release` clone) |
+| `RUSTFLAGS` | unset |
+| Allocator | system allocator |
+| Date | 2026-04-25 |
+| Crate version | `0.7.0-unreleased` (commit on `issue-56-hdr-bench`) |
+
+## Headline numbers
+
+All values in nanoseconds. **Closed-loop service time** — see
+"Coordinated omission" above.
+
+### `add_only` — pure passive limit submission, no crossings
+
+200 000 warmup + 1 000 000 measured.
+
+| Quantile | Latency (ns) |
+|---|---|
+| p50 | 791 |
+| p99 | 78 847 |
+| p99.9 | 146 303 |
+| p99.99 | 401 663 |
+| max | 528 895 |
+
+**Where the tail comes from.** The book grows monotonically across the
+measurement window, so each insert must walk the `SkipMap` to the
+right level.
The dominant contributor at p99.99 is allocator jitter
+when `Arc` allocations churn under the system allocator;
+secondary is L2 cache misses on the price-side `SkipMap` when the
+working set outgrows L1.
+
+### `cancel_only` — pre-loaded book, sequential cancels
+
+1 000 000 pre-loaded resting orders, all cancelled in order.
+
+| Quantile | Latency (ns) |
+|---|---|
+| p50 | 42 |
+| p99 | 25 167 |
+| p99.9 | 34 047 |
+| p99.99 | 172 031 |
+| max | 1 271 807 |
+
+**Where the tail comes from.** `DashMap::remove` on the order index is
+a shard-local lock acquisition; the median is dominated by that
+uncontended fast path. The very long p99.99 / max tails reflect
+shard-contention windows when multiple removals land on the same
+shard back to back, plus rare allocator returns of large
+`PriceLevel` linked-list nodes.
+
+### `aggressive_walk` — taker market orders sweep multi-level book
+
+50 levels × 100 resting orders pre-loaded, then 100 000 aggressive
+buys with qty `5..=20`.
+
+| Quantile | Latency (ns) |
+|---|---|
+| p50 | 41 |
+| p99 | 7 083 |
+| p99.9 | 16 959 |
+| p99.99 | 33 823 |
+| max | 203 263 |
+
+**Where the tail comes from.** The fill loop iterates per-order at
+each level until the requested quantity is consumed. The median is
+fast because most sweeps fill within a single level. The tail is
+driven by sweeps that span multiple levels and drop several `Arc`s
+at once.
+
+### `mixed_70_20_10` — 70 % submit, 20 % cancel, 10 % aggressive
+
+200 000 warmup + 1 000 000 measured. The "realistic" headline number.
+
+| Quantile | Latency (ns) |
+|---|---|
+| p50 | 667 |
+| p99 | 39 487 |
+| p99.9 | 71 999 |
+| p99.99 | 298 239 |
+| max | 644 607 |
+
+**Where the tail comes from.** A mix of all three previous tails. The
+median tracks `add_only` (because submits are 70 % of the workload).
+The p99.99 comes from rare aggressive sweeps that coincide with
+allocator activity on memory freed by recent cancels.
### `thin_book_sweep` — book near-empty, IOC probing
+
+Refills 3 resting asks every 5 ops; 200 000 IOC buy probes with qty
+`1..=20`.
+
+| Quantile | Latency (ns) |
+|---|---|
+| p50 | 42 |
+| p99 | 5 711 |
+| p99.9 | 15 127 |
+| p99.99 | 50 431 |
+| max | 418 303 |
+
+**Where the tail comes from.** Most probes either fully fill the
+small resting depth or partial-fill and short-circuit. The p99 is
+shaped by the partial-fill-then-cancel-remainder bookkeeping; the max
+is allocator jitter when the book transitions empty → non-empty.
+
+### `mass_cancel_burst` — dense book, then `cancel_all_orders`
+
+10 000 orders pre-loaded × 500 bursts. Each measured sample is
+**one full burst**, not one cancel — useful as an operator-side
+wall-clock guard rather than a per-op tail.
+
+| Quantile | Latency (ns) |
+|---|---|
+| p50 | 25 711 |
+| p99 | 48 447 |
+| p99.9 | 312 575 |
+| p99.99 | 312 575 |
+| max | 312 575 |
+
+**Where the tail comes from.** Burst latency scales linearly with the
+book depth; on a tight host the median is ~26 µs to drain 10 000
+orders, ~2.6 ns per order amortised. The p99.9 / p99.99 / max all
+collapse to the same value because only 500 samples were taken — the
+single worst-case observation dominates.
+
+## Limitations
+
+- **macOS, no pinning.** The host above is a workstation, not a
+  performance-tuned bench rig. Tail numbers will be tighter on a
+  Linux host with `isolcpus=` + `nohz_full=` + a pinned thread, with
+  the system allocator swapped for `jemalloc` or `mimalloc`.
+- **Closed-loop only.** As called out under Methodology — these
+  numbers are pure service time, not load-induced tail. Open-loop
+  measurement is the next iteration of this suite.
+- **Single-threaded driver.** The benches issue one op at a time. A
+  multi-writer driver would surface `DashMap` shard contention more
+  visibly; deferred to a follow-up.
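The open-loop follow-up flagged above can be sketched without touching the engine. Everything here is an illustrative stand-in, not the suite's code: `busy_work` fakes the engine call, the 1 µs arrival interval is arbitrary, and sorted-`Vec` index quantiles replace the HDR histogram just to keep the sketch dependency-free. The point is the measurement rule — latency is `now - scheduled_arrival` on a fixed-rate clock, so lateness behind one slow call is charged to every op it delays:

```rust
use std::time::{Duration, Instant};

// Stand-in for an engine call (e.g. a limit-order submit). Hypothetical.
fn busy_work(spin: u64) -> u64 {
    std::hint::black_box((0..spin).fold(0u64, |acc, x| acc.wrapping_add(x * x)))
}

/// Issue `n` ops on a fixed-rate schedule and return sorted per-op
/// latencies in ns, measured open-loop against the *scheduled* arrival.
fn open_loop_latencies(n: u32, interval: Duration) -> Vec<u64> {
    let start = Instant::now();
    let mut lat_ns = Vec::with_capacity(n as usize);
    for i in 0..n {
        let scheduled = start + interval * i;
        busy_work(32);
        // Closed-loop would time only the call; open-loop also counts
        // how late we were issuing it. Early arrivals saturate to zero,
        // then clamp to 1 ns like `record` does.
        let lat = Instant::now().saturating_duration_since(scheduled);
        lat_ns.push((lat.as_nanos() as u64).max(1));
    }
    lat_ns.sort_unstable();
    lat_ns
}

fn main() {
    let lat = open_loop_latencies(100_000, Duration::from_micros(1));
    println!("open-loop p50 (ns): {}", lat[lat.len() / 2]);
    println!("open-loop p99 (ns): {}", lat[lat.len() * 99 / 100]);
}
```

Under sustained overload this p99 keeps growing with queue depth, which is exactly the signal the closed-loop suite cannot see.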
## Reproducing
+
+```bash
+git checkout issue-56-hdr-bench   # or main once merged
+make bench-hdr
+cat target/bench-hdr/*.hgrm       # raw histograms
+```
+
+`hgrm` files are V2 format — readable by `HdrHistogram` plot tooling
+or convertible via `hdrhistogram`'s serialization `Deserializer`.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7cc49cd..0313335 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,6 +11,39 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 > below group changes by feature; everything ships in the same
 > 0.7.0 publish.
+### Added — HDR-histogram tail-latency bench suite (#56)
+
+- **Six new bench binaries** under `benches/order_book/*_hdr.rs` that
+  record per-sample latency into an `hdrhistogram::Histogram` and
+  emit `p50` / `p99` / `p99.9` / `p99.99` + min / max + sample count
+  to stdout. Scenarios: `add_only`, `cancel_only`,
+  `aggressive_walk`, `mixed_70_20_10`, `thin_book_sweep`,
+  `mass_cancel_burst`. Each is a `harness = false` binary that
+  coexists with the existing Criterion benches.
+- **Shared helpers** in `benches/order_book/hdr_common.rs`
+  (`new_histogram`, `record`, `report`, `persist`) and a
+  self-contained xorshift PRNG so the bench tree pulls no extra
+  runtime dependency beyond `hdrhistogram`.
+- **`hdrhistogram` ^7** as a dev-dependency.
+- **`make bench-hdr`** target — runs all six scenarios in series.
+- **`BENCH.md`** at repo root with methodology (warmup, closed-loop
+  vs open-loop disclosure), reproducibility steps, a run-conditions
+  block, and an honest table of the headline numbers from a single
+  M4 Max run, plus a one-paragraph "where the tail comes from"
+  analysis per scenario. Format-version stays at `2`.
+- Raw histograms persist to `target/bench-hdr/<scenario>.hgrm` (V2
+  HDR format, gitignored under `target/`).
+ +### Notes — HDR bench + +- **Closed-loop service time only.** The driver waits for each call + before issuing the next — tail latencies under saturation will be + worse than what these numbers report. Used as a regression signal + and a lower-bound on production tail, not as a published SLO. + Open-loop measurement is a follow-up. +- The Criterion benches under `benches/order_book/` (`add_orders.rs`, + `match_orders.rs`, etc.) are unchanged. + ### Added — closed `RejectReason` enum (#55) - **New `RejectReason`** closed `#[non_exhaustive] #[repr(u16)]` enum diff --git a/Cargo.toml b/Cargo.toml index 88af40c..0ad2112 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -61,12 +61,43 @@ criterion = { version = "0.8", features = ["html_reports"] } tokio = { version = "1.52", features = ["macros", "rt-multi-thread", "time"] } tempfile = "3" proptest = "1.7" +hdrhistogram = "^7" [[bench]] name = "benches" path = "benches/mod.rs" harness = false +[[bench]] +name = "add_only_hdr" +path = "benches/order_book/add_only_hdr.rs" +harness = false + +[[bench]] +name = "cancel_only_hdr" +path = "benches/order_book/cancel_only_hdr.rs" +harness = false + +[[bench]] +name = "aggressive_walk_hdr" +path = "benches/order_book/aggressive_walk_hdr.rs" +harness = false + +[[bench]] +name = "mixed_70_20_10_hdr" +path = "benches/order_book/mixed_70_20_10_hdr.rs" +harness = false + +[[bench]] +name = "thin_book_sweep_hdr" +path = "benches/order_book/thin_book_sweep_hdr.rs" +harness = false + +[[bench]] +name = "mass_cancel_burst_hdr" +path = "benches/order_book/mass_cancel_burst_hdr.rs" +harness = false + [[test]] name = "tests" path = "tests/unit/mod.rs" diff --git a/Makefile b/Makefile index ae4f87d..203e4ac 100644 --- a/Makefile +++ b/Makefile @@ -167,6 +167,15 @@ bench-json: check-cargo-criterion bench-clean: rm -rf target/criterion +.PHONY: bench-hdr +bench-hdr: + cargo bench --bench add_only_hdr + cargo bench --bench cancel_only_hdr + cargo bench --bench aggressive_walk_hdr + cargo bench 
--bench mixed_70_20_10_hdr + cargo bench --bench thin_book_sweep_hdr + cargo bench --bench mass_cancel_burst_hdr + .PHONY: workflow-coverage workflow-coverage: diff --git a/README.md b/README.md index e93e860..00582b5 100644 --- a/README.md +++ b/README.md @@ -48,6 +48,20 @@ This order book engine is built with the following design principles: ### What's New in Version 0.7.0 +#### v0.7.0 — HDR-histogram tail-latency bench suite + +- **Six new `*_hdr` bench binaries** under + `benches/order_book/`: `add_only`, `cancel_only`, + `aggressive_walk`, `mixed_70_20_10`, `thin_book_sweep`, + `mass_cancel_burst`. Each records per-sample nanosecond + latencies into an `hdrhistogram::Histogram` and emits + `p50` / `p99` / `p99.9` / `p99.99` + `min` / `max`. Coexists + with the existing Criterion benches. +- **`make bench-hdr`** convenience target. +- **Headline numbers + methodology** in `BENCH.md` at the repo + root, with a closed-loop disclosure block (the suite measures + service time, not load-induced tail). + #### v0.7.0 — Closed `RejectReason` enum - **New [`RejectReason`]** — closed diff --git a/benches/order_book/add_only_hdr.rs b/benches/order_book/add_only_hdr.rs new file mode 100644 index 0000000..bca79d3 --- /dev/null +++ b/benches/order_book/add_only_hdr.rs @@ -0,0 +1,33 @@ +// add_only_hdr — pure passive limit-order entry, no crossings. +// Measures `add_order` insert cost in isolation. + +#[path = "hdr_common.rs"] +mod common; + +use common::{Rng, new_histogram, persist, record, report, submit_gtc}; + +const SCENARIO: &str = "add_only"; +const WARMUP_OPS: u64 = 200_000; +const MEASURED_OPS: u64 = 1_000_000; +const SEED: u64 = 0xA5A5_A5A5_A5A5_A5A5; + +fn main() { + let book = common::fresh_book(); + let mut rng = Rng::new(SEED); + let mut hist = new_histogram(); + + // Warmup — discarded. + for i in 0..WARMUP_OPS { + submit_gtc(&book, &mut rng, i); + } + + // Measurement — id space picks up where warmup stopped to avoid + // collisions inside `order_locations`. 
+ for i in 0..MEASURED_OPS { + let id = WARMUP_OPS + i; + record(&mut hist, || submit_gtc(&book, &mut rng, id)); + } + + report(SCENARIO, &hist); + persist(SCENARIO, &hist).expect("persist hgrm"); +} diff --git a/benches/order_book/aggressive_walk_hdr.rs b/benches/order_book/aggressive_walk_hdr.rs new file mode 100644 index 0000000..2e7c8cc --- /dev/null +++ b/benches/order_book/aggressive_walk_hdr.rs @@ -0,0 +1,54 @@ +// aggressive_walk_hdr — taker market orders sweep multi-level book. +// Measures the fill-loop tail under saturating liquidity. + +#[path = "hdr_common.rs"] +mod common; + +use common::{Rng, new_histogram, owner, persist, record, report}; +use pricelevel::{Id, Side, TimeInForce}; + +const SCENARIO: &str = "aggressive_walk"; +// Pre-load enough resting depth for every aggressive sweep to fill. +const RESTING_PER_LEVEL: u64 = 100; +const NUM_LEVELS: u64 = 50; +const MEASURED_OPS: u64 = 100_000; +const SEED: u64 = 0xA5A5_A5A5_A5A5_A5A5; + +fn main() { + let book = common::fresh_book(); + let mut rng = Rng::new(SEED); + let mut hist = new_histogram(); + let maker = owner(0xAA); + let taker = owner(0xBB); + + // Seed RESTING_PER_LEVEL asks at each of NUM_LEVELS prices. + let mut next_id = 1u64; + for level in 0..NUM_LEVELS { + let price = (100 + level) as u128; + for _ in 0..RESTING_PER_LEVEL { + let _ = book.add_limit_order_with_user( + Id::from_u64(next_id), + price, + rng.range(1, 10), + Side::Sell, + TimeInForce::Gtc, + maker, + None, + ); + next_id += 1; + } + } + + // Aggressive Buy sweeps. Each sweeps 5..=20 lots — usually clears + // a few orders within the same price level. 
+ for i in 0..MEASURED_OPS { + let qty = rng.range(5, 20); + let id = Id::from_u64(next_id + i); + record(&mut hist, || { + let _ = book.submit_market_order_with_user(id, qty, Side::Buy, taker); + }); + } + + report(SCENARIO, &hist); + persist(SCENARIO, &hist).expect("persist hgrm"); +} diff --git a/benches/order_book/cancel_only_hdr.rs b/benches/order_book/cancel_only_hdr.rs new file mode 100644 index 0000000..09f5d1d --- /dev/null +++ b/benches/order_book/cancel_only_hdr.rs @@ -0,0 +1,36 @@ +// cancel_only_hdr — pre-loaded book + cancel workload. +// Measures `cancel_order` lookup + unlink cost. + +#[path = "hdr_common.rs"] +mod common; + +use common::{Rng, new_histogram, persist, record, report, submit_gtc}; +use pricelevel::Id; + +const SCENARIO: &str = "cancel_only"; +const PRELOAD_OPS: u64 = 1_000_000; +const SEED: u64 = 0xA5A5_A5A5_A5A5_A5A5; + +fn main() { + let book = common::fresh_book(); + let mut rng = Rng::new(SEED); + let mut hist = new_histogram(); + + // Pre-load the book with PRELOAD_OPS resting orders. The id space is + // 1..=PRELOAD_OPS so cancel ids are deterministic and present. + for i in 0..PRELOAD_OPS { + submit_gtc(&book, &mut rng, i + 1); + } + + // Cancel each one, in order. No warmup phase needed — cancel cost is + // dominated by `DashMap::remove` which has a stable distribution. + for i in 0..PRELOAD_OPS { + let id = Id::from_u64(i + 1); + record(&mut hist, || { + let _ = book.cancel_order(id); + }); + } + + report(SCENARIO, &hist); + persist(SCENARIO, &hist).expect("persist hgrm"); +} diff --git a/benches/order_book/hdr_common.rs b/benches/order_book/hdr_common.rs new file mode 100644 index 0000000..a6d190a --- /dev/null +++ b/benches/order_book/hdr_common.rs @@ -0,0 +1,151 @@ +// benches/order_book/hdr_common.rs +// +// Shared helpers for the `_hdr` bench binaries (issue #56). +// +// The Criterion benches coexist unchanged under the same directory. 
+// These helpers exist so each `_hdr` bench binary can record
+// per-sample nanosecond latencies into an `hdrhistogram::Histogram`
+// and emit a stable p50 / p99 / p99.9 / p99.99 + max table to stdout.
+
+#![allow(dead_code)]
+
+use hdrhistogram::Histogram;
+use std::time::Instant;
+
+use orderbook_rs::OrderBook;
+use pricelevel::{Hash32, Id, Side, TimeInForce};
+
+/// Histogram sized for `1 ns .. 1 s` with three significant figures of
+/// resolution. Three sig-figs is enough to distinguish p99 from p99.9
+/// when they're an order of magnitude apart while staying memory-cheap.
+pub fn new_histogram() -> Histogram<u64> {
+    Histogram::<u64>::new_with_bounds(1, 1_000_000_000, 3).expect("hist bounds")
+}
+
+/// Record one closure invocation's wall-clock duration into `h`.
+///
+/// Uses `std::hint::black_box` on the closure result to prevent
+/// dead-code elimination of the observed work.
+#[inline(always)]
+pub fn record<F, R>(h: &mut Histogram<u64>, f: F) -> R
+where
+    F: FnOnce() -> R,
+{
+    let t0 = Instant::now();
+    let r = std::hint::black_box(f());
+    let elapsed = t0.elapsed().as_nanos() as u64;
+    // hdrhistogram refuses zero — clamp at 1 ns. Non-issue for matching
+    // operations that always exceed a few hundred ns.
+    h.record(elapsed.max(1)).expect("record");
+    r
+}
+
+/// Print a fixed-format summary block to stdout. Matches what
+/// `BENCH.md` quotes: scenario, sample count, p50/p99/p99.9/p99.99,
+/// min, max — all in nanoseconds.
+pub fn report(name: &str, h: &Histogram<u64>) {
+    println!("scenario    : {name}");
+    println!("samples     : {}", h.len());
+    println!("p50 (ns)    : {}", h.value_at_quantile(0.50));
+    println!("p99 (ns)    : {}", h.value_at_quantile(0.99));
+    println!("p99.9 (ns)  : {}", h.value_at_quantile(0.999));
+    println!("p99.99 (ns) : {}", h.value_at_quantile(0.9999));
+    println!("min (ns)    : {}", h.min());
+    println!("max (ns)    : {}", h.max());
+}
+
+/// Persist the raw histogram to `target/bench-hdr/<scenario>.hgrm` (V2
+/// format) for downstream HDR plotters.
+/// `target/` is gitignored.
+pub fn persist(name: &str, h: &Histogram<u64>) -> std::io::Result<()> {
+    use hdrhistogram::serialization::{Serializer, V2Serializer};
+    std::fs::create_dir_all("target/bench-hdr")?;
+    let path = format!("target/bench-hdr/{name}.hgrm");
+    let mut file = std::fs::File::create(&path)?;
+    V2Serializer::new()
+        .serialize(h, &mut file)
+        .map_err(|e| std::io::Error::other(e.to_string()))?;
+    eprintln!("wrote {path}");
+    Ok(())
+}
+
+/// Tiny deterministic xorshift PRNG. Self-contained so no `rand`
+/// dependency creeps into the dev-dep tree just for benches.
+pub struct Rng(u64);
+
+impl Rng {
+    pub fn new(seed: u64) -> Self {
+        Self(seed.max(1))
+    }
+
+    #[inline]
+    pub fn next(&mut self) -> u64 {
+        let mut x = self.0;
+        x ^= x << 13;
+        x ^= x >> 7;
+        x ^= x << 17;
+        self.0 = x;
+        x
+    }
+
+    #[inline]
+    pub fn range(&mut self, lo: u64, hi: u64) -> u64 {
+        debug_assert!(lo <= hi);
+        let span = hi - lo + 1;
+        lo + (self.next() % span)
+    }
+}
+
+/// Common cross-bench constants. Tight price band forces frequent
+/// crossings on the aggressive bench; the small owner pool keeps
+/// per-account bookkeeping non-trivial without ballooning state.
+pub const PRICE_LO: u64 = 99;
+pub const PRICE_HI: u64 = 101;
+pub const QTY_LO: u64 = 1;
+pub const QTY_HI: u64 = 100;
+pub const OWNERS: u8 = 4;
+
+pub fn owner(byte: u8) -> Hash32 {
+    let mut bytes = [0u8; 32];
+    bytes[0] = byte;
+    Hash32::new(bytes)
+}
+
+/// Produce a fresh `OrderBook` with no listeners and no risk gating —
+/// the bench measures the engine itself, not the publisher pipeline.
+pub fn fresh_book() -> OrderBook<()> {
+    OrderBook::<()>::new("BENCH")
+}
+
+/// Side picker that yields `Buy` / `Sell` 50/50 from the rng.
+#[inline]
+pub fn pick_side(rng: &mut Rng) -> Side {
+    if rng.next().is_multiple_of(2) {
+        Side::Buy
+    } else {
+        Side::Sell
+    }
+}
+
+/// Picker for owner ids: yields one of `[1, 2, 3, 4]` byte-tagged
+/// `Hash32` accounts.
+#[inline] +pub fn pick_owner(rng: &mut Rng) -> Hash32 { + owner(((rng.next() % OWNERS as u64) as u8) + 1) +} + +/// Common GTC submit shape used by `add_only`, `mixed`, and the seed +/// phases of the other scenarios. +#[inline] +pub fn submit_gtc(book: &OrderBook<()>, rng: &mut Rng, id: u64) { + let price = rng.range(PRICE_LO, PRICE_HI) as u128; + let qty = rng.range(QTY_LO, QTY_HI); + let _ = book.add_limit_order_with_user( + Id::from_u64(id), + price, + qty, + pick_side(rng), + TimeInForce::Gtc, + pick_owner(rng), + None, + ); +} diff --git a/benches/order_book/mass_cancel_burst_hdr.rs b/benches/order_book/mass_cancel_burst_hdr.rs new file mode 100644 index 0000000..fdc8b8a --- /dev/null +++ b/benches/order_book/mass_cancel_burst_hdr.rs @@ -0,0 +1,40 @@ +// mass_cancel_burst_hdr — dense book, then `cancel_all_orders` burst. +// Measures the bulk-cancel worst case as a single observation per cycle +// rather than per-order. + +#[path = "hdr_common.rs"] +mod common; + +use common::{Rng, new_histogram, persist, record, report, submit_gtc}; + +const SCENARIO: &str = "mass_cancel_burst"; +const ORDERS_PER_BURST: u64 = 10_000; +const MEASURED_BURSTS: u64 = 500; +const SEED: u64 = 0xA5A5_A5A5_A5A5_A5A5; + +fn main() { + let book = common::fresh_book(); + let mut rng = Rng::new(SEED); + let mut hist = new_histogram(); + let mut next_id: u64 = 1; + + for _ in 0..MEASURED_BURSTS { + // Re-load the book up to ORDERS_PER_BURST resting orders. Not + // measured. + for _ in 0..ORDERS_PER_BURST { + submit_gtc(&book, &mut rng, next_id); + next_id += 1; + } + + // The single-burst measurement: time `cancel_all_orders` end to + // end. The histogram entry is "ns to drain N orders", not per + // order — useful as an operator-side wall-clock guard rather + // than a per-op tail. 
+ record(&mut hist, || { + let _ = book.cancel_all_orders(); + }); + } + + report(SCENARIO, &hist); + persist(SCENARIO, &hist).expect("persist hgrm"); +} diff --git a/benches/order_book/mixed_70_20_10_hdr.rs b/benches/order_book/mixed_70_20_10_hdr.rs new file mode 100644 index 0000000..f783baf --- /dev/null +++ b/benches/order_book/mixed_70_20_10_hdr.rs @@ -0,0 +1,83 @@ +// mixed_70_20_10_hdr — 70 % submits, 20 % cancels, 10 % aggressive. +// The "realistic" scenario the BENCH.md headline numbers come from. + +#[path = "hdr_common.rs"] +mod common; + +use common::{Rng, new_histogram, persist, pick_owner, pick_side, record, report}; +use pricelevel::{Id, Side, TimeInForce}; + +const SCENARIO: &str = "mixed_70_20_10"; +const WARMUP_OPS: u64 = 200_000; +const MEASURED_OPS: u64 = 1_000_000; +const SEED: u64 = 0xA5A5_A5A5_A5A5_A5A5; + +#[derive(Clone, Copy)] +enum Op { + Submit, + Cancel, + Aggressive, +} + +fn pick_op(rng: &mut Rng) -> Op { + match rng.next() % 100 { + 0..70 => Op::Submit, + 70..90 => Op::Cancel, + _ => Op::Aggressive, + } +} + +fn apply(book: &orderbook_rs::OrderBook<()>, rng: &mut Rng, next_id: &mut u64, op: Op) { + match op { + Op::Submit => { + let id = Id::from_u64(*next_id); + *next_id += 1; + let price = rng.range(common::PRICE_LO, common::PRICE_HI) as u128; + let qty = rng.range(common::QTY_LO, common::QTY_HI); + let _ = book.add_limit_order_with_user( + id, + price, + qty, + pick_side(rng), + TimeInForce::Gtc, + pick_owner(rng), + None, + ); + } + Op::Cancel => { + // Cancel a random previously-issued id. Some hit, some miss + // (already cancelled or filled) — both shapes are realistic. 
+ if *next_id > 1 { + let target = rng.range(1, *next_id - 1); + let _ = book.cancel_order(Id::from_u64(target)); + } + } + Op::Aggressive => { + let id = Id::from_u64(*next_id); + *next_id += 1; + let qty = rng.range(1, 10); + let _ = book.submit_market_order_with_user(id, qty, pick_side(rng), pick_owner(rng)); + } + } +} + +fn main() { + let book = common::fresh_book(); + let mut rng = Rng::new(SEED); + let mut hist = new_histogram(); + let mut next_id: u64 = 1; + + for _ in 0..WARMUP_OPS { + let op = pick_op(&mut rng); + apply(&book, &mut rng, &mut next_id, op); + } + + for _ in 0..MEASURED_OPS { + let op = pick_op(&mut rng); + record(&mut hist, || apply(&book, &mut rng, &mut next_id, op)); + } + + report(SCENARIO, &hist); + persist(SCENARIO, &hist).expect("persist hgrm"); + let _ = Side::Buy; // keep `Side` import live across feature combos +} diff --git a/benches/order_book/thin_book_sweep_hdr.rs b/benches/order_book/thin_book_sweep_hdr.rs new file mode 100644 index 0000000..9652896 --- /dev/null +++ b/benches/order_book/thin_book_sweep_hdr.rs @@ -0,0 +1,58 @@ +// thin_book_sweep_hdr — book near-empty, IOC probing. +// Exercises the partial-fill / cancel-the-remainder path. + +#[path = "hdr_common.rs"] +mod common; + +use common::{Rng, new_histogram, owner, persist, record, report}; +use pricelevel::{Id, Side, TimeInForce}; + +const SCENARIO: &str = "thin_book_sweep"; +// Re-seed a thin slice (RESTING orders at one or two prices) every +// REFILL_EVERY ops so the book never goes fully empty across the +// measurement window. +const RESTING_PER_REFILL: u64 = 3; +const REFILL_EVERY: u64 = 5; +const MEASURED_OPS: u64 = 200_000; +const SEED: u64 = 0xA5A5_A5A5_A5A5_A5A5; + +fn main() { + let book = common::fresh_book(); + let mut rng = Rng::new(SEED); + let mut hist = new_histogram(); + let maker = owner(0xAA); + let taker = owner(0xBB); + let mut next_id: u64 = 1; + + for i in 0..MEASURED_OPS { + if i % REFILL_EVERY == 0 { + // Drop a few resting asks. 
No measurement around the + // refill — only the IOC probe is timed. + for _ in 0..RESTING_PER_REFILL { + let _ = book.add_limit_order_with_user( + Id::from_u64(next_id), + rng.range(99, 101) as u128, + rng.range(1, 5), + Side::Sell, + TimeInForce::Gtc, + maker, + None, + ); + next_id += 1; + } + } + + // IOC buy probe — frequently larger than the resting depth so + // the engine ends up partial-filling and cancelling the + // remainder. + let id = Id::from_u64(next_id); + next_id += 1; + let qty = rng.range(1, 20); + record(&mut hist, || { + let _ = book.submit_market_order_with_user(id, qty, Side::Buy, taker); + }); + } + + report(SCENARIO, &hist); + persist(SCENARIO, &hist).expect("persist hgrm"); +} diff --git a/src/lib.rs b/src/lib.rs index 53d3cdb..c61d0bc 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -34,6 +34,20 @@ //! //! ## What's New in Version 0.7.0 //! +//! ### v0.7.0 — HDR-histogram tail-latency bench suite +//! +//! - **Six new `*_hdr` bench binaries** under +//! `benches/order_book/`: `add_only`, `cancel_only`, +//! `aggressive_walk`, `mixed_70_20_10`, `thin_book_sweep`, +//! `mass_cancel_burst`. Each records per-sample nanosecond +//! latencies into an `hdrhistogram::Histogram` and emits +//! `p50` / `p99` / `p99.9` / `p99.99` + `min` / `max`. Coexists +//! with the existing Criterion benches. +//! - **`make bench-hdr`** convenience target. +//! - **Headline numbers + methodology** in `BENCH.md` at the repo +//! root, with a closed-loop disclosure block (the suite measures +//! service time, not load-induced tail). +//! //! ### v0.7.0 — Closed `RejectReason` enum //! //! - **New [`RejectReason`]** — closed