Merged
45 changes: 45 additions & 0 deletions BENCH.md
@@ -8,6 +8,51 @@ that Criterion does well. The HDR benches are the source of truth for
the **tail** numbers (`p50` / `p99` / `p99.9` / `p99.99`) that tier-one
electronic exchanges quote in SLOs.

## Allocation profile (feature `alloc-counters`)

Under the `alloc-counters` feature the crate exposes a
`CountingAllocator<Inner: GlobalAlloc>` wrapper that tracks
`allocs` / `deallocs` / `bytes_allocated` / `bytes_deallocated` as
`AtomicU64` counters. Bench / test binaries opt in via:

```rust
use orderbook_rs::CountingAllocator;
use std::alloc::System;

#[global_allocator]
static A: CountingAllocator<System> = CountingAllocator::new(System);
```
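
For readers who want to see what such a wrapper involves, here is a minimal sketch. It is not the crate's `counting_allocator.rs` (that file is not shown on this page); counting only successful allocations, the `Relaxed` ordering, and relying on the trait's default `realloc` / `alloc_zeroed` are assumptions of this sketch.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU64, Ordering};

/// Point-in-time copy of the four counters (shape assumed for this sketch).
#[derive(Clone, Copy, Debug, Default)]
pub struct AllocSnapshot {
    pub allocs: u64,
    pub deallocs: u64,
    pub bytes_allocated: u64,
    pub bytes_deallocated: u64,
}

/// Forwards every call to `inner` and counts it on the way through.
pub struct CountingAllocator<Inner: GlobalAlloc> {
    inner: Inner,
    allocs: AtomicU64,
    deallocs: AtomicU64,
    bytes_allocated: AtomicU64,
    bytes_deallocated: AtomicU64,
}

impl<Inner: GlobalAlloc> CountingAllocator<Inner> {
    /// `const fn` so the wrapper can initialise a `#[global_allocator]` static.
    pub const fn new(inner: Inner) -> Self {
        Self {
            inner,
            allocs: AtomicU64::new(0),
            deallocs: AtomicU64::new(0),
            bytes_allocated: AtomicU64::new(0),
            bytes_deallocated: AtomicU64::new(0),
        }
    }

    /// Reads each counter independently; under concurrency the four
    /// values are not one atomic view.
    pub fn snapshot(&self) -> AllocSnapshot {
        AllocSnapshot {
            allocs: self.allocs.load(Ordering::Relaxed),
            deallocs: self.deallocs.load(Ordering::Relaxed),
            bytes_allocated: self.bytes_allocated.load(Ordering::Relaxed),
            bytes_deallocated: self.bytes_deallocated.load(Ordering::Relaxed),
        }
    }
}

unsafe impl<Inner: GlobalAlloc> GlobalAlloc for CountingAllocator<Inner> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Delegate first, then count only allocations that succeeded.
        let ptr = unsafe { self.inner.alloc(layout) };
        if !ptr.is_null() {
            self.allocs.fetch_add(1, Ordering::Relaxed);
            self.bytes_allocated.fetch_add(layout.size() as u64, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.inner.dealloc(ptr, layout) };
        self.deallocs.fetch_add(1, Ordering::Relaxed);
        self.bytes_deallocated.fetch_add(layout.size() as u64, Ordering::Relaxed);
    }
    // `realloc` and `alloc_zeroed` use the trait's default implementations,
    // which route through `alloc` / `dealloc` above, so they are counted too.
}

#[global_allocator]
static GLOBAL: CountingAllocator<System> = CountingAllocator::new(System);
```

Because the default `realloc` goes through `alloc` plus `dealloc`, vector growth shows up in this sketch's counters, at the cost of bypassing any specialised `realloc` the inner allocator provides.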

`benches/order_book/alloc_count.rs` runs the same mixed 70 / 20 / 10
workload as `mixed_70_20_10_hdr` but reports `allocs_per_op` and
`bytes_alloc/op` over the measurement window (200 000 warmup +
1 000 000 measured). A reference run on the same M4 Max host:

| counter | value |
|----------------|---------------|
| allocs | 17 757 222 |
| deallocs | 17 690 635 |
| bytes_alloc | 4 926 064 834 |
| bytes_dealloc | 4 897 062 482 |
| **allocs/op** | **17.76** |
| bytes_alloc/op | 4 926 |
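
The per-op rows are the raw counters divided by the 1 000 000 measured ops (warmup excluded). A small sketch of that arithmetic using the table's own values; `per_op` and `reference_rows` are illustrative names, not part of the crate:

```rust
/// Divides a raw counter by the number of measured operations.
fn per_op(total: u64, measured_ops: u64) -> f64 {
    total as f64 / measured_ops as f64
}

// Values copied from the reference-run table.
const MEASURED_OPS: u64 = 1_000_000;
const ALLOCS: u64 = 17_757_222;
const BYTES_ALLOC: u64 = 4_926_064_834;

/// Formats the two per-op rows with the same rounding the table shows.
fn reference_rows() -> (String, String) {
    (
        format!("{:.2}", per_op(ALLOCS, MEASURED_OPS)),
        format!("{:.0}", per_op(BYTES_ALLOC, MEASURED_OPS)),
    )
}
```

`reference_rows()` yields `("17.76", "4926")`, matching the `allocs/op` and `bytes_alloc/op` rows.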

`allocs/op` is the headline number for how much allocation pressure
the matching engine generates on a realistic workload. It is far more
useful as a regression signal than as an absolute target. The integration test
`tests/unit/alloc_budget_tests.rs` runs a smaller 10 000-op slice and
asserts `allocs/op < 10` to catch order-of-magnitude regressions in
CI.
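
Both the bench and the budget test follow the same pattern: snapshot, run the slice, snapshot again, divide the delta by the op count. A minimal sketch of such a gate, assuming a simplified two-field snapshot; `Snapshot` and `check_alloc_budget` are hypothetical stand-ins, not the crate's `AllocSnapshot` or the test's actual helper:

```rust
/// Counter values captured at one instant (simplified, hypothetical).
#[derive(Clone, Copy, Debug, Default)]
struct Snapshot {
    allocs: u64,
    bytes_allocated: u64,
}

impl Snapshot {
    /// Counters accumulated between `earlier` and `self`.
    /// `wrapping_sub` keeps the delta correct even if a counter wrapped.
    fn since(self, earlier: Snapshot) -> Snapshot {
        Snapshot {
            allocs: self.allocs.wrapping_sub(earlier.allocs),
            bytes_allocated: self.bytes_allocated.wrapping_sub(earlier.bytes_allocated),
        }
    }
}

/// Returns the measured allocs/op, or an error once it reaches `ceiling`.
fn check_alloc_budget(
    before: Snapshot,
    after: Snapshot,
    ops: u64,
    ceiling: f64,
) -> Result<f64, String> {
    assert!(ops > 0, "need at least one measured op");
    let per_op = after.since(before).allocs as f64 / ops as f64;
    if per_op < ceiling {
        Ok(per_op)
    } else {
        Err(format!("allocs/op {per_op:.2} >= budget {ceiling}"))
    }
}
```

A strict `<` mirrors the documented `allocs/op < 10` check; the ceiling is a parameter so a CI job can tighten it without touching the measurement code.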

To run it yourself:

```bash
cargo bench --features alloc-counters --bench alloc_count
cargo test --features alloc-counters alloc_budget
```

Per-run summaries land in `target/alloc-counters/<scenario>.md`.

## How to run

```bash
36 changes: 36 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,42 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
> below group changes by feature; everything ships in the same
> 0.7.0 publish.

### Added — feature-gated allocation counter (#58)

- **New feature `alloc-counters`** (default off). Exposes
`CountingAllocator<Inner: GlobalAlloc>` and `AllocSnapshot` at the
crate root, layering four `AtomicU64` counters (`allocs`,
`deallocs`, `bytes_allocated`, `bytes_deallocated`) on top of any
inner allocator. Bench / test binaries opt in by installing the
wrapper as `#[global_allocator]`.
- **Bench `alloc_count`** at `benches/order_book/alloc_count.rs`
(also feature-gated) runs the mixed 70 / 20 / 10 workload, prints
`allocs_per_op` + `bytes_alloc/op` to stdout, and writes a small
markdown summary to `target/alloc-counters/<scenario>.md`.
- **Integration test `alloc_budget_tests`** at
`tests/unit/alloc_budget_tests.rs` runs 10 000 mixed ops and
asserts `allocs/op < 10`, a conservative ceiling tuned to catch
order-of-magnitude regressions in CI rather than to certify zero
allocations.
- **`BENCH.md`** gains an "Allocation profile" section with the
workflow + a reference number from a single M4 Max run.
- **`mod utils` made `pub mod utils`** so the new types are
reachable via `orderbook_rs::utils::CountingAllocator` as well as
the crate-root re-export. Existing `pub use utils::current_time_millis`
unchanged.

### Notes — alloc counter

- The library `rlib` does **not** install a `#[global_allocator]` —
consumers pick their own (`jemalloc`, `mimalloc`, system, …). The
wrapper exists to give bench / test binaries a measurement hook
without forcing a global choice on the library.
- `counting_allocator.rs` carries a documented
`#[allow(unsafe_code)]` exception to the crate's
`#![deny(unsafe_code)]` policy because Rust's `GlobalAlloc` trait
requires `unsafe impl`. The exception is gated on the feature flag
and confined to the wrapper module; every `unsafe` block
delegates immediately to the inner allocator.
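
Because the crate-level policy is `deny` rather than `forbid`, a nested `#[allow(unsafe_code)]` can lift it for exactly one module (under `forbid`, that inner `allow` would itself be a compile error). A sketch of the layout, with a module-level `#[deny]` standing in for the crate-root `#![deny]` and the feature gate omitted so it compiles standalone; `crate_root` and `Wrapper` are illustrative names:

```rust
// Stand-in for the crate root and its `#![deny(unsafe_code)]` policy.
#[deny(unsafe_code)]
mod crate_root {
    // The single documented exception: only this module may contain `unsafe`.
    #[allow(unsafe_code)]
    pub mod counting_allocator {
        use std::alloc::{GlobalAlloc, Layout};

        /// Pass-through wrapper: every `unsafe` block just forwards to `.0`.
        pub struct Wrapper<Inner>(pub Inner);

        unsafe impl<Inner: GlobalAlloc> GlobalAlloc for Wrapper<Inner> {
            unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
                unsafe { self.0.alloc(layout) }
            }

            unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
                unsafe { self.0.dealloc(ptr, layout) }
            }
        }
    }

    // Uncommenting this (outside the allowed module) fails to compile:
    // pub fn oops() { unsafe {} }
}
```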

### Added — HDR-histogram tail-latency bench suite (#56)

- **Six new bench binaries** under `benches/order_book/*_hdr.rs` that
12 changes: 12 additions & 0 deletions Cargo.toml
@@ -55,6 +55,7 @@ special_orders = []
nats = ["dep:async-nats", "dep:bytes"]
bincode = ["dep:bincode"]
journal = ["dep:crc32fast", "dep:memmap2"]
alloc-counters = []

[dev-dependencies]
criterion = { version = "0.8", features = ["html_reports"] }
@@ -98,10 +99,21 @@ name = "mass_cancel_burst_hdr"
path = "benches/order_book/mass_cancel_burst_hdr.rs"
harness = false

[[bench]]
name = "alloc_count"
path = "benches/order_book/alloc_count.rs"
harness = false
required-features = ["alloc-counters"]

[[test]]
name = "tests"
path = "tests/unit/mod.rs"

[[test]]
name = "alloc_budget"
path = "tests/alloc_budget.rs"
required-features = ["alloc-counters"]


[lib]
name = "orderbook_rs"
16 changes: 16 additions & 0 deletions README.md
@@ -48,6 +48,22 @@ This order book engine is built with the following design principles:

### What's New in Version 0.7.0

#### v0.7.0 — Feature-gated allocation counter

- **New feature `alloc-counters`** (default off). Exposes
[`CountingAllocator`] and [`AllocSnapshot`] at the crate root.
Wraps any inner [`GlobalAlloc`](std::alloc::GlobalAlloc) and
tracks four `AtomicU64` counters: `allocs`, `deallocs`,
`bytes_allocated`, `bytes_deallocated`.
- Bench / test binaries opt in via
`#[global_allocator] static A: CountingAllocator<System> = ...`.
The library `rlib` does **not** install a global allocator.
- **`alloc_count`** bench + **`alloc_budget_tests`** integration
test run the mixed 70/20/10 workload; the bench reports
`allocs_per_op`, the test asserts a conservative ceiling for
regression detection.
- **`BENCH.md`** gains an "Allocation profile" section.

#### v0.7.0 — HDR-histogram tail-latency bench suite

- **Six new `*_hdr` bench binaries** under
135 changes: 135 additions & 0 deletions benches/order_book/alloc_count.rs
@@ -0,0 +1,135 @@
// alloc_count — feature-gated allocation profile of the mixed
// 70/20/10 hot-path workload. Reports `allocs_per_op` and a
// per-counter delta over a measurement window.
//
// Build / run:
//
// cargo bench --features alloc-counters --bench alloc_count

#![cfg(feature = "alloc-counters")]

#[path = "hdr_common.rs"]
mod common;

use orderbook_rs::utils::CountingAllocator;
use std::alloc::System;

#[global_allocator]
static GLOBAL: CountingAllocator<System> = CountingAllocator::new(System);

use common::{Rng, pick_owner, pick_side};
use pricelevel::{Id, TimeInForce};

const SCENARIO: &str = "alloc_count_mixed_70_20_10";
const WARMUP_OPS: u64 = 200_000;
const MEASURED_OPS: u64 = 1_000_000;
const SEED: u64 = 0xA5A5_A5A5_A5A5_A5A5;

#[derive(Clone, Copy)]
enum Op {
    Submit,
    Cancel,
    Aggressive,
}

fn pick_op(rng: &mut Rng) -> Op {
    let v = rng.next() % 100;
    if v < 70 {
        Op::Submit
    } else if v < 90 {
        Op::Cancel
    } else {
        Op::Aggressive
    }
}

fn apply(book: &orderbook_rs::OrderBook<()>, rng: &mut Rng, next_id: &mut u64, op: Op) {
    match op {
        Op::Submit => {
            let id = Id::from_u64(*next_id);
            *next_id += 1;
            let price = rng.range(common::PRICE_LO, common::PRICE_HI) as u128;
            let qty = rng.range(common::QTY_LO, common::QTY_HI);
            let _ = book.add_limit_order_with_user(
                id,
                price,
                qty,
                pick_side(rng),
                TimeInForce::Gtc,
                pick_owner(rng),
                None,
            );
        }
        Op::Cancel => {
            if *next_id > 1 {
                let target = rng.range(1, *next_id - 1);
                let _ = book.cancel_order(Id::from_u64(target));
            }
        }
        Op::Aggressive => {
            let id = Id::from_u64(*next_id);
            *next_id += 1;
            let qty = rng.range(1, 10);
            let _ = book.submit_market_order_with_user(id, qty, pick_side(rng), pick_owner(rng));
        }
    }
}

fn main() {
    let book = common::fresh_book();
    let mut rng = Rng::new(SEED);
    let mut next_id: u64 = 1;

    // Warmup — discarded.
    for _ in 0..WARMUP_OPS {
        let op = pick_op(&mut rng);
        apply(&book, &mut rng, &mut next_id, op);
    }

    // Capture pre-measurement counters.
    let before = GLOBAL.snapshot();

    for _ in 0..MEASURED_OPS {
        let op = pick_op(&mut rng);
        apply(&book, &mut rng, &mut next_id, op);
    }

    let after = GLOBAL.snapshot();
    let delta = after.since(before);

    let allocs_per_op = delta.allocs as f64 / MEASURED_OPS as f64;
    let bytes_per_op = delta.bytes_allocated as f64 / MEASURED_OPS as f64;

    println!("scenario : {SCENARIO}");
    println!("warmup ops : {WARMUP_OPS}");
    println!("measured ops : {MEASURED_OPS}");
    println!("allocs : {}", delta.allocs);
    println!("deallocs : {}", delta.deallocs);
    println!("bytes_alloc : {}", delta.bytes_allocated);
    println!("bytes_dealloc : {}", delta.bytes_deallocated);
    println!("allocs/op : {allocs_per_op:.4}");
    println!("bytes_alloc/op : {bytes_per_op:.2}");

    let summary = format!(
        "# {SCENARIO}\n\
         \n\
         | counter | value |\n\
         |-----------------|----------------------|\n\
         | warmup_ops | {WARMUP_OPS} |\n\
         | measured_ops | {MEASURED_OPS} |\n\
         | allocs | {} |\n\
         | deallocs | {} |\n\
         | bytes_alloc | {} |\n\
         | bytes_dealloc | {} |\n\
         | allocs/op | {allocs_per_op:.4} |\n\
         | bytes_alloc/op | {bytes_per_op:.2} |\n",
        delta.allocs, delta.deallocs, delta.bytes_allocated, delta.bytes_deallocated,
    );
    let _ = std::fs::create_dir_all("target/alloc-counters");
    let path = format!("target/alloc-counters/{SCENARIO}.md");
    if let Err(e) = std::fs::write(&path, summary) {
        eprintln!("could not write {path}: {e}");
    } else {
        eprintln!("wrote {path}");
    }
}
11 changes: 7 additions & 4 deletions benches/order_book/mixed_70_20_10_hdr.rs
@@ -20,10 +20,13 @@ enum Op {
 }
 
 fn pick_op(rng: &mut Rng) -> Op {
-    match rng.next() % 100 {
-        0..70 => Op::Submit,
-        70..90 => Op::Cancel,
-        _ => Op::Aggressive,
+    let v = rng.next() % 100;
+    if v < 70 {
+        Op::Submit
+    } else if v < 90 {
+        Op::Cancel
+    } else {
+        Op::Aggressive
     }
 }
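
Both forms of `pick_op` split the 100 residues of `rng.next() % 100` into the same 70 / 20 / 10 buckets. A self-contained check of that, taking the residue directly instead of an `Rng`; `pick_if`, `pick_match`, and `mix_counts` are illustrative names. The sketch writes the old `match` with inclusive patterns (`0..=69`) because exclusive range patterns such as `0..70` require Rust 1.80 or newer.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Submit,
    Cancel,
    Aggressive,
}

/// The new form: an `if` / `else` chain on the residue `v`.
fn pick_if(v: u64) -> Op {
    if v < 70 {
        Op::Submit
    } else if v < 90 {
        Op::Cancel
    } else {
        Op::Aggressive
    }
}

/// The removed `match`, rewritten with inclusive range patterns.
fn pick_match(v: u64) -> Op {
    match v {
        0..=69 => Op::Submit,
        70..=89 => Op::Cancel,
        _ => Op::Aggressive,
    }
}

/// Counts how many of the 100 residues land in each bucket.
fn mix_counts() -> [u32; 3] {
    let mut counts = [0u32; 3];
    for v in 0..100u64 {
        let idx = match pick_if(v) {
            Op::Submit => 0,
            Op::Cancel => 1,
            Op::Aggressive => 2,
        };
        counts[idx] += 1;
    }
    counts
}
```

`mix_counts()` returns `[70, 20, 10]`, and `pick_if` agrees with `pick_match` on every residue.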

20 changes: 19 additions & 1 deletion src/lib.rs
@@ -34,6 +34,22 @@
//!
//! ## What's New in Version 0.7.0
//!
//! ### v0.7.0 — Feature-gated allocation counter
//!
//! - **New feature `alloc-counters`** (default off). Exposes
//! [`CountingAllocator`] and [`AllocSnapshot`] at the crate root.
//! Wraps any inner [`GlobalAlloc`](std::alloc::GlobalAlloc) and
//! tracks four `AtomicU64` counters: `allocs`, `deallocs`,
//! `bytes_allocated`, `bytes_deallocated`.
//! - Bench / test binaries opt in via
//! `#[global_allocator] static A: CountingAllocator<System> = ...`.
//! The library `rlib` does **not** install a global allocator.
//! - **`alloc_count`** bench + **`alloc_budget_tests`** integration
//! test run the mixed 70/20/10 workload; the bench reports
//! `allocs_per_op`, the test asserts a conservative ceiling for
//! regression detection.
//! - **`BENCH.md`** gains an "Allocation profile" section.
//!
//! ### v0.7.0 — HDR-histogram tail-latency bench suite
//!
//! - **Six new `*_hdr` bench binaries** under
@@ -395,7 +411,7 @@
pub mod orderbook;

pub mod prelude;
-mod utils;
+pub mod utils;

#[cfg(feature = "bincode")]
pub use orderbook::BincodeEventSerializer;
@@ -431,6 +447,8 @@ pub use orderbook::{
FeeSchedule, ManagerError, MassCancelResult, OrderBook, OrderBookError, OrderBookSnapshot,
};
pub use utils::current_time_millis;
#[cfg(feature = "alloc-counters")]
pub use utils::{AllocSnapshot, CountingAllocator};

/// Legacy type alias for `OrderBook<()>` to maintain backward compatibility.
///