Limit Order Book

A high-performance limit order book implementation in modern C++17.

Building

make        # Build the demo
make test   # Run tests
make benchmark  # Build benchmark suite
make clean  # Clean build artifacts

Iteration 1.3.0 (Latest)

Key Changes

Zero heap allocations on the hot path: All Order and PriceLevel objects served from pre-allocated object pools. Fill buffer reused across match_order calls — no std::vector<Fill> allocation per match. Deterministic mode (-DLOB_DETERMINISTIC_POOL) enforces zero-allocation at compile time.
Templatized matching on Side: match_order, add_order_to_book, and remove_order_from_book are template<Side S> — the compiler generates separate BUY/SELL code paths, eliminating runtime branch in the inner matching loop.
Branch hints: LOB_LIKELY/LOB_UNLIKELY annotations on all hot-path branches (pool allocation, price range checks, matching loop predicates).
Order struct layout: Side narrowed to uint8_t, fields reordered for zero internal padding. Hot fields (id, price, quantity, pointers) packed into first 48 bytes (single cache line).
ITCH 5.0 parser: Zero-copy binary protocol parser for Add Order (A/F), Order Executed (E), Order Cancel (X), Order Delete (D), and Order Replace (U) messages.
Realistic workload benchmark: 93% cancel, 5% add, 2% modify — matching real exchange traffic patterns where the vast majority of orders are cancelled before execution.
Cycle counter: Cross-platform rdtsc (x86) / CNTVCT_EL0 (ARM64) cycle timer for sub-nanosecond measurement resolution.

Architecture

Each shard runs on a dedicated pinned CPU core (pthread_setaffinity_np on Linux, thread_policy_set on macOS), eliminating context-switch jitter and core migration.
Single-threaded per shard — no locks, no atomics on the hot path.
SPSC queues between producer and consumer threads.

Performance Summary

AddOrder 26 ns | MatchOrder 47 ns | CancelHeavy 27 Mops/s

Benchmark	Mean	Throughput	vs 1.2.0
AddOrder	26 ns	38.9 Mops/s	1.1x faster (was 29 ns)
MatchOrder	47 ns	21.4 Mops/s	comparable
CancelOrder	16 ns	63.6 Mops/s	comparable
MixedWorkload	27 ns	21.6 Mops/s	comparable
CancelHeavyWorkload	19 ns	27.0 Mops/s	NEW

Tail Latency

Operation	P99	P99.9	P99.99	vs 1.2.0
AddOrder	42 ns	250 ns	1.8 us	P99.9 4x better (was 1 us)
MatchOrder	84 ns	167 ns	250 ns	P99.99 1.3x better (was 333 ns)
CancelHeavyWorkload	83 ns	666 ns	833 ns	NEW
MixedWorkload	167 ns	250 ns	500 ns	P99.99 7x better (was 3.5 us)

See results/benchmark_result_1.3.0.csv for full data.

Iteration 1.2.0

Key Changes

Tick-indexed ladders: Contiguous arrays replace std::map + std::unordered_map for price levels. Active levels tracked via bitset with __builtin_ctzll/__builtin_clzll scanning.
Object pool with growth control: Pre-allocated pools for Order and PriceLevel. Deterministic mode (-DLOB_DETERMINISTIC_POOL) prevents runtime heap allocation.
Sharded engine: SPSC queue per shard, strict single-producer ownership, batched consumer processing, thread pinning.

Performance Summary

AddOrder 28 ns | MatchOrder 46 ns | MixedWorkload 22.8 Mops/s

Benchmark	Mean	Throughput	vs 1.1.0
AddOrder	28 ns	35.0 Mops/s	1.9x faster (was 54 ns)
MatchOrder	46 ns	22.0 Mops/s	2.6x faster (was 118 ns)
MixedWorkload	28 ns	21.2 Mops/s	1.6x faster (was 46 ns)
CancelOrder	15 ns	65.3 Mops/s	comparable
ModifyOrder	16 ns	61.3 Mops/s	comparable

Tail Latency vs 1.1.0

Operation	P99	P99.99	vs 1.1.0
AddOrder	42 ns	1.6 us	P99.99 7x better (was 11 us)
MatchOrder	84 ns	333 ns	P99.99 1.9x better (was 625 ns)
MixedWorkload	125 ns	3.5 us	P99 2.7x better (was 333 ns)

Sharded Throughput (Synthetic Add Flow)

Each row shows shards / consumer batch size. Producer submits 100k orders; workers process in parallel.

Shards / Batch	Throughput
1 / 64	9.69 Mops/s
2 / 64	9.40 Mops/s
4 / 64	10.55 Mops/s
8 / 64	8.69 Mops/s
1 / 256	9.61 Mops/s
2 / 256	13.40 Mops/s
4 / 256	10.60 Mops/s
8 / 256	8.74 Mops/s

Sharded End-to-End Latency

Submit-to-completion latency per order, including queue transit and matching.

Shards / Batch	Mean	P95	P99	P99.9
1 / 256	348 ns	417 ns	875 ns	14.1 us
2 / 256	529 ns	958 ns	3.7 us	9.1 us
4 / 256	1.6 us	2.3 us	13.8 us	35.3 us

See results/benchmark_result_1.2.0.csv for full data.

Iteration 1.1.0

Implemented optimizations based on "How to Build a Fast Limit Order Book" article.

Key Changes

Integer Prices: Changed from double to int64_t (ticks) for faster comparisons and better hashing
Intrusive Doubly-Linked Lists: Orders contain prev/next pointers for O(1) removal
Hash Maps: std::unordered_map for O(1) price level and order lookup
Binary Search Tree: Price levels organized in BST for O(log M) insertion of new levels
Cached Best Bid/Ask: Direct pointer access (highest_buy_/lowest_sell_) for O(1) best price queries

Performance Summary

AddOrder 47 ns | MatchOrder 117 ns | MixedWorkload 16.6 Mops/s

Significant Improvements vs 1.0.0

Operation	Mean	Improvement
AddOrder	47 ns	2.3x faster (was 109 ns)
CancelOrder	16 ns	1.4x faster (was 23 ns)
ModifyOrder	15 ns	1.6x faster (was 24 ns)
MatchOrder	117 ns	1.4x faster (was 160 ns)
MixedWorkload	42 ns	1.6x faster (was 69 ns)

Tail Latency Improvements

Operation	P99	P99.99	vs 1.0.0
AddOrder	84 ns	1.8 µs	P99 9x better (was 792 ns), P99.99 19x better (was 34 µs)
MatchOrder	209 ns	334 ns	P99.99 69x better (was 23 µs)
MixedWorkload	292 ns	708 ns	P99.99 28x better (was 20 µs)

Time Complexity

Operation	Complexity	Notes
`add_order` (passive)	O(1) / O(log M)	O(1) at existing limit, O(log M) for new price level
`add_order` (aggressive)	O(F + L log M)	Includes matching; see below
`cancel_order`	O(1)	Hash lookup + intrusive list removal
`modify_order`	O(1)	Hash lookup + quantity update
`get_best_bid`	O(1)	Cached pointer
`get_best_ask`	O(1)	Cached pointer
`get_snapshot`	O(D)	D = depth requested

Matching complexity: O(F + L log M)

F = number of fills (resting orders matched)
L = number of price levels fully exhausted (L ≤ F)
M = total price levels on the opposing side

Scenario	Complexity	Example
Passive (no cross)	O(1)	Bid below best ask
Single partial fill	O(1)	Match one order, level remains
Single level exhausted	O(log M)	Clear one price level
Multi-level sweep	O(F + L log M)	Large aggressive order

Iteration 1.0.0

Price-Time Priority: Orders matched by best price first, then arrival time (FIFO)
Efficient Data Structures: std::map for price levels, std::list for order queues, std::unordered_map for O(1) order lookup
Clean API: Structured results with I/O separated from business logic
Modern C++17: std::optional, [[nodiscard]], namespaces, const-correctness

Performance Summary

AddOrder 109 ns | MatchOrder 160 ns | MixedWorkload 11 Mops/s

Benchmark	Mean	P99	P99.99
GetBestBid	19 ns	42 ns	1.4 µs
GetBestAsk	17 ns	42 ns	166 ns
GetSpread	18 ns	42 ns	167 ns
CancelOrder	23 ns	125 ns	459 ns
ModifyOrder	24 ns	42 ns	167 ns
AddOrder	109 ns	792 ns	34 µs
MatchOrder	160 ns	292 ns	23 µs
MixedWorkload	69 ns	459 ns	20 µs

See benchmark/README.md for methodology.

Potential Improvements

Memory pool allocators
Market orders, IOC, FOK order types
Thread safety / lock-free structures
SIMD optimizations for batch operations

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
benchmark		benchmark
examples		examples
include/lob		include/lob
results		results
src		src
tests		tests
.clangd		.clangd
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
compile_flags.txt		compile_flags.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Limit Order Book

Building

Iteration 1.3.0 (Latest)

Key Changes

Architecture

Performance Summary

Tail Latency

Iteration 1.2.0

Key Changes

Performance Summary

Tail Latency vs 1.1.0

Sharded Throughput (Synthetic Add Flow)

Sharded End-to-End Latency

Iteration 1.1.0

Key Changes

Performance Summary

Significant Improvements vs 1.0.0

Tail Latency Improvements

Time Complexity

Iteration 1.0.0

Performance Summary

Potential Improvements

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Limit Order Book

Building

Iteration 1.3.0 (Latest)

Key Changes

Architecture

Performance Summary

Tail Latency

Iteration 1.2.0

Key Changes

Performance Summary

Tail Latency vs 1.1.0

Sharded Throughput (Synthetic Add Flow)

Sharded End-to-End Latency

Iteration 1.1.0

Key Changes

Performance Summary

Significant Improvements vs 1.0.0

Tail Latency Improvements

Time Complexity

Iteration 1.0.0

Performance Summary

Potential Improvements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages