A performance-critical trading system written in C++20, built to demonstrate low-latency C++ development, systems programming, and financial infrastructure techniques.
This project implements a complete high-frequency trading (HFT) system stack including:
- Matching Engine: Price-time priority order matching with sub-microsecond latency
- Order Book: Lock-free, cache-optimized limit order book
- Market Data Handler: Real-time market data processing with FIX and WebSocket support
- Order Gateway: REST API with rate limiting and risk management
- Performance Benchmarks: Comprehensive latency and throughput measurement tools
┌─────────────────────────────────────────────────────────────────┐
│                          Order Gateway                          │
│                      (Rate Limiting, Risk)                      │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                         Matching Engine                         │
│             (Multi-instrument, Price-Time Priority)             │
├─────────────────┬─────────────────┬─────────────────────────────┤
│  BTC-USD Book   │  ETH-USD Book   │       ... more books        │
└─────────────────┴─────────────────┴─────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Market Data Publisher                      │
│               (WebSocket, FIX, Execution Reports)               │
└─────────────────────────────────────────────────────────────────┘
| Component | Metric | Performance |
|---|---|---|
| Order Book Add | Latency | < 500 ns |
| Order Book Cancel | Latency | < 200 ns |
| Order Match | Latency | < 1 μs |
| SPSC Queue | Throughput | > 50M ops/sec |
| Memory Pool | Allocation | < 50 ns |
- Lock-free Data Structures: SPSC/MPSC queues with cache-line padding
- Custom Memory Pool: O(1) allocation without heap fragmentation
- Cache-Aware Design: 64-byte aligned structures for cache efficiency
- CPU Affinity: Thread pinning for deterministic performance
- RDTSC Timing: Nanosecond-precision measurement with minimal overhead
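The cache-line padding and 64-byte alignment mentioned above can be illustrated with a small sketch. The type names (`PaddedOrder`, `Counters`) and field layout are hypothetical and are not taken from the project's headers; the 64-byte line size is an assumption that holds on typical x86-64 hardware.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; 64 bytes is typical for x86-64 but not universal.
inline constexpr std::size_t kCacheLine = 64;

// Hypothetical order record that occupies exactly one cache line.
struct alignas(kCacheLine) PaddedOrder {
    std::uint64_t id;
    std::int64_t  price_ticks;   // fixed-point price in ticks
    std::uint64_t quantity;
    std::uint64_t timestamp_ns;
    std::uint32_t instrument;
    std::uint8_t  side;          // 0 = buy, 1 = sell
    // The compiler pads the remaining bytes up to the 64-byte boundary.
};
static_assert(sizeof(PaddedOrder) == kCacheLine);
static_assert(alignof(PaddedOrder) == kCacheLine);

// Two counters written by different threads. Placing each on its own
// cache line avoids false sharing between the writers.
struct Counters {
    alignas(kCacheLine) std::atomic<std::uint64_t> produced{0};
    alignas(kCacheLine) std::atomic<std::uint64_t> consumed{0};
};
static_assert(sizeof(Counters) == 2 * kCacheLine);

// Returns true when the pointer sits on a cache-line boundary.
inline bool is_line_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kCacheLine == 0;
}
```

The `static_assert` checks turn layout assumptions into compile-time failures, so an added field that pushes the record past one cache line is caught at build time rather than in a benchmark.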
- Concepts and constraints
- `std::span` for zero-copy views
- `constexpr` compile-time computation
- Structured bindings
- `std::optional` and `std::string_view`
- FIX 4.4: Message parsing and generation
- WebSocket: Real-time market data streaming
- REST API: Order submission and management
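As a rough illustration of the FIX message parsing listed above, the sketch below splits a raw tag=value message on the SOH (0x01) delimiter without copying any field data. The names `parse_fix`, `find_tag`, and `FixField` are hypothetical and do not reflect the interface of `fix_message.hpp`; checksum and body-length validation are deliberately left out.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>
#include <utility>
#include <vector>

// One parsed field: numeric tag plus a view into the original buffer.
using FixField = std::pair<int, std::string_view>;

// Splits a raw FIX message into (tag, value) views.
// Returns std::nullopt if any field is malformed.
inline std::optional<std::vector<FixField>> parse_fix(std::string_view msg) {
    constexpr char kSoh = '\x01';
    std::vector<FixField> fields;
    while (!msg.empty()) {
        const auto end = msg.find(kSoh);
        if (end == std::string_view::npos) return std::nullopt;  // field not SOH-terminated
        const std::string_view field = msg.substr(0, end);
        const auto eq = field.find('=');
        if (eq == std::string_view::npos || eq == 0) return std::nullopt;  // missing tag or '='
        int tag = 0;
        const char* first = field.data();
        const char* last = field.data() + eq;
        const auto [ptr, ec] = std::from_chars(first, last, tag);
        if (ec != std::errc{} || ptr != last) return std::nullopt;  // tag is not an integer
        fields.emplace_back(tag, field.substr(eq + 1));
        msg.remove_prefix(end + 1);
    }
    return fields;
}

// Returns the value of the first field with the given tag, if present.
inline std::optional<std::string_view> find_tag(const std::vector<FixField>& fields, int tag) {
    for (const auto& [t, value] : fields) {
        if (t == tag) return value;
    }
    return std::nullopt;
}
```

Because the values are `std::string_view`s into the input, the caller has to keep the receive buffer alive for as long as the parsed fields are in use.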
.
├── CMakeLists.txt              # Build configuration
├── Dockerfile                  # Development container
├── docker-compose.yml          # Multi-service orchestration
├── src/
│   ├── core/                   # Low-latency utilities
│   │   ├── lockfree_queue.hpp
│   │   ├── memory_pool.hpp
│   │   ├── spinlock.hpp
│   │   ├── timing.hpp
│   │   ├── cpu_affinity.hpp
│   │   └── types.hpp
│   ├── matching/               # Matching engine
│   │   ├── order.hpp
│   │   ├── price_level.hpp
│   │   ├── order_book.hpp
│   │   └── matching_engine.hpp
│   ├── protocol/               # Exchange protocols
│   │   ├── fix_message.hpp
│   │   ├── websocket_handler.hpp
│   │   └── rest_handler.hpp
│   ├── marketdata/             # Market data handling
│   │   └── market_data_handler.hpp
│   ├── benchmark/              # Performance tests
│   │   └── benchmark_main.cpp
│   └── apps/                   # Applications
│       ├── matching_engine_main.cpp
│       ├── market_data_feed_main.cpp
│       └── order_gateway_main.cpp
└── tests/                      # Unit tests
# Build and run the development container
docker-compose up -d hft-dev
# Enter the container
docker exec -it hft-trading-system bash
# Build the project
mkdir -p build && cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release ..
ninja

# Start matching engine
docker-compose up matching-engine
# Start market data feed
docker-compose up market-data
# Run benchmarks
docker-compose --profile benchmark up benchmarks

# Matching Engine Server (port 8080)
./build/bin/matching_engine
# Market Data Feed Simulator (port 9090)
./build/bin/market_data_feed
# Order Gateway (port 9000)
./build/bin/order_gateway
# Benchmark Suite
./build/bin/benchmark_suite

# Health check
curl http://localhost:8080/health
# Get order book depth
curl http://localhost:8080/api/v1/depth/BTC-USD
# Get quote
curl http://localhost:8080/api/v1/quote/BTC-USD
# Submit order
curl -X POST http://localhost:8080/api/v1/order \
-H "Content-Type: application/json" \
-d '{"symbol":"BTC-USD","side":"BUY","type":"LIMIT","price":50000.0,"quantity":1.0}'
# Cancel order
curl -X DELETE http://localhost:8080/api/v1/order/BTC-USD/12345
# Get stats
curl http://localhost:8080/api/v1/stats

# Submit order with risk checks
curl -X POST http://localhost:9000/api/v1/order \
-H "Content-Type: application/json" \
-d '{"symbol":"BTC-USD","side":"BUY","type":"LIMIT","price":50000.0,"quantity":1.0}'
# Get position
curl http://localhost:9000/api/v1/position/BTC-USD
# Get gateway stats (includes latency percentiles)
curl http://localhost:9000/api/v1/stats

Run the benchmark suite to measure system performance:
./build/bin/benchmark_suite

Example output:
╔══════════════════════════════════════════════════════════════╗
║          HFT Trading System - Performance Benchmark          ║
╚══════════════════════════════════════════════════════════════╝
System Information:
  CPU Cores: 16
  Cache Line: 64 bytes
  Order Size: 64 bytes
  TSC Frequency: 3.20 GHz
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SPSC Queue Benchmark
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Push Latency: mean=15.2ns, p99=45ns
  Pop Latency: mean=12.8ns, p99=38ns
  Throughput: 52.3M ops/sec
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Order Book Benchmark
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Add Order: mean=425ns, p99=1.2μs
  Cancel Order: mean=180ns, p99=450ns
  Match Order: mean=890ns, p99=2.1μs
cd build
./bin/unit_tests

# Disable CPU frequency scaling
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Increase network buffer sizes
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
# Disable swap
sudo swapoff -a
# Set real-time scheduling limits
echo '* soft rtprio 99' | sudo tee -a /etc/security/limits.conf
echo '* hard rtprio 99' | sudo tee -a /etc/security/limits.conf

# Add to kernel parameters (GRUB_CMDLINE_LINUX)
isolcpus=2,3 nohz_full=2,3 rcu_nocbs=2,3

The SPSC queue implementation uses:
- Power-of-2 sizing for branchless modulo
- Cache-line separated head/tail indices
- Local caching of remote indices to reduce cache coherence traffic
- Acquire-release memory ordering for minimal synchronization
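The four points above can be sketched as a minimal bounded single-producer/single-consumer ring. This is an illustrative sketch under stated assumptions, not the contents of `lockfree_queue.hpp`: the class name `SpscQueue`, the `try_push`/`try_pop` interface, and the 64-byte line size are hypothetical, and `T` is assumed to be default-constructible and movable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Returns false when the ring is full.
    bool try_push(T value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail - cached_head_ == Capacity) {
            // Refresh the cached consumer index only when the ring looks full.
            cached_head_ = head_.load(std::memory_order_acquire);
            if (tail - cached_head_ == Capacity) return false;
        }
        slots_[tail & kMask] = std::move(value);           // mask replaces modulo
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == cached_tail_) {
            // Refresh the cached producer index only when the ring looks empty.
            cached_tail_ = tail_.load(std::memory_order_acquire);
            if (head == cached_tail_) return std::nullopt;
        }
        T value = std::move(slots_[head & kMask]);
        head_.store(head + 1, std::memory_order_release);  // release the slot
        return value;
    }

private:
    static constexpr std::size_t kMask = Capacity - 1;
    static constexpr std::size_t kCacheLine = 64;  // assumed line size

    // Consumer-owned line: its index plus its cached copy of the producer index.
    alignas(kCacheLine) std::atomic<std::size_t> head_{0};
    std::size_t cached_tail_{0};
    // Producer-owned line: its index plus its cached copy of the consumer index.
    alignas(kCacheLine) std::atomic<std::size_t> tail_{0};
    std::size_t cached_head_{0};

    alignas(kCacheLine) T slots_[Capacity]{};
};
```

The indices grow monotonically and are masked only when addressing a slot, which keeps the full and empty states distinguishable without sacrificing one slot. The sketch is only safe with exactly one producer thread and one consumer thread.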
The custom memory pool provides:
- O(1) allocation and deallocation
- Zero heap fragmentation
- Cache-aligned blocks
- Thread-local pools for lock-free operation
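One common way to get these properties is a fixed-capacity free list threaded through the unused blocks. The sketch below assumes that approach; `FixedPool`, `create`, and `destroy` are hypothetical names rather than the interface of `memory_pool.hpp`. It performs no locking, so it is meant to be owned by a single thread, in line with the thread-local design described above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

template <typename T, std::size_t N>
class FixedPool {
    static_assert(N > 0, "pool needs at least one slot");

public:
    FixedPool() {
        // Thread the free list through every slot once, up front.
        for (std::size_t i = 0; i < N; ++i) {
            slots_[i].next = (i + 1 < N) ? &slots_[i + 1] : nullptr;
        }
        free_ = &slots_[0];
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // O(1): pops a slot off the free list and constructs a T in place.
    // Returns nullptr when the pool is exhausted.
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_ == nullptr) return nullptr;
        Slot* slot = free_;
        free_ = slot->next;
        return ::new (static_cast<void*>(slot->storage)) T(std::forward<Args>(args)...);
    }

    // O(1): destroys the object and pushes its slot back on the free list.
    // obj must have been returned by create() on this pool.
    void destroy(T* obj) {
        assert(obj != nullptr);
        obj->~T();
        Slot* slot = reinterpret_cast<Slot*>(obj);
        slot->next = free_;
        free_ = slot;
    }

private:
    // A slot is either a free-list link or raw storage for one T.
    union Slot {
        Slot* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };

    // The array starts on an assumed 64-byte cache line; individual blocks
    // are line-aligned only if sizeof(Slot) is a multiple of 64.
    alignas(64) Slot slots_[N];
    Slot* free_ = nullptr;
};
```

Since every block has the same size and comes from one contiguous array, freeing and reallocating in any order cannot fragment the heap. The trade-off is a hard capacity limit that the caller has to handle when `create` returns `nullptr`.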
The order book uses:
- `std::map` for price levels (O(log N) access)
- Intrusive linked lists at each level (O(1) order ops)
- Separate bid/ask structures for cache locality
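The structure above, combined with price-time priority matching, can be sketched as follows. To stay short, the sketch substitutes `std::list` for the intrusive lists and adds an id index for cancellation; `MiniBook`, `Order`, and every member name are hypothetical and do not mirror `order_book.hpp`. Prices are assumed to be integer ticks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <iterator>
#include <list>
#include <map>
#include <optional>
#include <unordered_map>

enum class Side { Buy, Sell };

struct Order {
    std::uint64_t id;
    Side side;
    std::int64_t price;  // integer ticks
    std::uint64_t qty;
};

class MiniBook {
public:
    // Matches against the opposite side, rests any remainder, returns filled quantity.
    std::uint64_t add(Order o) {
        const std::uint64_t filled =
            (o.side == Side::Buy)
                ? match(asks_, o, [](std::int64_t level, std::int64_t limit) { return level <= limit; })
                : match(bids_, o, [](std::int64_t level, std::int64_t limit) { return level >= limit; });
        if (o.qty > 0) {
            if (o.side == Side::Buy) rest(bids_, o); else rest(asks_, o);
        }
        return filled;
    }

    // Removes a resting order; returns false if the id is unknown or already filled.
    bool cancel(std::uint64_t id) {
        const auto found = index_.find(id);
        if (found == index_.end()) return false;
        const Handle h = found->second;
        if (h.side == Side::Buy) erase_from(bids_, h); else erase_from(asks_, h);
        index_.erase(found);
        return true;
    }

    std::optional<std::int64_t> best_bid() const {
        if (bids_.empty()) return std::nullopt;
        return bids_.begin()->first;
    }
    std::optional<std::int64_t> best_ask() const {
        if (asks_.empty()) return std::nullopt;
        return asks_.begin()->first;
    }

private:
    using Queue = std::list<Order>;  // FIFO per level gives time priority
    struct Handle { Side side; std::int64_t price; Queue::iterator pos; };

    template <typename Levels, typename Crosses>
    std::uint64_t match(Levels& levels, Order& taker, Crosses crosses) {
        std::uint64_t filled = 0;
        while (taker.qty > 0 && !levels.empty()) {
            const auto best = levels.begin();  // best price first
            if (!crosses(best->first, taker.price)) break;
            Queue& queue = best->second;
            while (taker.qty > 0 && !queue.empty()) {
                Order& maker = queue.front();  // oldest order at this price
                const std::uint64_t traded = std::min(taker.qty, maker.qty);
                taker.qty -= traded;
                maker.qty -= traded;
                filled += traded;
                if (maker.qty == 0) {
                    index_.erase(maker.id);
                    queue.pop_front();
                }
            }
            if (queue.empty()) levels.erase(best);
        }
        return filled;
    }

    template <typename Levels>
    void rest(Levels& levels, const Order& o) {
        Queue& queue = levels[o.price];
        queue.push_back(o);
        index_.emplace(o.id, Handle{o.side, o.price, std::prev(queue.end())});
    }

    template <typename Levels>
    void erase_from(Levels& levels, const Handle& h) {
        const auto level = levels.find(h.price);
        level->second.erase(h.pos);
        if (level->second.empty()) levels.erase(level);
    }

    std::map<std::int64_t, Queue, std::greater<>> bids_;  // highest bid first
    std::map<std::int64_t, Queue> asks_;                  // lowest ask first
    std::unordered_map<std::uint64_t, Handle> index_;
};
```

Ordering the bid map with `std::greater<>` means `begin()` is the best price on both sides, so the matching loop is identical for buys and sells apart from the crossing predicate. A node-allocating `std::list` is slower than an intrusive list backed by a pool, which is why the sketch should be read as a model of the logic rather than of the latency figures quoted earlier.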
- C++20-compatible compiler (GCC 12+, Clang 14+)
- CMake 3.20+
- Boost 1.74+
- OpenSSL
- Linux (for full feature support)
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)

MIT License - See LICENSE file for details.
Inspired by real-world trading systems and low-latency programming techniques from:
- "Trading and Exchanges" by Larry Harris
- "Market Microstructure in Practice" by Lehalle and Laruelle
- Various open-source trading system implementations