A comprehensive interprocess communication (IPC) benchmark suite implemented in Rust, designed to measure performance characteristics of different IPC mechanisms with a focus on latency and throughput.
This benchmark suite provides a systematic way to evaluate the performance of various IPC mechanisms commonly used in systems programming. It's designed to be:
- Comprehensive: Measures both latency and throughput for one-way and round-trip communication patterns
- Configurable: Supports various message sizes, concurrency levels, and test durations
- Reproducible: Generates detailed JSON output suitable for automated analysis
- Production-ready: Optimized for performance measurement with minimal overhead
- Unix Domain Sockets (`uds`) - Low-latency local communication
- Shared Memory (`shm`) - Highest throughput for large data transfers
- TCP Sockets (`tcp`) - Network-capable, standardized communication
- POSIX Message Queues (`pmq`) - Kernel-managed, priority-based messaging
- Latency Metrics: One-way and round-trip latency measurements
- Throughput Metrics: Message rate and bandwidth measurements
- Statistical Analysis: Percentiles (P50, P95, P99, P99.9), min/max, standard deviation
- Concurrency Testing: Multi-threaded/multi-process performance evaluation
- Warmup Support: Stabilizes performance measurements
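The percentile statistics above can be reproduced from raw latency samples; here is a minimal sketch using the nearest-rank method (the suite's exact interpolation scheme may differ, so treat this as illustrative):

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with >= p% of the data at or below it."""
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * p // 100))  # ceiling of n * p / 100
    return ordered[int(rank) - 1]

latencies_ns = [1500, 3100, 3200, 3300, 45000]  # example one-way samples
print(percentile(latencies_ns, 50))   # → 3200 (median)
print(percentile(latencies_ns, 99))   # → 45000
```

The nearest-rank method never interpolates between samples, which keeps tail percentiles (P99.9 and beyond) tied to values that were actually observed.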
- JSON: Machine-readable structured output
- Streaming: Real-time results during execution
- Human-readable: Formatted console output with progress indicators
- Rust: 1.75.0 or later
- Operating System: Linux (tested on RHEL 9.6)
- Dependencies: All handled by Cargo
git clone https://github.com/your-org/ipc-benchmark.git
cd ipc-benchmark
cargo build --release
The optimized binary will be available at `target/release/ipc-benchmark`.
# Run all IPC mechanisms with default settings
./target/release/ipc-benchmark -m all
# Run specific mechanisms with custom configuration
./target/release/ipc-benchmark \
-m uds shm tcp \
--message-size 4096 \
--iterations 50000 \
--concurrency 4
# Run benchmark with default settings
ipc-benchmark
# Run with specific mechanisms
ipc-benchmark -m uds shm pmq
# Run all mechanisms (including PMQ)
ipc-benchmark -m all
# Test POSIX message queue specifically
ipc-benchmark -m pmq --message-size 1024 --iterations 10000
# Run with custom message size and iterations
ipc-benchmark --message-size 1024 --iterations 10000
# Run for a specific duration
ipc-benchmark --duration 30s
# Run with multiple concurrent workers
ipc-benchmark --concurrency 8
# Run with custom output file
ipc-benchmark --output-file results.json
# Enable streaming output during execution
ipc-benchmark --streaming-output streaming_results.json
# Run only round-trip tests
ipc-benchmark --round-trip --no-one-way
# Custom percentiles for latency analysis
ipc-benchmark --percentiles 50 90 95 99 99.9 99.99
# TCP-specific configuration
ipc-benchmark -m tcp --host 127.0.0.1 --port 9090
# Shared memory configuration
ipc-benchmark -m shm --buffer-size 16384
ipc-benchmark \
-m shm \
--message-size 65536 \
--concurrency 8 \
--duration 60s \
--buffer-size 1048576
ipc-benchmark \
-m uds \
--message-size 64 \
--iterations 100000 \
--warmup-iterations 10000 \
--percentiles 50 95 99 99.9 99.99
ipc-benchmark \
-m uds shm tcp pmq \
--message-size 1024 \
--iterations 50000 \
--concurrency 4 \
--output-file comparison.json
ipc-benchmark \
-m all \
--message-size 1024 \
--iterations 50000 \
--concurrency 1 \
--output-file complete_comparison.json
# Test PMQ with different message sizes
ipc-benchmark \
-m pmq \
--message-size 1024 \
--iterations 10000 \
--percentiles 50 95 99
# PMQ with concurrency testing
ipc-benchmark \
-m pmq \
--message-size 512 \
--iterations 5000 \
--concurrency 2 \
--duration 30s
The benchmark generates comprehensive JSON output with the following structure:
{
"metadata": {
"version": "0.1.0",
"timestamp": "2024-01-01T00:00:00Z",
"total_tests": 3,
"system_info": {
"os": "linux",
"architecture": "x86_64",
"cpu_cores": 8,
"memory_gb": 16.0,
"rust_version": "1.75.0",
"benchmark_version": "0.1.0"
}
},
"results": [
{
"mechanism": "UnixDomainSocket",
"test_config": {
"message_size": 1024,
"concurrency": 1,
"iterations": 10000
},
"one_way_results": {
"latency": {
"latency_type": "OneWay",
"min_ns": 1500,
"max_ns": 45000,
"mean_ns": 3200.5,
"median_ns": 3100,
"percentiles": [
{"percentile": 95.0, "value_ns": 5200},
{"percentile": 99.0, "value_ns": 8500}
]
},
"throughput": {
"messages_per_second": 312500.0,
"bytes_per_second": 320000000.0,
"total_messages": 10000,
"total_bytes": 10240000
}
},
"summary": {
"total_messages_sent": 10000,
"total_bytes_transferred": 10240000,
"average_throughput_mbps": 305.17,
"p95_latency_ns": 5200,
"p99_latency_ns": 8500
}
}
],
"summary": {
"fastest_mechanism": "SharedMemory",
"lowest_latency_mechanism": "UnixDomainSocket"
}
}
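Because the output is plain JSON, downstream tooling can consume it directly. A small sketch (field names taken from the sample above; the schema may evolve between versions):

```python
import json

# Minimal excerpt of the schema shown above, embedded here for illustration;
# in practice you would load the file passed to --output-file.
raw = """
{
  "results": [
    {
      "mechanism": "UnixDomainSocket",
      "summary": {"average_throughput_mbps": 305.17, "p99_latency_ns": 8500}
    }
  ],
  "summary": {"fastest_mechanism": "SharedMemory"}
}
"""

doc = json.loads(raw)
for result in doc["results"]:
    s = result["summary"]
    print(f"{result['mechanism']}: p99 = {s['p99_latency_ns']} ns, "
          f"avg throughput = {s['average_throughput_mbps']} MB/s")
```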
For optimal performance measurement:
- CPU Frequency Scaling: Disable frequency scaling for consistent results
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
- Process Isolation: Use CPU affinity to isolate benchmark processes
taskset -c 0-3 ipc-benchmark --concurrency 4
- Memory: Ensure sufficient RAM for shared memory tests
- Disk I/O: Results may be affected by disk I/O for file operations
- Warmup: Use warmup iterations to stabilize performance
- Duration: Longer test durations provide more stable results
- Noise: Run tests on idle systems for best accuracy
- Repetition: Run multiple test iterations and average results
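The last point is easy to automate: parse the JSON from each run and average the summary figures. A sketch, assuming the `p99_latency_ns` summary field shown in the sample output:

```python
import statistics

def mean_p99(run_docs, mechanism):
    """Average p99 latency for one mechanism across several parsed result documents."""
    p99s = [r["summary"]["p99_latency_ns"]
            for doc in run_docs
            for r in doc["results"]
            if r["mechanism"] == mechanism]
    return statistics.mean(p99s)

# Three hypothetical runs of the same configuration:
runs = [
    {"results": [{"mechanism": "UnixDomainSocket", "summary": {"p99_latency_ns": 8500}}]},
    {"results": [{"mechanism": "UnixDomainSocket", "summary": {"p99_latency_ns": 9100}}]},
    {"results": [{"mechanism": "UnixDomainSocket", "summary": {"p99_latency_ns": 8200}}]},
]
print(mean_p99(runs, "UnixDomainSocket"))  # mean of the three runs
```

For noisy tail percentiles, reporting the median across runs instead of the mean is also a reasonable choice.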
- Permission Errors: Ensure write permissions for output files and temp directories
- Port Conflicts: Use different ports for TCP tests if default ports are occupied
- Memory Issues: Reduce buffer sizes or concurrency for memory-constrained systems
- Compilation Issues: Ensure Rust toolchain is up to date
# Enable verbose logging
RUST_LOG=debug ipc-benchmark --verbose
# Run with detailed tracing
RUST_LOG=trace ipc-benchmark
If you encounter performance issues:
- Check system resource utilization
- Verify no other processes are interfering
- Consider reducing concurrency or message sizes
- Ensure sufficient system resources
# Development build
cargo build
# Release build (optimized)
cargo build --release
# Release build with debug symbols (cargo has no --debug flag;
# equivalently, set debug = true under [profile.release] in Cargo.toml)
CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release
# Run all tests
cargo test
# Run specific test modules
cargo test metrics
cargo test ipc
# Run with output
cargo test -- --nocapture
# Run internal benchmarks
cargo bench
# Profile with perf
perf record --call-graph dwarf target/release/ipc-benchmark
perf report
See CONTRIBUTING.md for detailed contribution guidelines.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
See CHANGELOG.md for version history and changes.
Note: This benchmark suite is optimized for Linux systems. While it may work on other Unix-like systems, testing and validation have been performed primarily on RHEL 9.6.
Important: The benchmark has different concurrency behavior depending on the transport mechanism:
| Transport | Concurrency > 1 | Behavior |
|---|---|---|
| TCP | ✅ Supported | Simulated concurrency (sequential tests) |
| Unix Domain Sockets | ✅ Supported | Simulated concurrency (sequential tests) |
| Shared Memory | ⚠️ Falls back | Automatically uses concurrency = 1 |
Race Condition Protection: Shared memory automatically falls back to single-threaded mode when `--concurrency > 1` is specified, to prevent race conditions and "unexpected end of file" errors.
# This command:
ipc-benchmark -m shm -c 4 -i 1000 -s 1024
# Automatically becomes:
# ipc-benchmark -m shm -c 1 -i 1000 -s 1024
# (with warning message)
- Shared Memory: The ring buffer implementation has inherent race conditions under concurrent access
- TCP/UDS: True concurrent connections would require a more complex server architecture than the current scope
The benchmark now validates buffer sizes for shared memory to prevent buffer overflow errors:
# This will show a warning:
ipc-benchmark -m shm -i 10000 -s 1024 --buffer-size 8192
# Warning: Buffer size (8192 bytes) may be too small for 10000 iterations
# of 1024 byte messages. Consider using --buffer-size 20971520
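The suggested size in the warning above is consistent with reserving roughly twice the total payload, rounded up to a whole MiB. A sketch of that sizing rule (the exact heuristic the tool applies is an implementation detail; this is an assumption that happens to reproduce the number in the warning):

```python
MIB = 1024 * 1024

def suggested_buffer_size(iterations, message_size):
    """~2x the total payload, rounded up to a whole number of MiB (assumed heuristic)."""
    total = iterations * message_size
    return -(-2 * total // MIB) * MIB  # ceiling division to MiB, then back to bytes

print(suggested_buffer_size(10000, 1024))  # → 20971520, matching the warning above
```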
# ✅ Optimal for latency measurement
ipc-benchmark -m all -c 1 -i 10000 -s 1024
# ✅ Good for throughput analysis (TCP/UDS only)
ipc-benchmark -m tcp uds -c 4 -i 10000 -s 1024
# ✅ Shared memory with adequate buffer
ipc-benchmark -m shm -i 10000 -s 1024 --buffer-size 50000000
# ⚠️ Will automatically use c=1 for shared memory
ipc-benchmark -m shm -c 4 -i 10000 -s 1024
Common issues and solutions:
- "Unexpected end of file": Increase `--buffer-size` for shared memory
- "Timeout sending message": Buffer too small; increase `--buffer-size`
- Hanging with concurrency: Fixed - now uses safe fallbacks

- Single-threaded (`-c 1`): Most accurate latency measurements
- Simulated concurrency (`-c 2+`): Good for throughput scaling analysis
- Shared memory: Always single-threaded for reliability